Rule Post Pruning is a method for reducing the effective depth of a decision tree: the tree is converted into rules, and the rules are shortened.

The depth of a decision tree determines its complexity: the deeper the tree, the more complex the model. By reducing the tree’s depth, you reduce its complexity, and thereby reduce overfitting.

depth <==> complexity <==> overfitting

Here is how you do it:

  1. For each path going from the root of your tree to a leaf, create a rule (a conjunction of attribute-value tests).
  2. Keep pruning each rule for as long as performance on test data isn’t negatively affected. (Strictly speaking, this should be a held-out validation set rather than your final test set.)
  3. Order the rules from most accurate (on test data) to least accurate.
    1. When classifying, try the most accurate rule that provides a classification. For example, if rule1 (the most accurate) doesn’t provide a classification for you, try rule2 (the second most accurate). You get the idea!!! :angry: A sketch of steps 1 and 3 follows this list.
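Here is a minimal sketch of steps 1 and 3 in Python. The Node class, the attribute names (Outlook, Humidity, Wind), and the toy tree are all hypothetical, made up purely for illustration; a real implementation would extract rules from whatever tree structure your learner produces.

```python
# A minimal sketch of rule extraction (step 1) and ordered classification
# (step 3). The Node class and the toy weather tree are hypothetical.

class Node:
    """Internal nodes test one attribute; leaves carry a class label."""
    def __init__(self, attribute=None, children=None, label=None):
        self.attribute = attribute      # attribute tested at this node
        self.children = children or {}  # attribute value -> child Node
        self.label = label              # class label if this is a leaf

def extract_rules(node, conditions=None):
    """Step 1: one rule (a conjunction of attribute=value tests) per root-to-leaf path."""
    conditions = conditions or []
    if node.label is not None:          # reached a leaf
        return [(list(conditions), node.label)]
    rules = []
    for value, child in node.children.items():
        rules += extract_rules(child, conditions + [(node.attribute, value)])
    return rules

def rule_matches(conditions, example):
    """A rule fires only if the example satisfies every test in the conjunction."""
    return all(example.get(attr) == value for attr, value in conditions)

def classify(ordered_rules, example, default="unknown"):
    """Step 3.1: try rules from most to least accurate; the first match wins."""
    for conditions, label in ordered_rules:
        if rule_matches(conditions, example):
            return label
    return default

# Hypothetical toy tree for a play-tennis style problem.
tree = Node("Outlook", {
    "sunny": Node("Humidity", {"high": Node(label="no"), "normal": Node(label="yes")}),
    "overcast": Node(label="yes"),
    "rain": Node("Wind", {"strong": Node(label="no"), "weak": Node(label="yes")}),
})

rules = extract_rules(tree)  # in practice, sort these by accuracy first (step 3)
print(classify(rules, {"Outlook": "sunny", "Humidity": "normal"}))  # -> yes
```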

Let’s break down step 2 (sketched in code right after the list):

  • For each rule
    • For each attribute test (precondition) in the rule
      • Remove the attribute test
      • See how the rule performs on the test data
        • If it performs worse, add the attribute test back
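Continuing the sketch above, here is how that greedy loop might look. The accuracy helper is an assumption (it scores a rule only over the examples it covers, one common choice), it reuses rule_matches from the previous snippet, and validation is assumed to be a list of (example, true_label) pairs.

```python
# A minimal sketch of the greedy pruning loop (step 2), reusing rule_matches
# from the previous snippet. `validation` is a list of (example, label) pairs.

def accuracy(conditions, label, validation):
    """Accuracy of one rule over the validation examples it covers."""
    covered = [(ex, y) for ex, y in validation if rule_matches(conditions, ex)]
    if not covered:
        return 0.0                          # a rule that covers nothing is worthless
    return sum(1 for _, y in covered if y == label) / len(covered)

def prune_rule(conditions, label, validation):
    """Greedily drop preconditions while accuracy doesn't get worse."""
    conditions = list(conditions)
    for precondition in list(conditions):   # iterate over a snapshot
        baseline = accuracy(conditions, label, validation)
        conditions.remove(precondition)     # tentatively remove the test
        if accuracy(conditions, label, validation) < baseline:
            conditions.append(precondition) # it hurt performance: put it back
    return conditions

def prune_and_order(rules, validation):
    """Steps 2 and 3 together: prune every rule, then sort most-accurate-first."""
    pruned = [(prune_rule(c, lbl, validation), lbl) for c, lbl in rules]
    return sorted(pruned, key=lambda r: accuracy(r[0], r[1], validation), reverse=True)
```

Note this is a single greedy pass: each precondition is considered once, in order. A fancier variant would keep re-scanning until no removal helps, at the cost of more evaluations.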