Candidate Elimination is a machine learning algorithm that finds one or more boolean-valued functions that fit your training data.

It is different from the Find-S algorithm because it finds all the boolean-valued functions that fit the data (at least, all the ones that can be represented with your chosen hypothesis representation), not just one of them.

First I will give a mostly correct description of the algorithm (which makes it easier to grasp the overall concept), and then I will give the actual description of the algorithm.

Mostly Correct Description

Candidate Elimination closes in on a hypothesis from two sides. It maintains two boundaries:

  1. The most general hypothesis (i.e. the one that classifies the most instances as positive).
  2. The most specific hypothesis (i.e. the one that classifies the fewest instances as positive).

It then looks at your training examples one at a time and keeps moving these boundaries in response. Given sufficient training examples, it will converge to a single hypothesis (i.e. the general and specific boundaries will meet at the same hypothesis). Usually, though, you will not be able to provide enough data to converge to a single hypothesis, so what do you do when you are left with multiple hypotheses? It is important to remember that all the hypotheses you are left with (the most specific, the most general, and everything in between) are consistent with the training data you have seen. When you are classifying a new instance, you can feed it to all of your hypotheses and classify it according to a majority vote.
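
To make the majority-vote idea concrete, here is a small Python sketch. Python isn't part of the description above, and the attribute values are made up for illustration; it also assumes the common "conjunction of attribute constraints" representation, where a hypothesis is a tuple with either a required value or '?' meaning "any value is acceptable".

    def matches(hypothesis, instance):
        """True if the hypothesis classifies the instance as positive."""
        return all(h == '?' or h == x for h, x in zip(hypothesis, instance))

    def classify_by_vote(hypotheses, instance):
        """Classify a new instance by majority vote over the remaining hypotheses."""
        positive_votes = sum(matches(h, instance) for h in hypotheses)
        return positive_votes > len(hypotheses) / 2

    # Three hypotheses left between the specific and general boundaries (made up):
    remaining = [('Sunny', '?', '?'), ('Sunny', 'Warm', '?'), ('?', 'Warm', '?')]
    print(classify_by_vote(remaining, ('Sunny', 'Warm', 'Strong')))  # True
    print(classify_by_vote(remaining, ('Rainy', 'Cold', 'Weak')))    # False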

Here is the pseudocode for this “mostly correct” description of the algorithm:

  • Initialize two boundary hypotheses
    1. The most specific hypothesis
    2. The most general hypothesis
  • For each training example
    • If the training example is positive
      • If the most general hypothesis classifies it as negative
        • Generalize the general hypothesis a little bit
      • If the most specific hypothesis classifies it as negative
        • Generalize the specific hypothesis a little bit
    • If the training example is negative
      • If the most specific hypothesis classifies it as positive
        • Specialize the specific hypothesis a little bit
      • If the most general hypothesis classifies it as positive
        • Specialize the general hypothesis a little bit

Basically, the above algorithm picks one training example at a time, checks whether the general or specific hypothesis classifies it incorrectly, and if so, modifies that hypothesis just a little bit (a rough Python sketch of this loop follows the list below).

  • If the most general hypothesis classifies a positive example as negative, clearly this hypothesis is not general enough, thus generalize it a bit
  • If the most specific hypothesis classifies a positive example as negative, it is a little too specific, so generalize it a bit
  • If the most specific hypothesis classifies a negative example as positive, then clearly it is not specific enough, thus make it a bit more specific
  • If the most general hypothesis classifies a negative example as positive, then clearly it is too general, thus make it a bit more specific
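
Here is a rough Python sketch of that loop, under a couple of assumptions that are not part of the description above: hypotheses use the common "conjunction of attribute constraints" representation (a tuple where '?' matches any value, plus a "match nothing" starting point for the specific boundary), and the training data is invented for illustration. It only implements the two updates that do most of the work (generalizing the specific boundary on positives, specializing the general boundary on negatives); the other two branches of the pseudocode correspond, in the full algorithm, to pruning hypotheses from the boundary sets.

    def matches(hypothesis, instance):
        """True if the hypothesis classifies the instance as positive."""
        return all(h == '?' or h == x for h, x in zip(hypothesis, instance))

    def generalize(specific, positive):
        """Minimally generalize the specific boundary to cover a positive example."""
        if specific is None:                      # the initial "match nothing" hypothesis
            return tuple(positive)
        return tuple(h if h == x else '?' for h, x in zip(specific, positive))

    def specialize(general, specific, negative):
        """Minimally specialize the general boundary to exclude a negative example,
        guided by the specific boundary. The full algorithm keeps every such
        minimal specialization; this sketch keeps just one."""
        if specific is None:
            return general                        # no positives seen yet; leave G alone here
        g = list(general)
        for i, value in enumerate(specific):
            if value != '?' and value != negative[i]:
                g[i] = value
                break
        return tuple(g)

    def simplified_candidate_elimination(examples, n_attributes):
        specific = None                           # most specific: matches nothing yet
        general = ('?',) * n_attributes           # most general: matches everything
        for instance, is_positive in examples:
            if is_positive:
                if specific is None or not matches(specific, instance):
                    specific = generalize(specific, instance)          # move S up a little
            else:
                if matches(general, instance):
                    general = specialize(general, specific, instance)  # move G down a little
        return specific, general

    # Invented training data in the spirit of the classic "EnjoySport" example:
    examples = [
        (('Sunny', 'Warm', 'Strong'), True),
        (('Sunny', 'Warm', 'Weak'), True),
        (('Rainy', 'Cold', 'Strong'), False),
    ]
    print(simplified_candidate_elimination(examples, 3))
    # -> (('Sunny', 'Warm', '?'), ('Sunny', '?', '?'))

On these three toy examples the boundaries end up at ('Sunny', 'Warm', '?') and ('Sunny', '?', '?'): they have moved towards each other, but there is not enough data for them to meet.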

I hope that makes sense!

Actual Description

The actual algorithm deals with a few details that we overlooked. It initializes multiple “most specific” and “most general” hypotheses (not just one of each), i.e. it maintains a set of hypotheses for each boundary. Additionally, it specifies exactly how to generalize or specialize a “little bit” (taking the other boundary into account and removing any hypotheses that become inconsistent). These are details you only have to worry about if you are implementing the algorithm. The “mostly correct” description I give above is sufficient for gaining a conceptual understanding of the algorithm and the reasoning behind the choices it makes.
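
For the curious, here is a rough sketch of what that extra bookkeeping can look like, again assuming the "conjunction of attribute constraints" representation and invented attribute domains. It keeps a list of hypotheses for each boundary, generalizes and specializes them minimally, and prunes hypotheses that become inconsistent; it omits one refinement of the full algorithm (dropping boundary members that are subsumed by another member of the same boundary).

    def matches(h, x):
        """True if hypothesis h classifies instance x as positive."""
        if h is None:                              # the "match nothing" hypothesis
            return False
        return all(a == '?' or a == v for a, v in zip(h, x))

    def more_general(h1, h2):
        """True if h1 is more general than (or equal to) h2."""
        if h2 is None:
            return True
        if h1 is None:
            return False
        return all(a1 == '?' or a1 == a2 for a1, a2 in zip(h1, h2))

    def min_generalization(s, x):
        """Minimal generalization of s that covers the positive example x."""
        if s is None:
            return tuple(x)
        return tuple(a if a == v else '?' for a, v in zip(s, x))

    def min_specializations(g, domains, x):
        """Every minimal specialization of g that excludes the negative example x."""
        out = []
        for i, a in enumerate(g):
            if a == '?':
                for value in domains[i]:
                    if value != x[i]:
                        spec = list(g)
                        spec[i] = value
                        out.append(tuple(spec))
        return out

    def candidate_elimination(examples, domains):
        S = [None]                                 # specific boundary (a set, not one hypothesis)
        G = [('?',) * len(domains)]                # general boundary (also a set)
        for x, is_positive in examples:
            if is_positive:
                G = [g for g in G if matches(g, x)]                    # prune inconsistent G members
                S = [s if matches(s, x) else min_generalization(s, x) for s in S]
                S = [s for s in S if any(more_general(g, s) for g in G)]
            else:
                S = [s for s in S if not matches(s, x)]                # prune inconsistent S members
                new_G = []
                for g in G:
                    if not matches(g, x):
                        new_G.append(g)
                    else:
                        new_G.extend(spec for spec in min_specializations(g, domains, x)
                                     if any(more_general(spec, s) for s in S))
                G = new_G
        return S, G

    # The same invented data as above, plus the attribute domains:
    domains = [('Sunny', 'Rainy', 'Cloudy'), ('Warm', 'Cold'), ('Strong', 'Weak')]
    examples = [
        (('Sunny', 'Warm', 'Strong'), True),
        (('Sunny', 'Warm', 'Weak'), True),
        (('Rainy', 'Cold', 'Strong'), False),
    ]
    print(candidate_elimination(examples, domains))
    # -> ([('Sunny', 'Warm', '?')], [('Sunny', '?', '?'), ('?', 'Warm', '?')])

Notice that after the negative example the general boundary holds two hypotheses, not one: both are minimal specializations that exclude the negative example while still covering everything the specific boundary covers. That is exactly why the real algorithm keeps sets of hypotheses rather than a single one per boundary.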