Before we start, remember that in the field of machine learning, hypothesis and function are synonyms.

Find-S is a machine learning algorithm to find a boolean function that matches your training data.

The algorithm is very simple:

  • Start with an initial hypothesis.
    • Make sure your initial hypothesis is as specific as possible (i.e. all your training examples are predicted as negative by your hypothesis).
  • For each positive training example in your training examples:
    • If it is predicted as negative by your hypothesis:
      • Generalize your hypothesis just enough so that the training example is now predicted as positive.

That’s it!

How do you generalize your hypothesis “just enough”? Find which attributes of your hypothesis disagree with the training example, and generalize just those attributes.

Some notes on Find-S:

  • Is a concept learning algorithm (in other words, can only find boolean functions).
  • Your data’s features must be discreet (i.e. not continuous).
  • Does extremely bad with noisy training data
    • A single bad (i.e. misclassified) training example can lead to a severely poor performing hypothesis.
    • Doesn’t handle missing attributes.
  • Restriction biased. I.e. it only searches the hypothesis that can be represented by your chosen hypothesis representation. I.e. only contains the hypothesis that can be represented with your chosen hypothesis representation.