You have an app and you want users to be able to interact with it using natural language. For example, a user might type “draw a red circle in the center” and your app would do it. How do you achieve this?

  • Identify a set of intents in your application.
  • Come up with example sentences for each intent.
  • Feed the example sentences along with their intent/parameters to an ML model.
  • Ask the model to predict the intent/parameters for a new sentence.
  • Pass the predicted intent/parameters to your app’s DoIntent() function.

                   IdentifyIntents

                          │
                          │
                          ▼

                   ComeUpWithExampleSentences

                          │
                          │ TrainModel
                          ▼
                   ┌─────────────┐
                   │             │
    UserInput ───► │  ML Model   │ ───► Intent/Params ───► DoIntent(Intent,Params)
                   │             │
                   └─────────────┘

Identify Intents

The first step to getting a natural language interface is to identify intents in your application. An intent is something that a user may want to do. Here are some examples:

  • DrawRectangle(position, width, height)
  • DrawCircle(position, radius)
  • ChangeColor(color)

Notice that the intents have parameters, which specify details about the intent.
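As a rough illustration, the intents listed above could be written down in code as an enumeration plus a table of the parameters each one accepts. This is only a sketch: the names `Intent` and `INTENT_PARAMS` are made up for this example, and your app may represent intents differently.

```python
from enum import Enum


class Intent(Enum):
    """The intents the app supports (values match the names used in this article)."""
    DRAW_RECTANGLE = "DrawRectangle"
    DRAW_CIRCLE = "DrawCircle"
    CHANGE_COLOR = "ChangeColor"


# Parameter names each intent accepts, mirroring the list above.
INTENT_PARAMS = {
    Intent.DRAW_RECTANGLE: ("position", "width", "height"),
    Intent.DRAW_CIRCLE: ("position", "radius"),
    Intent.CHANGE_COLOR: ("color",),
}

# Look up an intent by the name a model might return.
circle = Intent("DrawCircle")
```

Keeping the list of intents in one place makes it easier to check later whether a predicted intent is one your app actually supports.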

Come Up with Examples

For each intent, come up with sentences that a user might say to execute that intent. Here are some examples:

  • DrawRectangle intent
    • “draw a big red rectangle at the center”
    • “can you create a small blue rect at the top right?”
  • ChangeColor intent
    • “change the color to green”
    • “make it yellow”

For each example sentence, you need to identify the parameters. For example, in the sentence “draw a big red rectangle at the center”, the parameters are:

  • position: center
  • width: big
  • height: big
  • color: red
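One possible way to store labeled examples like these is as plain records, each pairing a sentence with its intent and parameters. The sketch below assumes a hypothetical `LabeledExample` record; the labels for the two ChangeColor sentences are filled in the obvious way and are not spelled out in the text above.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledExample:
    """One training/prompt example: a sentence plus its intent and parameters."""
    sentence: str
    intent: str
    params: dict = field(default_factory=dict)


EXAMPLES = [
    LabeledExample(
        "draw a big red rectangle at the center",
        "DrawRectangle",
        {"position": "center", "width": "big", "height": "big", "color": "red"},
    ),
    # Labels below are assumed; the article only lists the sentences.
    LabeledExample("change the color to green", "ChangeColor", {"color": "green"}),
    LabeledExample("make it yellow", "ChangeColor", {"color": "yellow"}),
]

# Group sentences by intent to see how well each intent is covered.
by_intent = {}
for example in EXAMPLES:
    by_intent.setdefault(example.intent, []).append(example.sentence)
```

A quick look at `by_intent` shows which intents still need more example sentences.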

Feed Examples to ML Model

You now want to train an ML model on your example sentences so that it can map natural language (sentences) to intents/parameters. The input to the model is a sentence, and the output is both the intent and the parameters extracted from that sentence.

You have a few options here. You can train your own model from scratch, fine-tune a base model on your data, or use an LLM that is already good at language and can predict intents/parameters from just a few examples in a prompt. In this last case, you include a few example sentences in your prompt and then, at the end of the prompt, ask the model to predict the intent/parameters for a new sentence (the user’s input). Here is an example prompt:

1. User says: "draw a blue square on the left side" 
   Intent: DrawRectangle
   Parameters: Position: left, Color: blue

2. User says: "change color to green"
   Intent: ChangeColor
   Parameters: Color: green

[Actual User Input]
User says: "draw a red rectangle in the center"
Intent: ?
Parameters: ?
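A prompt in this shape could be assembled from your examples with a small helper. The following is a sketch, not a required format: `build_prompt` and `FEW_SHOT` are hypothetical names, and the layout simply copies the example prompt above.

```python
def build_prompt(examples, user_input):
    """Build a few-shot prompt.

    examples: list of (sentence, intent, params_dict) tuples.
    user_input: the new sentence the model should label.
    """
    blocks = []
    for number, (sentence, intent, params) in enumerate(examples, start=1):
        param_text = ", ".join(f"{key.capitalize()}: {value}" for key, value in params.items())
        blocks.append(
            f'{number}. User says: "{sentence}"\n'
            f"   Intent: {intent}\n"
            f"   Parameters: {param_text}"
        )
    # The final block leaves the answer open for the model to fill in.
    blocks.append(
        "[Actual User Input]\n"
        f'User says: "{user_input}"\n'
        "Intent: ?\n"
        "Parameters: ?"
    )
    return "\n\n".join(blocks)


FEW_SHOT = [
    ("draw a blue square on the left side", "DrawRectangle", {"position": "left", "color": "blue"}),
    ("change color to green", "ChangeColor", {"color": "green"}),
]

prompt = build_prompt(FEW_SHOT, "draw a red rectangle in the center")
```

The resulting `prompt` string is what you would send to the LLM. How you send it depends on the model or service you use, so that part is left out here.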

Ask Model to Predict

Whichever option you choose, you now have a model that maps user sentences to intents/parameters. Whenever a user asks for something, pass the sentence to your model, then pass the predicted intent/parameters to your app’s DoIntent() function (which you need to implement).
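The flow described here could be wired up roughly as follows. This is a minimal sketch: `handle_user_input`, `fake_predict`, and `fake_do_intent` are hypothetical names, and the two stubs stand in for whichever model and app you actually have.

```python
from typing import Callable, Dict, Tuple

# A prediction is an intent name plus a dictionary of parameters.
Prediction = Tuple[str, Dict[str, str]]


def handle_user_input(
    sentence: str,
    predict: Callable[[str], Prediction],
    do_intent: Callable[..., object],
):
    """Send the sentence to the model, then hand the result to the app."""
    intent, params = predict(sentence)
    return do_intent(intent, **params)


# Stand-ins so the sketch runs without a real model or app (assumptions).
def fake_predict(sentence: str) -> Prediction:
    return "ChangeColor", {"color": "green"}


def fake_do_intent(intent: str, **params: str) -> str:
    return f"{intent}({params})"


result = handle_user_input("change color to green", fake_predict, fake_do_intent)
```

Passing the model and the app in as functions keeps this glue code independent of which of the three modeling options you picked.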

The DoIntent() function is a very general function that takes enumerated intents (along with their parameters) and executes them by calling various other functions in your app. Here are a few example uses of DoIntent():

  • DoIntent(intent=DrawRectangle, position=center, width=big, height=big, color=red)
  • DoIntent(intent=ChangeColor, color=green)
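One common way to write such a function is a lookup table from intent names to handler functions. The sketch below is an assumption about how DoIntent() might look in Python (spelled `do_intent` here); the handlers `draw_rectangle` and `change_color` are placeholders that return a description instead of drawing anything.

```python
def draw_rectangle(position, width, height, color=None):
    # Placeholder: a real app would draw on its canvas here.
    return f"rectangle at {position}, width={width}, height={height}, color={color}"


def change_color(color):
    # Placeholder: a real app would update its current color here.
    return f"color set to {color}"


# Map each intent name to the function that carries it out.
HANDLERS = {
    "DrawRectangle": draw_rectangle,
    "ChangeColor": change_color,
}


def do_intent(intent, **params):
    """Look up the handler for the intent and call it with the parameters."""
    handler = HANDLERS[intent]  # raises KeyError for an unknown intent
    return handler(**params)


outcome = do_intent(intent="ChangeColor", color="green")
```

Adding a new intent then means writing one handler and adding one entry to `HANDLERS`.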

Error Handling

Be aware that the ML model’s predictions can be wrong or unexpected. The part of your app that receives the intent/parameters from the model should handle errors gracefully. For example, what if the model predicts an intent that your app doesn’t support? What if the parameters are missing or invalid?
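A simple defensive step is to parse and validate the model’s reply before acting on it. The sketch below assumes the reply uses the same "Intent: ..." and "Parameters: ..." lines as the example prompt. `parse_reply`, `IntentError`, and `ALLOWED_PARAMS` are hypothetical names, and color is allowed for DrawRectangle only because the example sentences in this article attach one.

```python
import re

# Parameters each supported intent may carry (an assumption for this sketch).
ALLOWED_PARAMS = {
    "DrawRectangle": {"position", "width", "height", "color"},
    "DrawCircle": {"position", "radius"},
    "ChangeColor": {"color"},
}


class IntentError(ValueError):
    """Raised when the model's reply cannot be used safely."""


def parse_reply(reply):
    """Extract (intent, params) from the reply, or raise IntentError."""
    intent_match = re.search(r"^[ \t]*Intent:[ \t]*(\w+)[ \t]*$", reply, re.MULTILINE)
    if not intent_match:
        raise IntentError("reply has no usable 'Intent:' line")
    intent = intent_match.group(1)
    if intent not in ALLOWED_PARAMS:
        raise IntentError(f"unsupported intent: {intent}")

    params_match = re.search(r"^[ \t]*Parameters:[ \t]*(.*)$", reply, re.MULTILINE)
    raw = params_match.group(1) if params_match else ""
    params = {}
    for part in filter(None, (piece.strip() for piece in raw.split(","))):
        key, separator, value = part.partition(":")
        if not separator or not value.strip():
            raise IntentError(f"malformed parameter: {part!r}")
        params[key.strip().lower()] = value.strip()

    unknown = set(params) - ALLOWED_PARAMS[intent]
    if unknown:
        raise IntentError(f"unexpected parameters for {intent}: {sorted(unknown)}")
    return intent, params


parsed = parse_reply("Intent: DrawRectangle\nParameters: Position: center, Color: red")
```

When `parse_reply` raises, the app can ask the user to rephrase instead of executing something it does not understand.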

That’s all there is to it. It is simple to build, and it adds a useful capability to your app.