Our Tech Blog

Explaining Rasa NLU Intent Classification

Zvi Topol | Nov 1, 2018 | Natural Language Understanding, Explainable Machine Learning, Visualizations, Conversational Analytics

In our first blog post we are going to explore a topic at the intersection of natural language understanding (NLU), machine learning explainability, and visualization. In particular, we are going to revisit an article we published in MSDN Magazine earlier this year. The article discusses how to use explainable machine learning techniques to improve intent classification for natural language understanding, an important building block for chatbots and voice interfaces. The article can be accessed here.

In a nutshell, the MSDN article gives some background about NLU and explains how to improve intent classification in LUIS (Microsoft's natural language understanding service). Here we will show how to apply the same techniques to Rasa NLU, an open source natural language understanding tool developed by Rasa (www.rasa.com) with functionality similar to LUIS.

For instructions about how to install Rasa NLU, please refer to Rasa's documentation here.

The MSDN article provides a concrete example from the financial technology space that includes two types of intents. One intent is called PersonalAccountIntent and represents user requests regarding personal accounts. The other intent is OtherServicesIntent and represents other user requests, for example, questions about possible mortgage applications.

We will need to train a machine learning model to distinguish between those two intents. There are a couple of ways to do this in Rasa. The simpler way is to create a training set with a few examples for each intent and to use a simple machine learning pipeline. Using the same examples from the MSDN article, we create the following file, which we save under the name rasa_data.txt:

## intent:personal_accounts_intent

- what is my savings account balance
- what is the latest transaction in my checking account
- i would like my savings statement to be sent again
- have i received my april salary yet
- when was the last cell phone auto pay processed
- what are annual rates for my savings accounts
- what is the balance in my checking account

## intent:other_services_intent

- i would like to get assistance about mortgage rates
- whom can i speak with regarding mortgages
- what is the annual rate for the one-year savings account
- what terms do you offer for mortgages
- who is responsible for mortgages
- what are annual rates for savings accounts
- how are your mortgage rates compared to other banks

Rasa NLU allows you to define machine learning pipelines. The simplest pipeline uses spaCy and scikit-learn. Rasa NLU also supports several languages; here we focus on English. For our case, then, we will use a very concise config file, which we will name nlu_config.yml and which is simply defined as:

language: "en"

pipeline: "spacy_sklearn"

Now you can train your model using the following command:

python -m rasa_nlu.train --config nlu_config.yml -d rasa_data.txt

When you run this command, Rasa NLU does the heavy lifting: it breaks the user utterances you have defined in rasa_data.txt into separate words (tokenization) and represents each word as a vector, using pre-trained word2vec embeddings. For each utterance, Rasa then averages the vectors of the words in that utterance, yielding a single vector that represents the whole utterance. It uses this representation to train a machine learning algorithm to classify the intents (also defined in rasa_data.txt). Trained models are stored by default in a folder called models. Under that folder you will find a project folder named "nlu", and under it a model folder named "model_DDDD-TTTT", where DDDD is the date and TTTT is the time of the training. Inside that folder is a representation of your trained model, stored as a Python pickle file.
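To make the featurization step concrete, here is a minimal sketch of the averaging idea. The 3-dimensional word vectors below are made up for illustration; the real pipeline uses pre-trained spaCy embeddings with hundreds of dimensions:

```python
import numpy as np

# Made-up word vectors standing in for pre-trained embeddings
word_vectors = {
    'what':    np.array([0.2, 0.1, 0.0]),
    'is':      np.array([0.0, 0.3, 0.1]),
    'my':      np.array([0.1, 0.0, 0.2]),
    'savings': np.array([0.5, 0.4, 0.3]),
    'account': np.array([0.4, 0.5, 0.2]),
    'balance': np.array([0.3, 0.2, 0.6]),
}

def utterance_vector(utterance):
    """Average the word vectors of an utterance into one fixed-length vector."""
    vectors = [word_vectors[w] for w in utterance.split() if w in word_vectors]
    return np.mean(vectors, axis=0)

# One 3-dimensional vector now represents the whole utterance
vec = utterance_vector('what is my savings account balance')
print(vec)
```

The resulting fixed-length vector is what the scikit-learn classifier in the pipeline is trained on.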

Now you will want to score the model, that is, use the trained model to identify the intents of new utterances. There are a few ways to do that. The simplest for our example is to start a local Rasa server:

python -m rasa_nlu.server -c nlu_config.yml --path models

The default port for the Rasa server is 5000, so you can now use your browser to score the model you have trained on a new utterance, "what mortgage rate do you offer for 30 yr loans?", in the following way:

http://localhost:5000/parse?q=what mortgage rate do you offer for 30 yr loans?&project=nlu&model=model_DDDD-TTTT
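Your browser will encode the spaces and the question mark in that URL for you; when calling the endpoint programmatically it is safer to URL-encode the query string explicitly. A minimal sketch (model_DDDD-TTTT is the same placeholder as above):

```python
from urllib.parse import urlencode

params = {
    'q': 'what mortgage rate do you offer for 30 yr loans?',
    'project': 'nlu',
    'model': 'model_DDDD-TTTT',  # placeholder; use your model folder name
}
url = 'http://localhost:5000/parse?' + urlencode(params)
print(url)
```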

This should return the following JSON result:

  "intent": {
    "name": "other_services_intent",
    "confidence": 0.8726195067816652
  "entities": [],
  "intent_ranking": [
      "name": "other_services_intent",
      "confidence": 0.8726195067816652
      "name": "personal_accounts_intent",
      "confidence": 0.12738049321833478
  "text": "what mortgage rate do you offer for 30 yr loans?",
  "project": "nlu",
  "model": "model_DDDD-TTTT"

One important difference from the results returned by Microsoft's LUIS is that LUIS automatically creates a default catch-all intent (called None) for utterances that are unlikely to fall under any of the defined intents. So, for example, if someone asks about a completely unrelated topic, your software will be able to recover gracefully. You will have to define this behavior yourself when using Rasa NLU.
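One simple way to approximate LUIS's catch-all behavior is to threshold the top confidence score yourself. A minimal sketch, assuming the JSON shape returned above and a hand-picked cutoff of 0.6 (you would tune this for your own data):

```python
FALLBACK_THRESHOLD = 0.6  # assumed cutoff; tune for your data

def resolve_intent(parse_result, threshold=FALLBACK_THRESHOLD):
    """Map a Rasa NLU parse result to an intent name, falling back to a
    catch-all 'none_intent' when the top confidence is below the threshold."""
    intent = parse_result.get('intent') or {}
    if intent.get('confidence', 0.0) < threshold:
        return 'none_intent'
    return intent['name']

# Confident prediction passes through
print(resolve_intent({'intent': {'name': 'other_services_intent',
                                 'confidence': 0.87}}))  # other_services_intent
# Low-confidence prediction falls back to the catch-all
print(resolve_intent({'intent': {'name': 'other_services_intent',
                                 'confidence': 0.41}}))  # none_intent
```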

Now we can go ahead and apply the LIME (Local Interpretable Model-Agnostic Explanations) algorithm on top of our trained Rasa NLU model. I will refer you to the MSDN article for more information about LIME and how to use it to derive insights from your machine learning classifier so that you can improve it. We will need to slightly modify the code included in the MSDN article; the following Python code snippet fits the bill.

import requests
import json
from lime.lime_text import LimeTextExplainer
import numpy as np


def call_with_utterance_list(utterance_list):
    # LIME expects a function that maps a list of texts to an array of
    # class probabilities, one row per text
    return np.array([call_with_utterance(utterance)
                     for utterance in utterance_list])


def call_with_utterance(utterance):
    if utterance is None:
        return np.array([0, 1])

    app_url = 'http://localhost:5000/parse'
    # replace model_DDDD-TTTT with the model folder name Rasa NLU created
    params = {'q': utterance, 'project': 'nlu', 'model': 'model_DDDD-TTTT'}

    r = requests.get(app_url, params=params)
    json_payload = json.loads(r.text)
    intents = json_payload['intent_ranking']

    personal_accounts_intent_score = [intent['confidence'] for intent in intents
                                      if intent['name'] == 'personal_accounts_intent']

    if len(personal_accounts_intent_score) == 0:
        return np.array([0, 1])

    score = personal_accounts_intent_score[0]
    complement = 1 - score

    return np.array([score, complement])


if __name__ == "__main__":
    explainer = LimeTextExplainer(class_names=['PersonalAcctIntent', 'Others'])
    utterance_to_explain = 'what are annual rates for my savings accounts'
    exp = explainer.explain_instance(utterance_to_explain,
                                     call_with_utterance_list,
                                     num_samples=500)
    exp.save_to_file('rasa_lime_explanation.html')

Before running the code, make sure to change the model folder name model_DDDD-TTTT to the name of the model folder Rasa NLU created for you. The Python program uses LIME to generate an HTML file that visualizes the attribution of the different words in the utterance "what are annual rates for my savings accounts". The following is a screenshot of the HTML.

Interpreting the visualization, you can infer that Rasa NLU classifies the utterance as 'OtherServicesIntent' with a confidence score of 0.55. You can also infer that the words "rates" and "annual" were the major contributors to the classification decision, and that if you were to remove them, the confidence score for 'OtherServicesIntent' would decrease by approximately 0.12 and 0.11, respectively. Note that those results differ from, but are directionally close to, the results reported in the MSDN article, which used a model trained by LUIS. I will refer you to the MSDN article for more details about how to interpret these results and how to use them to improve your classifier. There, in addition to LIME, you will also find details about how to leverage the open source tool ScatterText to improve your NLU classifier.

I hope you enjoyed reading our very first blog post and found the material useful.
Please contact us at info@muyventive.com for feedback and suggestions.