I have long wanted to build a multilingual chatbot. I felt the conversation bots in language learning apps or online assistants could do a much better job. With close to zero knowledge of machine learning, and NLP in particular, I was looking for a tool that would provide guidance. Yet I wanted to stay in control as much as possible. After some experimentation I came across Rasa, an open source machine learning framework for text and voice automation.

Rasa was a perfect match for the task at hand. The framework takes a pragmatic approach and eliminates a lot of the complexities of natural language processing. There is an SDK with a rich API, great documentation and a fairly large community. A lot of great articles have already been published on how to use Rasa as a data scientist. I wanted to have my say on dealing with some of the pitfalls from a developer’s point of view. In this series of posts, I walk you through building a not-so-trivial chatbot that is deployable on Heroku.

TL;DR … I know, I too dislike having to wait, especially when I’m in a hurry to find a solution and need to move on. So here is the deal: you will always find the links to the repo and to the running bot instance at the beginning of each of my posts. Don’t read my articles if you are pressed for time, I don’t care. Fork my project instead and leave a comment; that’ll do for me. Thank you.

The Objective

Here is what we are looking at in terms of functionality, the chatbot:

  • is designed to be multilingual, and it should be easy to add a new language
  • supports several simple intents, such as greetings and purchasing a cup of coffee
  • recognises custom entities, such as coffee types
  • knows, at least to a degree, how to handle unexpected input

Finally, the language models should be independent of each other and easy to replace.

I appreciate open source frameworks like Rasa. They make machine learning and artificial intelligence more accessible without sacrificing the freedom of design choices. I believe that Rasa strikes this balance really well and deserves credit for its opinionated expertise on ML models and its excellent documentation. Also, their API makes sense to me.

In this very first part of the series I explain the basic terminology. I only want to cover the bare minimum you need to follow this tutorial. I hope the links below make the navigation easier.

Natural Language Processing

NLP studies algorithms that analyse and process large bodies of text and extract information about various features of human language. Think of a spell checker, Google Translate or any other application that simply would not work without understanding context.

An NLP algorithm analyses a piece of text and identifies (see the spaCy sketch after this list):

  • what it consists of (tokens – words, punctuation)
  • what relationships there are among tokens (subject – verb – object)
  • base forms of words (verbs – infinitive, nouns – singular etc.)
  • sentence start and end
  • what entities there are (people, places, organisations etc.)
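As a minimal sketch of these steps, here is what such an analysis looks like with spaCy, a library introduced later in this post. It assumes the small English model is installed via `python -m spacy download en_core_web_sm`; the sample sentence is made up.

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Rasa is popular in Berlin. I'd like a flat white.")

# Tokens, their base forms (lemmas) and relationships (dependency labels)
for token in doc:
    print(token.text, token.lemma_, token.pos_, token.dep_)

# Sentence start and end
for sent in doc.sents:
    print(sent.text)

# Named entities (people, places, organisations etc.)
for ent in doc.ents:
    print(ent.text, ent.label_)
```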

Natural Language Understanding

NLU brings us much closer to an intelligent (see Turing Test) conversation between a machine and a human. NLP takes care of the basic text analysis, whereas NLU applies a linguistic point of view:

  • Syntax (grammar)
  • Semantics (meaning)
  • Pragmatics (purpose)

Mariya Yao from TopBots has an excellent in-depth post about differences between NLP and NLU.

As you can see, NLU is at the core of a chatbot’s brain. Rasa incorporates a standalone NLU server, where you can train your model and interact with it.

Striving for Accuracy

A chatbot fails to recognise the user's intent.
Chatbot fails to recognise the intent – credit: 3 Things Your Chatbot Fails At (But Shouldn’t) by Florian Treml

The example is actually misleading. The bot did recognise the intent well, i.e. buying fruit. However, there are always corner cases, such as when the user changes her mind, and the bot will get the intent wrong at some point. Granted. The accuracy is determined by the NLP implementation and by how much freedom you leave to the user.

Model Design Choices and Tradeoffs

There are several popular implementations of NLP, each with distinct advantages and shortcomings. Therefore, you need to be aware of which one to choose over another, as the choice has implications for prediction accuracy and the ease of training your model.

Mitie and Spacy are very different libraries from each other: the first one uses more general-purpose language models, and therefore very slow to train, while Spacy uses more task specific models, and is very fast to train.

NLP implementation differences – credit: Chatbots vs Reality by Gidi Shperber

MITIE

MITIE is an open source project by MIT that processes large data sets at a high speed of tens of thousands of words per second. It relies on a general-purpose machine learning algorithm called the Support Vector Machine (SVM).

In a nutshell, SVM falls into the category of supervised learning. Imagine a set of manually picked examples categorised by a researcher. The algorithm then takes new (unseen) examples and tries to categorise them correctly on its own. The point is that the algorithm tries to generalise from the training data (the resolved examples presented to it). As a result, given enough training, the machine gains the capability to categorise unseen data.

Supervised learning - describing a cat.
Supervised Learning – credit: Machine Learning for Beginners by Divyansh Dwivedi

The advantage of SVM is that once a boundary is established, most of the training data becomes redundant. All it needs is a core set of points that identify and set the boundary. These data points are called support vectors because they “support” the boundary.

The gist of SVM algorithm – credit: When do support vector machines trump other classification methods by Bala Deshpande
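To make supervised learning with an SVM more tangible, here is a toy sketch using scikit-learn (which also backs one of Rasa’s intent classifiers, as we will see below). The intents and training phrases are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# A handful of manually labelled examples (the supervision)
train_texts = [
    "hi there", "good morning", "hello",
    "a cappuccino please", "I want to buy an espresso", "one flat white",
]
train_labels = [
    "greet", "greet", "greet",
    "order_coffee", "order_coffee", "order_coffee",
]

model = make_pipeline(TfidfVectorizer(), SVC(kernel="linear"))
model.fit(train_texts, train_labels)

# The classifier generalises to unseen inputs
print(model.predict(["hey", "could I get a latte"]))

# Only the support vectors are kept to define the boundary
print(len(model.named_steps["svc"].support_))
```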

MITIE, being a generic tool, can be quite slow to train with large data sets.

spaCy

spaCy is an open source NLP library with many useful features. Among others, there are linguistic annotations, shareable vocabularies and efficient serialisation. What matters here is the fact that Rasa supports spaCy’s pre-trained language models. Each word of the user’s input is represented as a word embedding (a vector). Consequently, Rasa is able to identify similar words (syntax, semantics), which helps with intent predictions. Therefore, you don’t have to do a lot of training on your own. Bear in mind that the pre-trained models are (human) language specific, so check whether one exists for your language.
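As a quick sketch of word embeddings in action, the snippet below compares word similarities with spaCy. It assumes a model that ships with word vectors, e.g. `en_core_web_md`.

```python
import spacy

# Medium-sized English model that includes word vectors
nlp = spacy.load("en_core_web_md")

coffee, espresso, car = nlp("coffee"), nlp("espresso"), nlp("car")

print(coffee.similarity(espresso))  # high – semantically related
print(coffee.similarity(car))       # low – unrelated
```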

NLU Pipeline

NLP / NLU processing entails a sequence of specific steps, known as a pipeline. Rasa lets you compare different pipeline configurations (policies). The comparison assesses which NLU model responds best to yet unseen inputs. Stories (conversation scripts) define boundaries for your chatbot and serve as a baseline for the comparison. In other words, the stories bind intents to the desired chatbot actions, so the comparison is based on the ratio of correctly predicted intents.

NLP pipeline schema.
NLP pipeline transforming raw data to automated issue resolution – credit: Improving Uber Customer Care with NLP by Huaixiu Zheng, Yi-Chia Wang, and Piero Molino
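As a rough sketch, this is what training and querying an NLU model looks like with the Rasa 1.x Python API. The file paths are placeholders for your own project layout, and the exact imports vary between Rasa versions.

```python
from rasa.nlu import config
from rasa.nlu.model import Trainer
from rasa.nlu.training_data import load_data

training_data = load_data("data/nlu.md")      # example utterances per intent
trainer = Trainer(config.load("config.yml"))  # the pipeline configuration
interpreter = trainer.train(training_data)

# The trained interpreter predicts an intent (and entities) for unseen input
print(interpreter.parse("I'd like a cappuccino"))
```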

Intent Classification

Typically, users interact with a chatbot to achieve a specific goal. The chatbot is constantly trying to interpret the user’s inquiries and drive the course of action.

Intent classification is a vital part of NLU; it is what turns chatbots into helpful assistants. I recommend reading Akshat Jain’s article on what it takes to come up with a usable prediction model (hint: a lot of work).

The process of intent classification.
Intent classification using supervised machine learning – credit: Topic and Intent Classifier from Scratch by Himang Sharatun

Rasa supports a range of intent classifiers. Each of them has unique advantages and shortcomings. This post explains in depth how the classification works.

The Keyword Intent Classifier resolves intents by simply looking for certain keywords in the user input. Even though it is sufficient for recognising basic interactions, such as greetings, you are better off not using it (see the toy example below).
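To illustrate why keyword matching is brittle, here is a toy classifier of my own making – not Rasa’s actual implementation – with a hypothetical keyword table:

```python
KEYWORDS = {"hello": "greet", "hi": "greet", "bye": "goodbye"}

def keyword_intent(text: str):
    # Return the intent of the first keyword found, if any
    for word in text.lower().split():
        if word in KEYWORDS:
            return KEYWORDS[word]
    return None

print(keyword_intent("hi there"))          # greet
print(keyword_intent("say hi to my mum"))  # greet – a false positive
```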

The MITIE Intent Classifier leverages MITIE’s sophisticated text categoriser.

The Sklearn Intent Classifier is another popular choice with Rasa. scikit-learn is an open source Python library for data mining and analysis. Like MITIE, the Sklearn classifier builds upon SVM.

Entity Extraction

Entities represent real-life objects, e.g. people, companies, places etc. You typically define custom entities that make sense for your use case and then train your bot to recognise them.

Named Entity Recognition (NER), a.k.a. entity extraction, is the process of recognising entities in raw text. As long as an object can be named (China is a country, Colombo is a capital etc.), it can also be recognised and extracted.
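For a concrete picture, here is what a single annotated training example with a custom entity might look like in Rasa’s JSON training data format (Rasa 1.x); the `order_coffee` intent and `coffee_type` entity are made-up names for this tutorial.

```python
# One annotated example: the entity is marked by character offsets
example = {
    "text": "I'd like to order a flat white",
    "intent": "order_coffee",
    "entities": [
        {
            "start": 20,            # "flat white" spans characters 20-30
            "end": 30,
            "value": "flat white",
            "entity": "coffee_type",
        }
    ],
}
```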

There are many more details, but I feel there is already a lot to digest. I hope it all makes sense. Let me know your thoughts and let’s catch up in the next post, where we start building the project.

