Building a Slackbot that uses the Google Cloud ML Natural Language API (and runs on Kubernetes)

Introduction

Many of us belong to some Slack communities. Slack has an API that makes it easy to add ‘bots’ to a channel, that do interesting things with the posted content, and that can interact with the people on the channel. Google’s Cloud Machine Learning APIs make it easy to build bots with interesting and fun capabilities.

Here, we’ll describe how to build such a bot, one that:

uses the Google Cloud ML Natural Language API to analyze channel content,
runs on Kubernetes to make it easy to deploy, and
uses the Botkit library to make it easy to interact with Slack.

The code for this slackbot is here.

Using the Cloud ML Natural Language API in a bot

The Google Cloud Natural Language API helps reveal the structure and meaning of text by offering powerful machine learning models for multiple languages– currently, English, Spanish, and Japanese.

You can use the NL API to do entity recognition (identifying entities and label by types such as person, organization, location, events, products and media), sentiment analysis (understanding the overall sentiment expressed in a block of text), and syntax analysis (sentence extraction and tokenization, identifying parts of speech, creating parse trees for each sentence, and more).

Our Natural Language (NL) slackbot uses the Google Cloud NL API in two different ways.

Entity detection

First, it uses the NL API’s entity detection to track the most common topics that are being discussed over time in a channel. It does this by detecting entities in each posted message, and recording them in a database. Then, at any time the participants in the channel can query the NL slackbot to ask it for the top N entities/topics discussed in the channel (by default, over the past week).

Sentiment analysis

Additionally, the NL slackbot uses the NL API to assess the sentiment of any message posted to the channel, and if the positive or negative magnitude of the statement is sufficiently large, it sends a ‘thumbs up’ or ‘thumbs down’ to the channel in reaction.

Running the Slackbot as a Kubernetes App

Our slackbot uses Google Container Engine, a hosted version of Kubernetes, to run the bot. This is a convenient way to launch the bot in the cloud, so that there is no need to manage it locally, and to ensure that it stays running. It also uses Google Container Registry to store a Docker image for the bot.

It’s useful to have the bot running in the cloud, and on a Kubernetes cluster. While you could alternately just set it up on a VM somewhere, Kubernetes will ensure that it stays running. If the pod in the slackbot’s Deployment goes down for some reason, Kubernetes will restart it.

Starting up the NL Slackbot

The README in the GitHub repo walks you through the process of starting up and running the slackbot. As part of the process you’ll also create a Slack bot user and get an authentication token.

If you think you want to run your slackbot for a while, follow the instructions in the README section on setting up a Persistent Disk for the bot’s database. That will allow the bot to be restarted without losing data. The README also walks you through how you can test your bot locally before deploying to Kubernetes if you want.

Once it’s running, invite the bot to a Slack channel.

The NL Slackbot in Action

Once your NL slackbot is running, and you’ve invited it to a channel, everyone in the channel can start to interact with it. For the most part, the NL slackbot will keep pretty quiet. Each time there is a post, the NL slackbot will analyze the entities in that text. It will store those entities in a database.

It will also analyze the sentiment of the post (understanding the overall sentiment expressed in a block of text). If the magnitude of the sentiment is greater than a certain threshold, either positive or negative, the bot will respond with a ‘thumbs up’ or ‘thumbs down’. It doesn’t respond to all posts, only those for which the sentiment magnitude is above the threshold.

Analyzing sentiment

(If you think that this aspect of the bot is a bit annoying, it is easy to disable, or change the threshold, by looking for where it is set here).

At any time, you can directly ask the bot for the top entities that it has detected in channel conversation (by default, the top 20 entities over the past week). You do this by addressing the bot with the words top entities. For example, if your bot is called @nlpbot, you would ask it like this:

@nlpbot top entities

Then, the result might look something like the following. (At least, if you have seeded your test channel with a combination of political news and Kubernetes posts :).

Asking the bot for top entities

What Next?

There are many ways that this bot could be developed further and be made more sophisticated.

As just one example, you could integrate the other Cloud ML APIs as well. You might add a capability that leverages the Cloud Vision API to analyze the images that people post to the channel. Then, each time someone posted a meme image to the channel, you could use the Vision API to do OCR on the image, then pass that info to the NL API.

You could also extend the NL slackbot to support more sophisticated queries on the entity database – e.g., “show me the top N PERSONS” or “top N LOCATIONS today”. Or, you could include the wiki URLs in the results for any entities that have them. This information is currently being collected, but not displayed. That might look something like the following. Note that “Trump” and “Donald Trump” are detected as referring to the same PERSON.

Including wiki urls in entity information