Voice Assistant in a Mobile App — Invoking Screens and Completing Forms with no Hands

Anna Prist · Published in Chatbots Life · Nov 6, 2020

How do you teach an assistant everything the app can do? This is the second part of the tutorial. Originally published on the Just AI blog.

Last time we took the open-source habit-tracking app Habitica and showed how to add a voice assistant and build a basic out-of-the-box scenario — a weather forecast.

Now let’s move on to the next level. We will learn how to invoke certain screens, create complex queries with NLU, and complete a form with voice.

Read the first part

Habitica is a habit-tracking app with gamification elements that helps you form good habits. It keeps your important goals as daily habits and encourages you to stick to them.

So now we are going to teach our voice assistant, which lives inside the app, to create and fill in tasks by voice instead of by hand.

The logic of a voice interface

We’ll start off easy — let’s see how the logic works. Say we want to use voice commands to open the settings or characteristics screens. Open AndroidManifest.xml and look for the relevant activities: PrefsActivity is responsible for settings, and FixCharacterValuesActivity for the character’s characteristics. While we’re at it, find the activities that open the profile and app information — FullProfileActivity and AboutActivity.

According to the documentation, we need to implement the client-side logic in a class that inherits from CustomSkill. First, we specify that we only react to bot responses whose response.action equals “changeView”. The response.intent field carries the command for where exactly to go, and depending on it we launch the corresponding activity. Don’t forget to obtain the application context first:

class ChangeViewSkill(private val context: Context) : CustomSkill<AimyboxRequest, AimyboxResponse> {

    override fun canHandle(response: AimyboxResponse) = response.action == "changeView"

    override suspend fun onResponse(
        response: AimyboxResponse,
        aimybox: Aimybox,
        defaultHandler: suspend (Response) -> Unit
    ) {
        val intent = when (response.intent) {
            "settings" -> Intent(context, PrefsActivity::class.java)
            "characteristics" -> Intent(context, FixCharacterValuesActivity::class.java)
            "profile" -> Intent(context, FullProfileActivity::class.java)
            "about" -> Intent(context, AboutActivity::class.java)
            else -> Intent(context, MainActivity::class.java)
        }
        intent.setFlags(Intent.FLAG_ACTIVITY_NEW_TASK)
        aimybox.standby()
        context.startActivity(intent)
    }
}

This skill is added to the assistant in the following way:

val dialogApi = AimyboxDialogApi(
    "YOUR KEY HERE", unitId,
    customSkills = linkedSetOf(ChangeViewSkill(context))
)


Skill and intents

We use JAICF (Just AI’s open-source Kotlin-based framework for building voice apps) to create the cloud part of the skill. Start by forking the template project: https://github.com/just-ai/jaicf-jaicp-caila-template.

We’ll need it to understand the user’s queries and then send JSON to the device so that it launches the corresponding activity.

Unfortunately, at the time of writing there was no integration between JAICP (Just AI Conversational Platform) and Aimybox (an SDK for building in-app voice assistants). With it, the linking would be much simpler — we could add a single line of code to one of the two configuration files in the connections folder. For now, we’ll create a new configuration file that we will use for tests.

Create an AimyboxConnection file

package com.justai.jaicf.template.connections

import com.justai.jaicf.channel.aimybox.AimyboxChannel
import com.justai.jaicf.channel.http.httpBotRouting
import com.justai.jaicf.template.templateBot
import io.ktor.routing.routing
import io.ktor.server.engine.embeddedServer
import io.ktor.server.netty.Netty

fun main() {
    embeddedServer(Netty, System.getenv("PORT")?.toInt() ?: 8080) {
        routing {
            httpBotRouting("/" to AimyboxChannel(templateBot))
        }
    }.start(wait = true)
}

To use the NLU functionality, we add the Caila NLU service: register at app.jaicp.com, find the API key in the settings, and enter it in conf/jaicp.properties. Now we can reference the intents we’ve created at app.jaicp.com right in the scenario.

You can use any other NLU provider, or just rely on regular expressions, but if you want the experience to be smooth for the user, you are better off with NLU.
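For illustration only, here is a minimal sketch of what a purely regex-based activation could look like in JAICF, assuming the RegexActivator is enabled in the bot’s configuration; the state name changeViewByRegex and the pattern are made up for this example:

state("changeViewByRegex") {
    activators {
        // matches phrases like "open settings" or "go to profile"
        regex("(open|go to) (settings|characteristics|profile|about).*")
    }
    action {
        // the action body would mirror the NLU-based state shown further below
        reactions.say("Wait a sec…")
        reactions.aimybox?.response?.action = "changeView"
    }
}

In this tutorial, though, we’ll stick with Caila.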

First of all, we set up the entities. We need to know when the user wants to go to a specific section of the app, so we add an entity value for each section, list its synonyms, and enter the value we’re going to use on the app side (settings, characteristics, etc. from the code above).

I got it like this:

Now we describe how we expect this entity to appear in the user’s speech. To do that, we create an intent and write down phrase variations. Since we really need to know where exactly to go, we mark the views entity as Required.

That’s what it looks like:

We’re going to use this name to refer to the entity in the JAICF code. To make sure the intents are recognized as they should be, you can enter a few test phrases using the test button.

It seems okay:

Scenario: invoking a skill

Just in case, I’ve erased all the standard states, leaving only catchAll — that’s what the bot says when it doesn’t understand you. Create a changeView state and put the intent you just created in JAICP into its activators. Then write the logic in the action block — we have to put all the information needed for the transition into the bot’s response via the standard reactions of the Aimybox channel.

Now take the views slot from what Caila recognized, put the value we entered earlier into action so that Aimybox knows which skill to invoke, and put the recognized slot into intent. To make it feel natural, we add a "Wait a sec…" reply.

state("changeView") {
    activators {
        intent("changeView")
    }
    action {
        reactions.say("Wait a sec…")
        var slot = ""
        activator.caila?.run { slot = slots["views"].toString() }
        reactions.aimybox?.response?.action = "changeView"
        reactions.aimybox?.response?.intent = slot
    }
}

It’s better to move skills into a separate skills package, with a class file for each skill. Now you have a few options: you can run the bot locally and expose it with ngrok, or deploy it to Heroku. Next, take the URL you get and add it at app.aimybox.com when creating a custom voice skill — in the Aimylogic webhook URL box. Write down a few sample phrases for the skill: go to settings, go to info.

After you’ve added the channel, you can test the responses and catch bugs right in the console using the Try in Action button.

Or you can enable the skill without the console or any additional skills, as described here. Now we only have to run it and see whether everything’s right.

It is!

Okay, now the tricky part.

Filling in tasks with voice

I want to fill in the task with one voice command, check that everything’s right, fix any small mistakes, and only then create it.

To achieve this, we create a second skill. We will distinguish it from the first one by response.action == "createTask", and we will pass the task type in response.intent.

When you explore the app’s sources, you see that rewards, dailies, habits, and to-dos are all created via TaskFormActivity, just with different types. Here is that logic.

class CreateTaskSkill(private val context: Context) : CustomSkill<AimyboxRequest, AimyboxResponse> {

    override fun canHandle(response: AimyboxResponse) = response.action == "createTask"

    override suspend fun onResponse(
        response: AimyboxResponse,
        aimybox: Aimybox,
        defaultHandler: suspend (Response) -> Unit
    ) {
        val intent = Intent(context, TaskFormActivity::class.java)
        val additionalData = HashMap<String, Any>()
        val type = response.intent
        additionalData["viewed task type"] = when (type) {
            "habit" -> Task.TYPE_HABIT
            "daily" -> Task.TYPE_DAILY
            "todo" -> Task.TYPE_TODO
            "reward" -> Task.TYPE_REWARD
            else -> ""
        }

Each task, including rewards, has a name and a description; tasks also have a difficulty and a sentiment. Let’s see how to send this data to the activity. We’ll pass it via response.data, with default values in case something is empty, then bundle the data and open the task form with that bundle. We’ll also need to handle the bundled data in the onCreate function of TaskFormActivity.
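For completeness, here is a rough sketch of how the rest of onResponse might pack this data. It is only an illustration: it assumes response.data exposes the values sent from the JAICF scenario shown later, reuses the bundle keys that the onCreate handler below reads, and makes up the "additionalData" extra name and the default values — the real Habitica code may differ.

        // Continuing onResponse() in CreateTaskSkill — a sketch, not the exact app code.
        // Hypothetical helper: read a value from response.data as a plain string.
        fun field(key: String): String? =
            response.data?.get(key)?.toString()?.trim('"')?.takeIf { it.isNotEmpty() && it != "null" }

        val bundle = Bundle().apply {
            putString("activity_name", field("taskName") ?: "New task")        // default if no name was recognized
            putString("activity_description", field("taskDescription") ?: "")
            putBoolean("sentiment", field("taskSentiment")?.toBoolean() ?: true)
            putString("activity_difficulty", field("taskDifficulty") ?: "easy")
        }
        intent.putExtras(bundle)
        intent.putExtra("additionalData", additionalData)   // assumption: how the task type map is handed to the form
        intent.flags = Intent.FLAG_ACTIVITY_NEW_TASK
        aimybox.standby()
        context.startActivity(intent)

And here is how the bundled data is handled on the activity side: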

// Inserted code for voice activation
textEditText.setText(bundle.getString("activity_name"))            // preset task name
notesEditText.setText(bundle.getString("activity_description"))    // preset task description
if (bundle.getBoolean("sentiment")) {                               // preset task sentiment
    habitScoringButtons.isPositive = true
    habitScoringButtons.isNegative = false
} else {
    habitScoringButtons.isNegative = true
    habitScoringButtons.isPositive = false
}
when (bundle.getString("activity_difficulty").toString()) {         // preset task difficulty
    "trivial" -> taskDifficultyButtons.selectedDifficulty = 0.1f
    "easy" -> taskDifficultyButtons.selectedDifficulty = 1f
    "medium" -> taskDifficultyButtons.selectedDifficulty = 1.5f
    "hard" -> taskDifficultyButtons.selectedDifficulty = 2f
    else -> taskDifficultyButtons.selectedDifficulty = 1f
}
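With both skills in place, they are registered with the assistant the same way as before — for example, extending the earlier snippet (same placeholder key and unitId):

val dialogApi = AimyboxDialogApi(
    "YOUR KEY HERE", unitId,
    customSkills = linkedSetOf(
        ChangeViewSkill(context),
        CreateTaskSkill(context)
    )
)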

Now we will configure recognition and the JAICF skill with Caila NLU.

Preparing Caila: create entities to recognize the task’s type, difficulty, and sentiment (as an example, I did it with patterns — you can pick Patterns instead of Synonyms on the left side of the form).

Don’t forget to enter the values that we will process on the client side: habit, daily, etc. A name and a description can be anything, so we create Name and Description entities with a regular expression matching any word. For now, our name and description will be one word only.

Create an intent:

Indicate that we need the task type and difficulty. We can also mark both name and description as Required, so that when the user doesn’t mention one of them, the bot will ask for the missing slot.

Now we write down the different ways a name or a description may appear together with a type (different orderings, a missing name or description, and so on). There’s no limit to how far you can polish this, but a few templates like the ones above are enough for a minimal version.

Also, for the example here I use the pattern language, which can be switched with a button (to the left of Enter).

The @ mode is for patterns and regexps; the other mode is for examples and semantic similarity.

Now the JAICF scenario:

state("createTask") {
    activators {
        intent("createTask")
    }
    action {
        val taskType = activator.getCailaSlot("taskType").asJsonLiteralOr("")
        reactions.say("Wait a sec…")
        reactions.aimybox?.response?.action = "createTask"
        reactions.aimybox?.response?.intent = taskType.content
        reactions.aimybox?.response?.run {
            data["taskName"] = activator.getCailaSlot("taskName").asJsonLiteralOr("")
            data["taskDescription"] = activator.getCailaSlot("taskDescription").asJsonLiteralOr("")
            data["taskSentiment"] = activator.getCailaSlotBool("taskSentiment").asJsonLiteralOr(true)
            data["taskDifficulty"] = activator.getCailaSlot("taskDifficulty").asJsonLiteralOr("easy")
        }
    }
}

private fun ActivatorContext.getCailaRequiredSlot(k: String): String =
    getCailaSlot(k) ?: error("Missing Caila slot for key: $k")

private fun ActivatorContext.getCailaSlot(k: String): String? =
    caila?.slots?.get(k)

private fun ActivatorContext.getCailaSlotBool(k: String): Boolean? =
    caila?.slots?.get(k)?.toBoolean()

private fun String?.asJsonLiteralOr(other: String) = this?.let { JsonLiteral(this) } ?: JsonLiteral(other)

private fun Boolean?.asJsonLiteralOr(other: Boolean) = this?.let { JsonLiteral(this) } ?: JsonLiteral(other)

We connect the intent via the activator, get the type from the slots, and put it into intent. The name and description go into data, and don’t forget to set action, so that Aimybox on the client side knows which skill to choose.

Now we make sure it works. And it does! Let’s turn on the volume and test it.

Of course, this is a tech demo; in a real product you can think of even more useful scenarios.

We will discuss that in our next articles!

Repository with the JAICF skill

Repository with the Aimybox code

Don’t forget to give us your 👏 !
