Siri Shortcuts analysis

Siri. The voice assistant that popularized voice assistants. A product Apple purchased in 2010 and introduced with iOS 5 in 2011, alongside the release of the iPhone 4s. Since then Siri has evolved quite a bit: it has gone through four versions in which its code was rewritten from scratch, it has been re-implemented using machine learning, its capabilities have been extended to third-party applications through the SiriKit library, and it has undergone an interesting evolution in every respect. The latest step arrives with iOS 12: a mix of voice commands and a system that records our activity in apps more accurately. Let's look at what it consists of.

-Siri was purchased by Apple in 2010 and introduced in iOS 5 in 2011 along with the release of the iPhone 4s.-

It is no secret that Siri's popularity is not at its best today (at least in the most technology-minded circles). Other solutions such as Google Assistant or Amazon Alexa have managed to overtake Siri in the race to put a voice assistant in our homes, even though Siri has been living on our smartphones and tablets longer than any of them. Siri has been relegated to the background as a «minor product». Nothing could be more unfair, from my point of view.

It is true that Google Assistant is currently the king in terms of product quality, and that Siri has usage and functional limitations that other systems have taken advantage of to expand their own functionality and offer better services, features and integration options.

That is why this year, with the release of iOS 12 and following the path of Google Assistant's Actions or Alexa's Skills, Apple has launched a new possibility of use and integration. It is based on the Workflow app that Apple bought in March last year and on the activities that Siri itself, as the AI on our iOS devices, records from each of our apps. Before we go any further, we have to start changing our concept of Siri: it is not our voice assistant, it is the assistant of the whole device. It is what suggests activities we normally do, advises us what to type on the predictive keyboard, and finds photos by description or content. Siri is the AI focused on our iOS usage.

-Very important: Siri is not our voice assistant, it is the assistant of the whole device. It’s the AI that knows what we’re doing and suggests things based on our activity.-

So let's explain what this new integration is all about, how it works, and how developers can tailor their apps so that Siri integrates better into the normal use of our devices.

SiriKit: how it works right now

SiriKit is the name of the current library that allows Siri to interact with our apps, based on the use of domains that group intents. These are the domains available right now:

  • Lists, to create reminders or tasks in any app that has these functions.
  • Visual codes, to show QR codes from apps, which Siri can display without opening the app.
  • Ride booking, to request a trip through a taxi or similar transport service.
  • Messages, to send text messages in any app that has this service.
  • Photo search, to allow a photography app to offer its own database to Siri and search for content in it.
  • Payments, to send or request payments in apps of this type through Siri.
  • Voice over IP, to initiate voice or video calls in an app that supports these functions, by asking Siri for it by voice.
  • Exercises, to start, pause or finish the exercise routines we do.
  • Reservations in restaurants, to make a reservation through an app that offers this service.
  • Climate and radio, created for CarPlay, which allows you to control the car's air conditioning and heating options, as well as the radio and audio player settings.

Of all these, lists and messages also work with the HomePod. If we wanted to send a message using Siri, the domain would be «Messages» and the intent created would be «send a message». These are the current options for integrating Siri with third-party apps.

Let's explain how it works, so we can see the complexity behind using Siri and why it is not so easy for just any app to integrate with a voice assistant. To integrate Siri into an app, you have to choose which domain you want to use. Then you choose the intents, which are what generate the actions, and which carry parameters.
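To ground this, here is a minimal sketch of the entry point of an Intents app extension for the Messages domain, using the standard INExtension pattern; SendMessageHandler is a hypothetical class we will sketch a little further below.

```swift
import Intents

// Entry point of the Intents extension: the system hands us the intent that
// Siri has built and we return an object capable of handling it.
class IntentHandler: INExtension {
    override func handler(for intent: INIntent) -> Any {
        if intent is INSendMessageIntent {
            // Hypothetical handler class, sketched later in the article.
            return SendMessageHandler()
        }
        // Fall back to self for any other intent the extension declares.
        return self
    }
}
```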

Let's say to Siri: «Hey Siri, send a Whatsapp message to the Family group saying I'm in a jam and I'll be there in 10 minutes.» Siri has to translate this into intents and parameters, breaking down the natural language and capturing, semantically, what we want to do and what the content of our request is.

-SiriKit is the library that allows the assistant to be integrated into third-party apps. It currently works through specific domains, which have gradually expanded over the years. Shortcuts aims to give its evolution a boost.-

But notice that the way we say the name of an app changes from one language to another, so the app's developers have to register in the system the different phonetic pronunciations of its name so Siri can recognize it: things like «g-asap», «g-asá», «uasap», «uatsap», «uotsap»… Each app has to register the phonetics for each language so that Siri recognizes the app's name regardless of how the user pronounces it.

Once the app is recognized, Siri begins to break down the command it was given: the parameters. At this point it knows that a message has to be sent and that it will be sent through Whatsapp. The first step is to resolve the parameters of the intent, which must identify the recipient of the message and its content. Those are the parameters declared for this intent. Here the preposition «to» is key, because it tells Siri who the recipient is: «Family group». And the transition «saying» is discarded as content, but it allows the system's natural language processor to know where the recipient of the message ends and where its content begins.

Siri then asks Whatsapp: do you have any recipient in your contact book called «Family group»? Obviously, when it receives this question, the app has to understand that the word «group» means Siri is not looking for an individual contact but for a group of contacts. So it checks whether it has a group called «Family». If it exists, perfect: Whatsapp replies to Siri that it does have a recipient like that, and Siri can now tell it what to send.
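Apps can help this matching along by registering user-specific terms with the system. As a hedged sketch (this is not Whatsapp's actual code, and the group names are illustrative), the real INVocabulary API would be used like this:

```swift
import Intents

// Register the names of the user's chat groups so Siri can match «Family»
// before the request even reaches the app. Most frequently used terms first.
let groupNames = NSOrderedSet(array: ["Family", "Political Family"])
INVocabulary.shared().setVocabularyStrings(groupNames, of: .contactGroupName)
```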

But if, when invoking Siri, we have said only «Send a message by Whatsapp to the Family group», there is data missing to complete the call: the message parameter. Siri, knowing that a piece of data is missing, knowing which one it is, and knowing the domain and intent we are in, will ask for it. To do this it uses an action pre-created by SiriKit that requests the message parameter, where Siri says «What do you want to say in the message?» or something similar. If the message was already included in the order to Siri (as in our first example), the user is asked nothing and the flow continues.

But what if Whatsapp doesn't recognize the recipient Siri has passed along? Two things can happen: either Whatsapp says there is no one matching that criterion, so Siri reports an error and says it cannot send the message because it cannot find the recipient (an error generated by Whatsapp through the response it gives back to Siri), or Whatsapp can propose alternatives for Siri to offer. Imagine we have two groups, «Political Family» and «My Family»: Whatsapp would send those suggestions, Siri would show them to us, and we would choose whether one of them is the one we are looking for.
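This whole negotiation is the resolution phase of the intent. Here is a minimal sketch of what it could look like for a hypothetical messaging app: the INSendMessageIntentHandling methods and resolution results are the real SiriKit API, while findGroups(named:) stands in for the app's own lookup logic.

```swift
import Intents

class SendMessageHandler: NSObject, INSendMessageIntentHandling {

    // Resolve who the message is for: missing, unique, ambiguous or unknown.
    func resolveRecipients(for intent: INSendMessageIntent,
                           with completion: @escaping ([INSendMessageRecipientResolutionResult]) -> Void) {
        guard let recipients = intent.recipients, !recipients.isEmpty else {
            // Siri will ask «Who do you want to send it to?»
            return completion([INSendMessageRecipientResolutionResult.needsValue()])
        }
        completion(recipients.map { person -> INSendMessageRecipientResolutionResult in
            let matches = findGroups(named: person.displayName)
            switch matches.count {
            case 0:
                // Siri reports that it cannot find the recipient.
                return INSendMessageRecipientResolutionResult.unsupported()
            case 1:
                // Exact match: resolution succeeds and the flow continues.
                return INSendMessageRecipientResolutionResult.success(with: matches[0])
            default:
                // Several candidates: Siri shows them and asks the user to pick one.
                return INSendMessageRecipientResolutionResult.disambiguation(with: matches)
            }
        })
    }

    // Resolve the text of the message; if it is missing, Siri asks for it.
    func resolveContent(for intent: INSendMessageIntent,
                        with completion: @escaping (INStringResolutionResult) -> Void) {
        if let text = intent.content, !text.isEmpty {
            completion(INStringResolutionResult.success(with: text))
        } else {
            completion(INStringResolutionResult.needsValue())
        }
    }

    // Required method: actually perform the send (see the next sketch).
    func handle(intent: INSendMessageIntent,
                completion: @escaping (INSendMessageIntentResponse) -> Void) {
        completion(INSendMessageIntentResponse(code: .success, userActivity: nil))
    }

    // Placeholder for the app's own contact and group database query.
    private func findGroups(named name: String) -> [INPerson] {
        return []
    }
}
```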

Once the recipient is recognized, Siri passes along the content: «I'm in a jam and I'll be there in 10 minutes». That is the parameter with the content to send. But because this action is defined as one that requires approval before being executed, Whatsapp returns its own view of the content through an app extension (or a summary, depending on the case, for Siri to format), and we are asked for confirmation before the action is performed. If we say yes, the message is sent.
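In code, that approval step maps to the optional confirm(intent:completion:) method, which Siri calls before handle(intent:completion:). Continuing the sketch above, with a hypothetical logged-in check standing in for the app's real preconditions:

```swift
import Intents

extension SendMessageHandler {
    // Hypothetical app-side precondition.
    private var userIsLoggedIn: Bool { return true }

    // Siri calls this before handle(intent:completion:) for actions that
    // require approval; answering .ready lets it show the confirmation view.
    func confirm(intent: INSendMessageIntent,
                 completion: @escaping (INSendMessageIntentResponse) -> Void) {
        guard userIsLoggedIn else {
            return completion(INSendMessageIntentResponse(code: .failureRequiringAppLaunch,
                                                          userActivity: nil))
        }
        completion(INSendMessageIntentResponse(code: .ready, userActivity: nil))
    }
}
```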

As we can see, the integration is not simple, and we have only covered one domain in which Siri and our app talk to each other and ask for things to bring the request to fruition. Each domain and each intent has its own start, intermediate and end steps, its own number of required parameters, and its own communication between Siri and the app. Not simple at all.

Given this complexity, Siri's domains are not open-ended and we cannot integrate them into just any app. That is why Apple has looked for a complementary solution with the new Siri Shortcuts, which allow actions to be invoked through voice commands. We are not talking about an assistant as such here: it is much more like creating macros (like the ones we can build in Automator on the Mac, or the ones we created in IFTTT or in the aforementioned Workflow app that Apple bought) and associating specific voice commands with them. There is no artificial intelligence or natural language understanding here. They are, simply, command executions.

-Given the complexity of integration with Siri, the domains available are not open-ended and are restricted to very specific possibilities.-

However, we should not think that this feature is impractical or lacking in power, for a very simple reason: there is machine learning behind it, suggesting actions that we perform in our apps, and its simplicity opens up a world of customization possibilities for using Siri. Let's take a closer look at how shortcuts work.

Siri shortcuts from apps

Apple defines Siri shortcuts as the ability to expose our app's capabilities to Siri. And to do this we have 3 different options: one that already works without apps having to adapt in any way, another that requires apps to adapt, and a third that is the new Shortcuts app, not yet available in the iOS 12 betas.

The first possibility depends entirely on how the system understands and manages the activity log of our apps through the NSUserActivity API. This API, or library, is a way to inform the system of what our app is doing at any moment, so that the device's own AI, through Siri Suggestions, Spotlight or Handoff, knows about our usage and can automatically recommend things to do. In the case of Handoff, that means starting an activity on one device and continuing it on another.


-Siri shortcuts are based on the record of our activity that the device keeps from our apps, which today is done through the NSUserActivity library.-

This new functionality allows the activities that our app creates to be made eligible for prediction and search. With it, the system can offer them to us so we can record a voice command, and when that command is invoked, our app is launched to carry out that activity. We have to understand that in this case these are simple voice commands that execute something exactly as if we had tapped it in the suggestions, for example. For practical purposes it is the same, and it does not require much work from developers, especially those who have already implemented tracking of user activities.
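As a minimal sketch, this is what marking an activity as predictable looks like with the iOS 12 additions to NSUserActivity; the activity type, titles and identifier are hypothetical, while the properties are the real API.

```swift
import Intents // the Siri additions to NSUserActivity live in the Intents framework

let activity = NSUserActivity(activityType: "com.example.chat.openConversation")
activity.title = "Write to my wife"                 // what the suggestion will show
activity.isEligibleForSearch = true                 // can appear in Spotlight
activity.isEligibleForPrediction = true             // can be suggested and recorded as a shortcut (iOS 12)
activity.suggestedInvocationPhrase = "To my wife"   // hint shown when the user records the phrase
activity.persistentIdentifier = "conversation-wife" // lets the app delete this donation later
// In the conversation screen: viewController.userActivity = activity
```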

They are something like shortcuts, hence the name, that let us take concrete actions. For example: if we usually write to a specific person on Whatsapp, the system will offer that as a shortcut. That way I can record the shortcut «To my wife», so that when I say «Hey Siri, to my wife», Whatsapp opens directly in the conversation I have with her, ready for me to start writing. That's it: a shortcut.

And why is it a shortcut? Because one of the activities Whatsapp records about our usage in the system is when we access a conversation with a group or a person. Not its content: simply that we have entered a chat to talk with that person.

For this functionality to make sense, apps that don't yet record their usage activities will obviously have to adapt and use this feature.

Siri shortcuts through intents

The second way to integrate Siri shortcuts is through the intents of the SiriKit library itself: Apple now allows us to create our own fully customized intents, whose responses carry content created by the app itself. Basically, we can extend Siri's capabilities beyond the domains we discussed at the beginning. But watch out: these intents are only used for specific activities that we then associate with a voice command. They do not let us use Siri more broadly.

What we have to understand is that this Siri intent option is a way for the system to understand what the app does, so it can offer concrete actions within it through voice commands that we record. But in this case, instead of actions that open the app as we saw in the first option, these are all actions that run in the background, without having to invoke the app as such.

If our shortcut is going to be associated with a system intent that already exists in one of the domains we discussed at the beginning, we must use the domains and intents already there. So if I'm going to create an intent to send messages (there are also intents to play media content), I have to use the one Siri already has instead of creating a new one. If we use the intents already in the system, the parameters are already given and we just have to say where our data fits. But if I want to do something else, I can create my own custom intent outside any domain.
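With a system intent, the app's main job is to donate it every time the user performs the action, so the system can learn the habit. A hedged sketch using the real INSendMessageIntent and INInteraction APIs; the recipient and identifiers are illustrative, not Whatsapp's real data model:

```swift
import Intents

// Describe the recipient the user just wrote to.
let group = INPerson(personHandle: INPersonHandle(value: "group.family", type: .unknown),
                     nameComponents: nil,
                     displayName: "Family",
                     image: nil,
                     contactIdentifier: nil,
                     customIdentifier: "group.family")

// Donate the habit («send a message to Family»), not the message text itself.
let sendIntent = INSendMessageIntent(recipients: [group],
                                     content: nil,
                                     speakableGroupName: INSpeakableString(spokenPhrase: "Family"),
                                     conversationIdentifier: "group.family",
                                     serviceName: nil,
                                     sender: nil)
INInteraction(intent: sendIntent, response: nil).donate { error in
    if let error = error { print("Donation failed: \(error)") }
}
```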

-Xcode 10 allows us to create Siri intent definition files: their goal is for the system to know more precisely what activities we perform in each app.-

In Xcode 10 we can create a Siri intent definition file in which we define what we want to build for our app. Although at this point it is important to remember, again, that we are talking about Siri as the device assistant, not as the voice assistant. The first thing we have to do is choose the category, which gives Siri the contextual information it needs to know what will be done, so it can answer in the same context. Say «order X» and Siri knows to answer «I've ordered it».

The categories Apple puts at our disposal cover a good number of options, almost any we can think of, all based on action verbs: do, run or go as the generic ones. Then, classified into different sections, we have verbs such as: view, open, order, book, buy, start, navigate, share, post, send, create, add, search, filter, download, get, set, request, toggle or check in. We can even configure whether or not the action will require user confirmation before being performed.

Then we define the parameters that the intent will hand us. There can be several, we have to declare their type, and a parameter can even be a collection of multiple elements rather than a single one. After this we define the shortcut types, stating the combinations of parameters we can receive and describing how each phrase is built: which conjunctions or verbs appear and in what order, so that the system can distinguish the supporting words from the parameters. Take «order a car to my house»: «order» is the intent, «a car» is the parameter for what the user wants, and the «to» has to be there so the system understands that «my house» is the parameter our app will receive with that call. The more combinations we provide, the easier it is for the user to find a way to use our shortcuts.

-The more combinations or different ways of recording activities, the better the system will understand what we do in our apps and the more accurate its shortcut suggestions will be.-

Next, we must program the response Siri will give: what it says when things go well, what it says when they go wrong, and which properties or data are used to build each response. And finally, we should create an extension of the app that Siri can invoke to display information from our app without having to open it directly. It is Apple's most recommended path to a good integration that works comfortably on the iPhone or Apple Watch. In fact, even if we don't have the watch app installed on the Apple Watch, the intent will still be available as part of the system and will execute the action through connectivity with the app on the iPhone.
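Putting the pieces together, here is a hedged sketch of the handler for a custom intent. Assuming the definition file declares an intent named «OrderCoffee», Xcode 10 generates the OrderCoffeeIntent, OrderCoffeeIntentHandling and OrderCoffeeIntentResponse types used below; the parameter names are hypothetical as well.

```swift
import Intents

class OrderCoffeeHandler: NSObject, OrderCoffeeIntentHandling {

    // Called first when confirmation is enabled for this intent.
    func confirm(intent: OrderCoffeeIntent,
                 completion: @escaping (OrderCoffeeIntentResponse) -> Void) {
        completion(OrderCoffeeIntentResponse(code: .ready, userActivity: nil))
    }

    // Runs in the background, without opening the app.
    func handle(intent: OrderCoffeeIntent,
                completion: @escaping (OrderCoffeeIntentResponse) -> Void) {
        guard let coffee = intent.coffee else {
            return completion(OrderCoffeeIntentResponse(code: .failure, userActivity: nil))
        }
        placeOrder(for: coffee) // hypothetical call into the app's ordering code
        completion(OrderCoffeeIntentResponse(code: .success, userActivity: nil))
    }

    private func placeOrder(for coffee: String) {
        // Talk to the backend here.
    }
}
```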

Once all this is done, for Siri to offer our action (shortcut) to the user, the app must donate the intent every time the user performs what we have configured as an intent for Siri. This way, Siri will know not only what the user has done, but will also have an associated action it can offer the user to configure and use.

This works very well with things that repeat over time, such as ordering a certain type of coffee every day at the same time. Let's say I use my coffee app and order a cappuccino with double sugar and an extra shot of coffee, and the app has registered an order intent that records the coffee, the type of coffee and its options (the double sugar and the extra shot). When this order is repeated every day, or several days, at the same time, Siri will suggest creating a shortcut. This way I can record a phrase: «I want my coffee». From then on, when I say «Hey Siri, I want my coffee», the system tells the coffee app to order the cappuccino the way I usually ask for it. It's that simple. In fact, the app itself may suggest a phrase, or we can use another phrase that we choose.
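Continuing the hypothetical coffee example, the donation that teaches the system this habit could look like the following; suggestedInvocationPhrase is the real iOS 12 property that suggests the recording phrase, while the intent and its parameters come from our imaginary definition file.

```swift
import Intents

// Donate the filled-in intent every time the user actually places this order.
let order = OrderCoffeeIntent()
order.coffee = "Cappuccino"
order.options = ["double sugar", "extra shot of coffee"]
order.suggestedInvocationPhrase = "I want my coffee" // shown on the recording screen
INInteraction(intent: order, response: nil).donate { error in
    if let error = error { print("Donation failed: \(error)") }
}
```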

The intents serve so that the system can inform the user of exactly what has been done. That's the key. Intents here are not (yet) ways of invoking Siri: they are how apps inform the system, more accurately and understandably, of what they do, so that specific actions in apps can be invoked in the background, without having to open the app to perform them.

That's why in the suggestions we will see «Cappuccino coffee with double sugar and extra shot of coffee», which is simply the action we have associated with the shortcut. When run, the intent will ask us to confirm whether we really want to order it or not, and will give us a response like «Your coffee has already been ordered». That's the idea. We can even create personalized responses that give more information, like «Your personalized cappuccino has been ordered».

Finally, an interesting option is that shortcuts can be created from within the apps themselves, if the developers offer it, through an action button that invokes the shortcut-recording interface and lets you create the relevant action. This obviously makes the task much easier.
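That button is the «Add to Siri» control that ships with iOS 12's IntentsUI framework. A short sketch, where order is the hypothetical coffee intent from the previous example:

```swift
import Intents
import IntentsUI

// Tapping the button presents the system screen where the user records
// the voice phrase for the shortcut.
let addToSiriButton = INUIAddVoiceShortcutButton(style: .blackOutline)
addToSiriButton.shortcut = INShortcut(intent: order)
// Add it to the view hierarchy; on tap, present INUIAddVoiceShortcutViewController.
```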

The Shortcuts app

We've seen two possible flows: associating a voice command with a shortcut that opens the app in a specific place, and intents. Both are created in the system itself, and in the betas we can activate them from the iOS settings, in the Siri section. The catch is that intents are currently only used by system apps; third-party apps will start using them when they are updated in September. And for the last way we have to wait for iOS 12 (and later betas), when we get the new app called Shortcuts. An app that will be Workflow, but coupled with the use of Siri commands.

In this way, using this app, we will have access not only to pre-programmed options that we can use and record as Siri actions or shortcuts, but also to all the suggestions the system recommends based on the use we make of our apps and on how they record our activity through the aforementioned NSUserActivity library.

It will also offer the intents, and then a series of tasks we can define, as we would with a manager like IFTTT or Workflow itself. Since this app is not yet available for testing, we'll tell you more about it when it appears.