Google has open-sourced its AI model for converting sequences of natural-language instructions into actions in a mobile device UI. The model is based on the Transformer deep-learning architecture and achieves 70% accuracy on a new benchmark dataset created for the project.
A team of scientists from Google Research published a paper describing the model at the recent Association for Computational Linguistics (ACL) conference. The goal of the project is to help develop natural-language interfaces for mobile device users who are visually impaired or who temporarily need a “hands-free” mode. The system uses two Transformer models in sequence: the first converts natural-language instructions into a series of “action phrases,” and the second “grounds” the action phrases by matching them with on-screen UI objects. As research scientist Yang Li wrote in a blog post describing the project:
This work lays the technical foundation for task automation on mobile devices that would alleviate the need to maneuver through UI details, which may be especially valuable for users who are visually or situationally impaired.
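The two-stage design described above can be illustrated with a minimal Python sketch. This is not Google's implementation: both Transformer models are replaced here by simple rule-based stand-ins, and every name in it (`ActionPhrase`, `UIObject`, `extract_action_phrases`, `ground`) is hypothetical and chosen only for illustration. It shows only how data might flow from an instruction, through action phrases, to on-screen objects.

```python
from dataclasses import dataclass


@dataclass
class ActionPhrase:
    # Hypothetical representation of one "action phrase".
    operation: str  # what to do, e.g. "tap"
    target: str     # words describing the UI object to act on


@dataclass
class UIObject:
    # Hypothetical representation of one on-screen UI object.
    object_id: int
    label: str


def extract_action_phrases(instruction):
    """Stage 1 stand-in: split an instruction into action phrases.

    The real system uses a Transformer here; this toy version assumes
    steps are joined by ", then " and start with the operation word.
    """
    phrases = []
    for step in instruction.lower().split(", then "):
        words = step.strip().rstrip(".").split()
        operation = words[0]
        target = " ".join(w for w in words[1:] if w not in {"the", "a", "on"})
        phrases.append(ActionPhrase(operation, target))
    return phrases


def ground(phrase, screen):
    """Stage 2 stand-in: match a phrase to the UI object whose label
    shares the most words with the phrase's target description."""
    target_words = set(phrase.target.split())

    def overlap(obj):
        return len(target_words & set(obj.label.lower().split()))

    best = max(screen, key=overlap)
    return phrase.operation, best.object_id


# Toy screen and instruction (made up for this example).
screen = [UIObject(0, "Settings app"), UIObject(1, "Network"), UIObject(2, "Camera")]
phrases = extract_action_phrases("Open the Settings app, then tap Network.")
actions = [ground(p, screen) for p in phrases]
print(actions)
```

Keeping the two stages separate means the phrase extractor never needs to see the screen, while the grounding step only has to resolve one short phrase at a time against the objects currently displayed.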
The Transformer, developed by Google in 2017, is a deep-learning architecture for mapping input sequences to output sequences. It has several advantages over other sequence-learning architectures, such as recurrent neural networks (RNNs), including more stability in training and faster inference; consequently, most state-of-the-art natural-language processing (NLP) systems are Transformer-based. The key operation in a Transformer is attention, which learns relationships between different parts of the input and output sequences. For example, in a Transformer trained to translate from one language to another, attention often learns the mapping of words in the source language to words in the target language.
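To make the attention operation concrete, here is a minimal sketch of scaled dot-product attention, the form commonly used in Transformers, written in plain Python. The vectors are made-up toy values, not learned parameters, and the function names are chosen for this example. Each query (standing in for a target-language word) is compared against every key (standing in for a source-language word), and the resulting weights show which source position the query attends to most.

```python
import math


def softmax(scores):
    """Convert raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns the attended outputs and the attention weights per query.
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy 2-d vectors: three "source words" (keys/values), two "target words" (queries).
keys = [[4.0, 0.0], [0.0, 4.0], [-4.0, 0.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
queries = [[0.0, 4.0], [4.0, 0.0]]

outputs, weights = attention(queries, keys, values)
for i, row in enumerate(weights):
    print(f"query {i} attends most to source position {row.index(max(row))}")
```

In this toy setup the first query lines up with the second key and the second query with the first key, so nearly all of each query's weight lands on a single source position, which mirrors the word-to-word mapping described above. In a trained model, the queries, keys, and values are produced by learned projections rather than fixed by hand.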