Retail AI
When people say “AI software”, what do they really mean? This is a critical question for understanding the impact that AI can have on the retail industry (or on any industry, really), and the ambiguity behind it is a significant source of the gap between AI hype and reality.
While McKinsey sorts AI into classification, prediction, and generation, I’ve found it more useful to look at natural language processing, computer vision, and prediction. But each of these is an umbrella term for lots of what amount to “micro-capabilities”, and that narrowness is an important limitation to keep in mind when thinking about AI. Artificial General Intelligence (AGI, sometimes referred to as Human-Level Artificial Intelligence) is at least 20 years off, according to experts in the field, and possibly 50. Both ends of that estimate assume some kind of breakthrough that has not yet happened; there is currently no known path to achieving AGI.
What Kind of AI Are You Talking About?
So when the retail industry talks about AI, the first question you have to ask is, “What kind of AI do you mean?” Baidu, one of the Chinese leaders in AI capabilities, last year released over 100 use cases available through its Baidu Brain platform. When you go through the list (be prepared to spend some time with Google Translate if you don’t read Chinese), it becomes immediately clear that the available use cases are extremely “small” in their capability. For example, under the umbrella of Natural Language Processing (Baidu does not really classify its use cases, so these are my attempts at categorization), the list includes speech recognition, speech synthesis, voice wake-up, text recognition, advertising detection, business card identification, passport identification, license plate recognition, form text recognition, lexical analysis, word similarity, text correction, emotional tendency analysis, conversational emotion recognition, article classification, universal translation API, voice translation API, search analysis, and intelligent call center, among many others.
Baidu doesn’t specify a definition for each of these capabilities, so I’m going to have to make some guesses based on the names, but to me, “intelligent call center” is the most advanced of the NLP capabilities in this list. Most likely, it is a combination of several of the capabilities listed, like speech recognition, speech synthesis, conversational emotion recognition, and lexical analysis. And each of these is most likely an independent capability that has to be combined with the others, rather than one single engine that is capable of all of these things at once. An AI that could, through one set of algorithms, learn how to detect what a person is saying and how they’re saying it, assign a context to it, and then marshal an appropriate response, would be extremely close to an AGI, and as noted above, we’re a long way away from that.
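To make the “combination of independent capabilities” idea concrete, here is a minimal Python sketch of how such a call center might be wired together. Everything in it is an assumption for illustration: the function names are hypothetical, the “models” are toy stubs, and none of it reflects how Baidu Brain is actually built or exposed.

```python
from typing import List

# Each function is a hypothetical stand-in for a separately built
# micro-capability. None of these names come from Baidu Brain or any real API.


def recognize_speech(audio: bytes) -> str:
    """Speech recognition stub: pretends the audio is already UTF-8 text."""
    return audio.decode("utf-8")


def analyze_lexicon(text: str) -> List[str]:
    """Lexical analysis stub: lowercases and splits into word tokens."""
    stripped = (word.strip(".,!?").lower() for word in text.split())
    return [word for word in stripped if word]


def detect_emotion(tokens: List[str]) -> str:
    """Emotion recognition stub: a keyword lookup, not a trained model."""
    upset_words = {"angry", "terrible", "unacceptable"}
    return "upset" if upset_words & set(tokens) else "neutral"


def choose_reply(tokens: List[str], emotion: str) -> str:
    """Hand-written glue rules that turn the separate outputs into a reply."""
    prefix = "I'm sorry about that. " if emotion == "upset" else ""
    if "refund" in tokens:
        return prefix + "I can help you with a refund."
    return prefix + "Could you tell me more?"


def synthesize_speech(text: str) -> bytes:
    """Speech synthesis stub: encodes the reply text as bytes."""
    return text.encode("utf-8")


def handle_call_turn(audio: bytes) -> bytes:
    """One customer turn: four narrow capabilities chained by ordinary code."""
    text = recognize_speech(audio)
    tokens = analyze_lexicon(text)
    emotion = detect_emotion(tokens)
    reply = choose_reply(tokens, emotion)
    return synthesize_speech(reply)


if __name__ == "__main__":
    print(handle_call_turn(b"This is unacceptable, I want a refund!").decode("utf-8"))
```

The point of the sketch is that the “intelligence” lives in the plumbing: each stub knows one narrow thing, and plain hand-written code decides how their outputs fit together.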