“A fully intelligent machine will be built in less than 20 years.” This was the promise of artificial intelligence (AI) in the 1960s, long before it became ubiquitous in Medium blog posts and industry publications. The creators behind Siri, Cortana, Alexa, and more recently, Viv claim that we are tantalizingly close to that future, but as this quote suggests, the history of AI is a never-ending story of big promises that have continually failed to deliver. These virtual personal assistants have been fun to test, but it is painfully clear to anyone who has tried them that AI still has a long way to go in understanding human language.

Advances in AI are often heralded as another step closer to achieving human levels of understanding, but consider this: do we really need to show thousands of pictures of cats to explain to someone else what they are? If our pets required even a hundred examples before they learned a new trick, would we ever consider them intelligent? When deep learning is applied to understanding human language, it is far from intelligent, requiring millions of labeled data points to become even remotely usable. Here are some fundamental reasons why:

A Brief Artificial Intelligence Primer

Every AI system that aims to deliver actionable intelligence needs to be taught what matters. There is no magic. The only way to tell a deep learning system what matters is by creating a labeled training dataset. In other words, we train it by showing examples. A lot -- thousands, even millions -- of examples. And each and every example must be labeled correctly by a human or a human-guided machine for the training to be effective. This training takes time and massive amounts of data and computation, so current AI solutions take months, if not years, to hone. And even if we take away those roadblocks by assuming infinite computational power and datasets, all current AI systems, including deep learning, remain constrained by the need for classification that is fundamental to their performance.

Classification is just a fancy name for sorting: when we build these AI algorithms, we predefine the set of all possible categories and train the system to predict the category of each new input. It is little different from training a monkey to put spheres or cubes into corresponding holes. Because the categories are always predefined, we need to know exactly what we are looking for from the beginning of our analyses. The only information we can get from this process is a statistical view of known phenomena, e.g. among X comments, Y were positive and Z were negative. If we start with classification, we cannot discover anything that we were not already looking for. We can never truly learn.
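To make the limitation concrete, here is a minimal sketch of classification as sorting. It is a toy keyword matcher, not a deep learning model, and the category names, cue words, and sample comments are all hypothetical choices made for illustration. The point it shows is structural: whatever the input, the answer is one of the labels we defined up front.

```python
from collections import Counter

# Hypothetical, hand-picked categories and cue words. A real system would
# learn its decision rule from labeled training data instead.
CATEGORIES = {
    "positive": {"love", "great", "excellent"},
    "negative": {"hate", "terrible", "broken"},
}


def classify(comment):
    """Return the predefined category whose cue words overlap most with the comment."""
    words = set(comment.lower().split())
    scores = {label: len(words & cues) for label, cues in CATEGORIES.items()}
    # The classifier always answers with one of the predefined labels,
    # even when nothing matches (ties go to the first label defined).
    return max(scores, key=scores.get)


comments = [
    "I love this phone",
    "The screen is broken",
    "Does it ship to Canada",  # a question, neither positive nor negative
]

# The only output is a count per known category.
summary = Counter(classify(c) for c in comments)
print(summary)
```

The shipping question is forced into an existing bucket because no other bucket exists. Adding a "question" category would fix this one case, but only after someone noticed that it was missing.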

Deep Learning Chokes on Human Language

Many domains have leveraged deep learning to make significant progress within the confines of these fundamental limitations, but those limitations are crippling when deep learning is applied to human language. Deep learning has greatly advanced the fields of image, video, and audio processing: inputs in these formats are simple, mere pixels and audio waveforms. Human language, however, is much more complex. Deep learning fatally treats human language as unstructured data, ignoring an underlying structure that is more flexible than anything a classification system can build. That structure enables us to learn new concepts and redefine evolving ideas as the environment around us changes. We know it as syntax.

Human language is powerful and flexible because it is symbolic. Words and other symbols have meaning only because it was given to them by those who use the language. We have already assigned countless meanings to millions of words and phrases, and this process never stops. A few decades ago, an ordinary person had no idea what “text messaging” or “googling” meant. Luckily, the syntax of language gives us the ability to assemble symbols in order to communicate, and to invent or learn new ones. Even my grandmother can figure out what “texting” means, given some context. She does not need to see even a dozen examples to get the point.

Current deep learning approaches to human language will remain severely limited if researchers keep applying to language the same techniques that succeed in signal processing. By using context instead of predefining concepts and entities, we free ourselves to learn.

Gaining Agility with Context

Instead of a top-down, classification-confined approach to processing human language, let’s consider a bottom-up approach that would not require the predefinition of any concept or entity. To better understand how our team at fido.ai is approaching this idea, think back to the “fill-in-the-gap” exercises from elementary school: given a sentence such as “I’m afraid of [BLANK],” how would you fill in the gap? People might be afraid of thousands of different things. Predefining all of them beforehand is a ridiculous and impossible task. However, because of the context, everyone would easily recognize that “[BLANK]” stands for a source of fear. The same applies to every aspect of human communication. Certain experiences and intents can be expressed in a finite number of syntactic and semantic constructions.

This is the simple idea behind context-based information extraction: a bottom-up approach, free from the rigid confines that limit classification-based and concept-based information extraction. Here, the machine acts like a human, and teaching such a system takes hours or days, not months or years.
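The fill-in-the-gap exercise can be sketched in a few lines. This is only an illustration under strong assumptions, not a description of how fido.ai's system works: a single regular expression stands in for the syntactic context, and the pattern, function name, and sample text are all hypothetical. What it shows is that the list of possible fears is never defined; only the context around the gap is.

```python
import re

# Hypothetical context pattern: whatever follows "afraid of", up to the
# next punctuation mark, fills the gap. A real system would rely on
# syntactic analysis rather than one regular expression.
FEAR_CONTEXT = re.compile(r"\bafraid of ([^.,!?]+)", re.IGNORECASE)


def extract_fears(text):
    """Return every phrase that fills the gap in 'afraid of [BLANK]'."""
    return [match.strip() for match in FEAR_CONTEXT.findall(text)]


text = (
    "I am afraid of spiders. My brother is afraid of public speaking, "
    "and our neighbor is afraid of flying!"
)

fears = extract_fears(text)
print(fears)
```

None of the three fears had to be listed beforehand. A new one, such as a fear that did not exist a decade ago, would be picked up the same way as long as it appears in the same context.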

There is no doubt that deep learning is a very powerful tool. Nonetheless, it is important to remember that deep learning is still just that: a tool. Just as you would not use a hammer to drive screws, you should not expect deep learning to be the be-all and end-all solution.

With special thanks to Ivy Nguyen of NewGen Capital.