Natural Language AI: Architectural Considerations

If one area of AI was watched closely by practitioners and researchers in 2018 and early 2019, it was natural language processing. Innovations in deep neural networks for sequence modeling, from bidirectional LSTM networks to Google’s BERT and Microsoft’s MT-DNN, have significantly improved capabilities such as language translation and understanding. Many more advances in deep learning are summarized well by MIT researcher Lex Fridman in the talk below.

The State of the Art

Lex Fridman takes us through many deep learning developments in 2018, including BERT

Given mature and increasingly sophisticated models for language translation and understanding, many teams building human-machine interfaces may now be asking what kind of architecture lets them put these capabilities to use. After all, the value of these algorithms is realized only when they reach the customer in an actual translation or language understanding task.

It is evident from the MT-DNN paper by Microsoft Research that some core elements of natural language processing tasks won’t change. Consider the architecture diagram of MT-DNN (Multi-Task Deep Neural Network) below.

MT-DNN Architecture from Microsoft’s Research Paper

The input X still has to pass through all the shared layers in any sentence- or phrase-based interaction, producing the context embedding vectors shown as l2. When a shared architecture provides the underlying transformation, representation and word-encoding capabilities, the subsequent task-specific layers of the network can become more specialized, whether for pairwise similarity, classification or other use cases.
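The shared-layers-plus-specialized-heads idea can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the MT-DNN implementation: `shared_encoder` is a toy stand-in for the shared layers (a real system would use a pretrained Transformer encoder), and the names `similarity_head`, `make_classifier_head`, `DIM` and the example weights are all hypothetical.

```python
import math
import random
from typing import Callable, List

DIM = 8  # illustrative embedding width, not a value from the paper


def shared_encoder(text: str) -> List[float]:
    """Toy stand-in for the shared layers: text -> fixed-size vector."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        rng = random.Random(token)  # same token always yields the same pseudo-embedding
        for i in range(DIM):
            vec[i] += rng.uniform(-1.0, 1.0)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0  # avoid dividing by zero on empty input
    return [v / norm for v in vec]


def similarity_head(a: List[float], b: List[float]) -> float:
    """Task head 1: dot product, i.e. cosine similarity for unit-length embeddings."""
    return sum(x * y for x, y in zip(a, b))


def make_classifier_head(weights: List[float], bias: float) -> Callable[[List[float]], float]:
    """Task head 2: logistic score over the shared embedding (weights are hypothetical)."""
    def head(embedding: List[float]) -> float:
        score = sum(w * x for w, x in zip(weights, embedding)) + bias
        return 1.0 / (1.0 + math.exp(-score))
    return head


# Every task reuses the same encoder output; only the head differs.
embedding = shared_encoder("book a flight to Paris")
is_travel = make_classifier_head([0.5] * DIM, 0.0)
print(similarity_head(embedding, shared_encoder("book a flight to Paris")))
print(is_travel(embedding))
```

The design point is that adding a new task means adding a new head function, while the encoder and its output format stay untouched.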

Similar Paradigms

The surprising thing is that this isn’t a new capability. It is rather analogous to the higher level representations learned by face recognition deep learning networks, or the higher order patterns learned by deep LSTM sequence classifiers.

Face recognition DNNs and the features they learn (via presentation here on Slideshare, by Igor Trajkovski)

One of the trends anticipated some years ago by Andrew Ng and other deep learning researchers is the arrival of end-to-end deep learning systems. In the past, a system needed separate components for data pre-processing, feature engineering, and machine learning or optimization, and perhaps a compositing layer to tie these elements together. Given enough data, this component-wise architecture can be replaced entirely by a deep learning network. The mathematics behind deep learning networks as universal function approximators (the Hahn-Banach theorem and related results, as shown here) provides another justification for such an end-to-end architecture in deep learning centric systems.

Natural language centric AI systems are, by definition, customer-centric. Few use cases deep inside business processes require this capability, and because these systems face the customer, they have to provide for online learning and the management of concept drift. Managing concept drift is no easy task, and active research continues in the space (one example is here). Concept drift directly informs capabilities such as online learning: brute force methods for repeating large-scale training exist, but they only go so far before a smarter approach is needed.
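As a rough illustration of what drift monitoring can look like at deployment time, the sketch below compares a sliding-window error rate against a baseline captured from the first full window. It is a deliberately simple heuristic and not a published drift-detection method; the class name `DriftMonitor` and the `window` and `tolerance` parameters are assumptions made for this example.

```python
from collections import deque
from typing import Deque, Optional


class DriftMonitor:
    """Flags possible concept drift when the recent error rate rises above a baseline."""

    def __init__(self, window: int = 50, tolerance: float = 0.15) -> None:
        self.window = window          # number of recent predictions to track
        self.tolerance = tolerance    # allowed increase in error rate over the baseline
        self.baseline: Optional[float] = None
        self.recent: Deque[int] = deque(maxlen=window)

    def update(self, correct: bool) -> bool:
        """Record one prediction outcome; return True if drift is suspected."""
        self.recent.append(0 if correct else 1)
        if len(self.recent) < self.window:
            return False  # not enough evidence yet
        rate = sum(self.recent) / self.window
        if self.baseline is None:
            self.baseline = rate  # first full window defines normal behaviour
            return False
        return rate - self.baseline > self.tolerance


monitor = DriftMonitor(window=10, tolerance=0.2)
for _ in range(10):
    monitor.update(True)   # a healthy period sets the baseline
flags = [monitor.update(False) for _ in range(3)]  # errors start to accumulate
print(flags)
```

In a real deployment, a raised flag would typically trigger an incremental update or a retraining job rather than a full brute-force retrain on every new batch.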

Architectural Considerations

Some architectural considerations for such end-to-end natural language centric AI applications therefore could be:

Four architectural considerations for Natural Language centric AI systems

Harmonizing data generation processes calls for unified user interfaces, or sub-layers within the user interfaces, that translate the end user’s intent. This intent may manifest differently depending on whether the interface is speech based, vision based or gesture based, for instance. However, inferring intent and translating it into a natural language paradigm could be a key capability, one that enables a certain kind of taught interaction with AI systems.
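One way to picture such a harmonization sub-layer is a set of per-modality adapters that all emit the same plain-text intent. The sketch below is purely illustrative: `RawEvent`, `ADAPTERS`, `harmonize` and the gesture-to-phrase table are hypothetical names and data invented for this example, and a real system would put speech recognition or vision models behind each adapter.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class RawEvent:
    modality: str  # e.g. "speech" or "gesture"
    payload: str   # modality-specific content, e.g. a transcript or a gesture label


# Hypothetical lookup table mapping gesture labels to natural language phrases.
GESTURE_PHRASES: Dict[str, str] = {
    "swipe_left": "go to the previous item",
    "thumbs_up": "confirm",
}


def from_speech(payload: str) -> str:
    """Speech arrives as a transcript; normalize whitespace and case."""
    return payload.strip().lower()


def from_gesture(payload: str) -> str:
    """Gestures arrive as labels; translate them into a phrase."""
    return GESTURE_PHRASES.get(payload, "unknown request")


ADAPTERS: Dict[str, Callable[[str], str]] = {
    "speech": from_speech,
    "gesture": from_gesture,
}


def harmonize(event: RawEvent) -> str:
    """Translate any supported modality into a common natural language intent."""
    adapter = ADAPTERS.get(event.modality)
    if adapter is None:
        raise ValueError(f"unsupported modality: {event.modality}")
    return adapter(event.payload)


print(harmonize(RawEvent("speech", "  Confirm ")))
print(harmonize(RawEvent("gesture", "thumbs_up")))
```

Because both events resolve to the same text, everything downstream of `harmonize` can be built once against a single natural language input.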

We have already seen how a common representation of input data can be a massive advantage when building numerous specializations on top of an existing core capability. Modularity therefore becomes more important. With a common representational standard for input data, building specialized networks can become more straightforward, since the standard fixes a number of constraints that would otherwise surface later in the AI development life cycle. Finally, concept drift and its management become important considerations for the last mile of AI value delivery, at deployment time.


Modern language models such as BERT and MT-DNN provide very advanced capabilities that can enable natural language interactions in ways not previously possible. As intelligent systems leverage these models at large scale, we will probably also see the four architectural considerations above (input harmonization, common representation, specialization with modularity, and online learning) become infused into the architecture of common AI systems.
