The Value of “Small Data” in ML and AI

This is a comment from LinkedIn.

I wish we paid more attention to “small data”. Models that are built from small data aren’t necessarily bad – it depends on the data generating process you’re trying to model. More data doesn’t necessarily imply better models, especially if the veracity of the data is questionable. Data-centric AI is one discussion currently taking place in this context. However, when you don’t need large scale ML models and are (prudently) content building statistical tests and simple models, these small data problems become important.

What decision makers shouldn’t forget is that the essential nature of decision making won’t change just due to the size of the data – ultimately it is the insight that models provide (based on many factors) that is the commodity we consume as decision makers. Consequently there should not be an aversion towards “small data” problems but a healthy curiosity. Like all efficiency movements that came before, small data paradigms are innately attractive – if I can verifiably build better models by doing less work, that should logically be a point of value.
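To make the point about statistical tests on small data concrete, here is a minimal sketch (not part of the original comment) of an exact two-sample permutation test using only the Python standard library. The function name and the sample values in the usage note are illustrative assumptions. Because it enumerates every regrouping of the pooled data, it is only practical for small samples, which is exactly the regime the comment is about.

```python
from itertools import combinations
from statistics import mean
from typing import Sequence


def exact_permutation_test(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sided exact permutation test for a difference in means.

    Enumerates every way of splitting the pooled observations into groups of
    the original sizes, so it is only practical for small samples.
    """
    pooled = list(a) + list(b)
    n_a = len(a)
    observed = abs(mean(b) - mean(a))
    extreme = 0
    total = 0
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # A small tolerance guards against floating-point ties being missed.
        if abs(mean(group_b) - mean(group_a)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total
```

For example, `exact_permutation_test([1, 2, 3], [10, 11, 12])` enumerates the 20 possible regroupings and returns 0.1, because only the observed split and its mirror image are as extreme. With three observations per group, 0.1 is the smallest two-sided p-value this test can produce, which is a useful reminder of what small data can and cannot tell us.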

MLOps Capabilities, Outcomes and Opportunities for Enterprise AI

Machine Learning Operations (MLOps) has come to be an important push for enterprises in 2021 and beyond – and there are clear reasons why this paradigm shift in Enterprise AI is upon us. Most enterprises that have begun data science and machine learning programs over the last several years have had difficulties putting even their promising machine learning models and proof of concept exercises into action by deploying them meaningfully in production environments. I use the term “meaningfully” here because the nuances around deployment make all the difference and form the soul of the subject matter around MLOps. In this post, I wish to discuss what ails enterprise AI today, the sources of the gaps between production and proof-of-concept, expectations from MLOps implementations and the current state of the discourse on MLOps.

Note and Acknowledgement: I have also discussed several ideas and patterns I've seen from experiences I've had in the industry, not necessarily in one company or job, but going back all the way to projects and programs I've been in over the last seven to ten years. I don't mention clients or employers here as a matter of principle, but I would like to acknowledge mentors and clients for their time and energy and occasionally their guidance as well, in the synthesis of some of these ideas. It is a more boundaryless world than before, and great conversations are to be had regardless of one's location. I find a lot of the content and conversations regarding data science on Twitter and LinkedIn quite illuminating - and together with work and clients, the twain have constituted a great environment in which to discuss and develop ideas. 

What ails Enterprise AI today?

Surely, with the large scale data pipelines companies have access to, the low cost of cloud native solutions, and the high level frameworks for building machine learning models, things should have become easier? Despite these upsides, enterprises still seem to be failing in their efforts to build AI programs, for many reasons. For one thing, building models has become easier than before. It takes less time to take (good enough or clean enough) data and build prototype models with it. Regardless of how many hypotheses you have as a business leader or data scientist, you’re more likely in 2021 to be able to collect data and build prototype models with it than you were in previous years. In the past, you may have had to jump through several organizational hoops to get your data, then prepare it, and then build models. All of these processes have become a bit simpler in 2021, thanks to enterprise data stores maturing, frameworks for building ML models becoming better known, and greater numbers of data scientists being available to build models. While things are still quite complex for the uninitiated, those on the growth curve in data science have found that this phase has added productivity to their prototyping efforts.

What hasn’t changed, though, is the process of taking these models to production. The model is largely seen as a software asset, and productionization of the model has been seen in this limited context. As we will discuss, it is important to challenge this mindset if we’re to build effective machine learning systems for production. The gap between proof-of-concept models such as those discussed above and production scale implementations of such models is therefore large. Real world implementations are more complex and tedious, and often the hypotheses we want to build models for are better defined, which necessitates extensive data processing, profiling and monitoring. But the complexity doesn’t end there, even though MLOps practitioners have made an effort to build end-to-end pipelines. You’ll note that none of these are ground-breaking realizations. MLOps has so far been a practical field, intended to make all these models work for enterprises – but as we will see below, the practical nature of this field encompasses a number of domain, statistical, cultural, architectural and other considerations.

Before diving deeper into this post, I wish to suggest that this trend towards MLOps adoption represents a noteworthy change in how enterprises see ML system architecture in 2021, as opposed to the previous decade. In a manner of thinking, it represents a move towards the “plateau of productivity” in enterprise machine learning.

Considerations for Enterprise AI – from MLOps, Data Science and Data Engineering

Domain Considerations Matter in Data Science, Data Engineering and MLOps

I wrote several years ago on this blog that domain knowledge is an important element in doing data science. Back then, as a data science neophyte learning from early experience in pure data science roles, I had made several observations about the impact of domain understanding on how quickly we can arrive at hypotheses for formulating data/AI problems. Looking back, this was an important lesson, because I now acknowledge the importance of domain knowledge every time I work on a data science project, or each time that I enable a data science team to be successful. Whether this is my own domain knowledge or that of SMEs, I am grateful for it, because without it, we could build anything, and it wouldn’t ultimately matter to anyone. Domain knowledge gives purpose to data and AI efforts. Without speaking to the domain experts and SMEs in various projects (finance, manufacturing, retail, energy and other industries), there would be little to no chance of timely and cost-effective success in characterising, ideating about and solving these problems.

It may not be immediately evident, however, that domain considerations matter in MLOps (and DataOps). Without an understanding of data generating processes, data formats, sources, rates, types, and data organization patterns, data fields, tables and even some of the process characteristics, we cannot understand data generation or transformation processes in enterprise data pipelines. Nor can we understand how models are to be implemented, and what deployment means in different enterprise or customer contexts. When building and architecting machine learning systems, we end up needing to discover these details if we haven’t already. MLOps therefore cannot be ideated about in a vacuum, without consideration of the domain of the problem or of the unique challenges of deploying models in that domain. MLOps in logistics and supply chain problems, therefore, will be quite different from MLOps in manufacturing, retail or banking domains.

For instance, if we were building a classification model to sort defective parts from good ones on a manufacturing shop floor, we may need a real-time deployment system, with consideration to latency, edge based deployments of models, opportunities to inspect models as downstream processes or metrics may indicate process failure modes, and so forth. These considerations may not exist if we were building a system for enhancing ad revenue in a platform software company. The considerations there, around uplift from pushing ads to new customers, may call for edge based deployments of a different kind, or for federated learning, neither of which may be necessary in the manufacturing example we discussed. To use an analogy, deployments are like different flavours of ice-cream, each requiring a different kind of appreciation. A failure to realize this may lead to difficulties in enterprises that inadvertently underestimate the complexity of MLOps, of their own domain processes, or both.

Simplistic, Linear Pipelines Don’t Get Us Over the Line

The current thinking around MLOps is somewhat simplistic and linear, and I mean this in a specific way. There is a lot of discussion around data workflows and pipelines, metadata generation and management, and the metrics around model training and model performance. These are discussions around the management, transformation and profiling of data. Datasets are important to MLOps pipelines, and inasmuch as agility in data science is concerned, I’d even say that they are primary.

However, this notion of thinking only about the software and application-level implications of models and their deployment doesn’t address some of the needs enterprises have from MLOps pipelines. Notably, these include model interpretability and explainability, managing a diversity of deployment patterns (edge, batch, real-time or near-real-time), and the need to build repeatable pipelines or reproduce results. These problems cannot be broken down into just software applications, and require statistical rigour and attention to changing domain patterns. In fact, there is sometimes a tendency on the part of ML engineering or MLOps practitioners to see these more statistical needs of MLOps as “not software engineering” and perhaps therefore “not easy to build for” – neither of which may be true, especially as the space of tools and implementations of statistical models for interpretability/explainability expands just as ML implementations have expanded.
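Reproducibility is one of the needs listed above that is easy to lose track of. As one hedged illustration (the function names, the salt string and the 20% test share are hypothetical choices, not a prescribed method), a train/test split can be made repeatable across reruns and machines by hashing a stable record ID instead of drawing random numbers:

```python
import hashlib
from typing import Iterable


def assign_split(record_id: str, test_percent: int = 20, salt: str = "split-v1") -> str:
    """Deterministically assign a record to 'train' or 'test'.

    The assignment depends only on the record ID and the salt, so it is
    identical across reruns, machines and processes (unlike Python's built-in
    hash(), which is randomized per process for strings).
    """
    digest = hashlib.sha256(f"{salt}:{record_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # map the hash to one of 100 buckets
    return "test" if bucket < test_percent else "train"


def split_ids(record_ids: Iterable[str], test_percent: int = 20) -> dict[str, list[str]]:
    """Group record IDs into reproducible train and test lists."""
    splits: dict[str, list[str]] = {"train": [], "test": []}
    for rid in record_ids:
        splits[assign_split(rid, test_percent)].append(rid)
    return splits
```

Changing the salt (for example, when deliberately re-splitting for a new model version) produces a different but equally reproducible split; recording the salt alongside the model makes the split itself auditable.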

Imagine that you have built an MLOps pipeline to build a dataset for a specific use case, and deployed it and the model eventually, and all’s well. If there’s a need for a new use case, you’re likely to begin back at square one, and build new pipelines, especially if you don’t have a clear and unified data model. As we will discuss in a later section on architecture, this is important to consider in ML engineering – more than one use case may require your data pipeline. This also means that simplistic and linear pipelines can only serve a limited purpose when you’re required to build many such pipelines across enterprise workloads.
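A minimal sketch of the alternative to one-off linear pipelines, assuming a toy in-memory representation of rows as Python dictionaries: small reusable steps are composed into per-use-case pipelines, so a second use case reuses the same building blocks instead of starting from square one. The step names, field names and the two use cases are hypothetical.

```python
from typing import Callable, Optional

Row = dict[str, Optional[float]]
Step = Callable[[list[Row]], list[Row]]


def compose(*steps: Step) -> Step:
    """Chain reusable steps into a pipeline for one use case."""
    def run(rows: list[Row]) -> list[Row]:
        for step in steps:
            rows = step(rows)
        return rows
    return run


def drop_missing(field: str) -> Step:
    """Step that removes rows where the given field is missing."""
    def step(rows: list[Row]) -> list[Row]:
        return [r for r in rows if r.get(field) is not None]
    return step


def add_ratio(name: str, numerator: str, denominator: str) -> Step:
    """Step that adds a derived ratio feature."""
    def step(rows: list[Row]) -> list[Row]:
        # Returns new dicts so upstream rows are not mutated.
        return [{**r, name: r[numerator] / r[denominator]} for r in rows]
    return step


# Two hypothetical use cases sharing the same building blocks.
shared_cleaning = drop_missing("torque")
defect_pipeline = compose(shared_cleaning, add_ratio("torque_per_rpm", "torque", "rpm"))
maintenance_pipeline = compose(shared_cleaning, drop_missing("vibration"))
```

In a real system these steps would operate on a shared, well-defined data model rather than ad hoc dictionaries; the point of the sketch is only that the second pipeline costs one line rather than a rebuild.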

For instance, it is possible to compute SHAP values for models given a specific dataset, and for companies with regulatory needs, there may be a reason to analyze and publish results such as these in depth. Therefore, MLOps shouldn’t only be about building simplistic DAGs or workflows in your YAML engineering tool of choice, or building and deploying metadata-tracked machine learning training/inference workflows. These are necessary, but insufficient for good MLOps implementations – chiefly because there are many other statistical and probabilistic considerations around MLOps which also deserve attention.
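SHAP values are estimates of Shapley values from cooperative game theory. As a minimal sketch of what is being computed (not the `shap` library’s API or algorithm), the function below computes exact Shapley values for a single prediction by enumerating every coalition of features. It assumes that an “absent” feature is replaced by a fixed baseline value, which is one simplifying convention among several, and its cost grows exponentially with the number of features, so it is only suitable for illustration or very small models.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Sequence


def exact_shapley(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> list[float]:
    """Exact Shapley attribution of predict(x) relative to predict(baseline)."""
    n = len(x)
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = [x[j] if (j in subset or j == i) else baseline[j] for j in range(n)]
                without_i = [x[j] if j in subset else baseline[j] for j in range(n)]
                phi[i] += weight * (predict(with_i) - predict(without_i))
    return phi
```

For a linear model the attributions reduce to coefficient × (value − baseline), and in general they sum to the difference between the prediction and the baseline prediction, which is part of what makes them attractive for reporting.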

Data Architecture Before MLOps, but Business Needs First

There was an interesting discussion here recently around the theme of “Data before models, but problem formulation first”. The article in question describes the specific challenges of thinking about data science problems based on business problems, and of being “data-driven” in thinking about and building models for our hypotheses. I posit that a similar paradigm applies to MLOps. Data architecture understandably matters a great deal for successful MLOps implementations, because it encompasses very foundational organizational processes and needs around data collection, storage and management, governance, security and quality, access patterns, ETL/ELT, sandboxes for analytics, connections to BI and reporting systems, and so on. Ultimately, this complex web of processes and technologies (because data architecture is more than just storing and retrieving data) is meant to perform some function of the business. As W. Edwards Deming said, “Data are not collected for museum purposes” – they are collected for a decision to be made, or for some end use. In the world of MLOps, we enable such decisions to take place on top of the data provided to us through an enterprise data architecture such as the one described above.

While typical enterprise data architectures are driven by the capabilities of tools and cloud scale applications more and more (because of the economies of scale of cloud providers, and the low barriers to entry), there is an important set of decisions every enterprise data architect has to answer for, around the specific needs of the organization, and how the architecture in question enables that to happen. Seemingly trivial decisions taken at the design phase of a data lake or data warehouse can have long lasting implications for the delivery of value from analytics, machine learning and MLOps. Data architecture is certainly important for MLOps, but the more fundamental needs of the organization – the kind of data required, the strategic importance of it, the decisions that need to be made across use cases, security and access patterns for data analysis and data science, and many more operational aspects of data – all of these are important and have a bearing on MLOps effectiveness too. So if you’re a data scientist or MLOps practitioner looking to improve your impact and effectiveness in solving problems, understand the underlying data architecture more deeply first. Sometimes, doing this can be hard – especially if there are no stakeholders who can explain it well – but this kind of fundamental understanding and context are highly underrated and have an outsize impact on the success of data science and ML programs eventually.

The Enterprise Model Sanctuary: Many Simpler Models, A Few Complex Models, and Other Combinations

A cursory glance at machine learning and MLOps forums, discussions and content indicates that the thinking around model development techniques is method centric, and not business centric. A large number of the discussions are a consequence of what’s required for companies at scale innovating on a few complex models with huge amounts of data – and these are legitimate and interesting discussions for sure. For example, most MLOps discussions I have come across seem to discuss the deployment of deep learning models. They discuss text and unstructured data processing, and complex image processing pipelines. Whether it is the use of tools like Kubeflow for training and deploying models in a distributed fashion, or the use of MLflow for tracking metrics and performance, these are all legitimate considerations that may solve subsets of the ML deployment space. However, state-of-the-art machine learning is rarely required for enterprises looking to get value out of their specific use cases. The large majority of use cases in the industry are for simpler models, though, and this is why simpler pipelines could do a large part of the value creation. I say this from experience and with confidence, having seen numerous projects where managers struggle to make sense of ML outcomes for their business, but have less difficulty making sense of data aggregations, summaries and statistics based on the data. The enterprise model ecosystem is more likely to resemble a zoo, or even more accurately a sanctuary, of different models, where each model may have its own specific needs and requirements.

In mature organizations, model development generally comes only after carefully evaluating data and the evidential findings from it on merit, and then exploring hypotheses. Enterprises at lower levels of maturity have difficulty getting value from such an approach, however, and many leaders there may still rely on dashboards and reports. Clearly, there is an important and untapped market in business intelligence from big data. There is also a huge market for implementing simpler models based on clearly defined hypotheses. In many cases, enterprises may need many such simpler models, one for each stratified part of a specific use case. For instance, if you’re a market research firm estimating sales in a market segment, you may wish to build one such model for each sub-segment. If you are an equipment manufacturer doing quality checks using machine learning models, you may wish to use attribute based classification models, one for each product line, and perhaps you want to build many of them. The true value of MLOps in these cases is not in managing the complexity of deployment for one complex model, but in enabling many simpler models to be taken to production quickly and efficiently. These simpler models may then provide a baseline with which to build more complex models as needed.
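The “many simpler models” pattern can be sketched in a few lines. This is a hedged illustration, assuming hypothetical `(segment, x, y)` rows such as advertising spend and sales per market sub-segment, and using an ordinary least squares line per segment as a stand-in for whatever simple model suits the use case:

```python
from collections import defaultdict
from typing import Iterable


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values to fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_per_segment(rows: Iterable[tuple[str, float, float]]) -> dict[str, tuple[float, float]]:
    """Fit one simple model per segment from (segment, x, y) rows."""
    grouped: dict[str, tuple[list[float], list[float]]] = defaultdict(lambda: ([], []))
    for segment, x, y in rows:
        xs, ys = grouped[segment]
        xs.append(x)
        ys.append(y)
    return {seg: fit_line(xs, ys) for seg, (xs, ys) in grouped.items()}
```

Each per-segment model is trivial on its own; the MLOps effort lies in training, validating, versioning and deploying dozens or hundreds of them reliably.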

Machine Learning Systems are Stochastic, Not Deterministic

Perhaps I’m stating the obvious, but it needs to be said. The underlying nature of data generating processes and machine learning models is stochastic and not deterministic. Whether we’re talking about manufacturing process metrics, banking and finance transaction data, or energy sector data around load, power, usage and so forth, all of these data are generated by stochastic data generating processes, even if they come from engineered systems. Machine learning models are also never exact mathematical formulations – they are almost always stochastic processes. There is a bit to unpack here, so I’ll get into a few instances. What this stochasticity means is that machine learning models exhibit variability in results from situation to situation, and that this will be quite evident in production. In order to begin building machine learning systems, we need to perform exploratory data analysis prior to training time, prepare features for our hypothesis, check assumptions based on the features and the model formulation, and then build models and evaluate them. What it also means is that we need to build safeguards to ensure that these assumptions remain valid when doing production scale inference. It means that we may have to reformulate problems as the underlying conditions of the data generating process change. In the case of deep learning models, sophisticated tensor transformations are required as part of the normal training loop.
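One simple form of the safeguards mentioned above is an input check that compares inference-time features against what was observed at training time. The sketch below is an assumption-laden illustration, not a complete solution: it only checks for missing values and for values outside the training range, while real assumptions are often distributional. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FeatureProfile:
    minimum: float
    maximum: float


def profile_training_data(rows: list[Mapping[str, float]]) -> dict[str, FeatureProfile]:
    """Record the observed range of each feature at training time."""
    names = rows[0].keys()
    return {
        name: FeatureProfile(min(r[name] for r in rows), max(r[name] for r in rows))
        for name in names
    }


def check_inference_row(
    row: Mapping[str, Optional[float]], profiles: Mapping[str, FeatureProfile]
) -> list[str]:
    """Return human-readable problems; an empty list means the row passed."""
    problems = []
    for name, profile in profiles.items():
        value = row.get(name)
        if value is None:
            problems.append(f"{name}: missing")
        elif not profile.minimum <= value <= profile.maximum:
            problems.append(
                f"{name}: {value} outside training range "
                f"[{profile.minimum}, {profile.maximum}]"
            )
    return problems
```

Rows that fail such checks can be routed to a fallback, logged for review, or used as a trigger to revisit the problem formulation, depending on the domain.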

When the model is eventually trained to the required level of performance and finalized, it too represents a solution at a specific point in time. MLOps is not about “train once, deploy everywhere”, but about “routine retraining and redeployment”. This makes ModelOps and the continuous training lifecycle of model development as important a consideration in MLOps as DataOps is. A lot of discussion around MLOps today is centered around data preparation – and the motivation for this, of course, is that data scientists face significant data preparation challenges. However, model training in the real world cannot be wished away despite the prevalence of AutoML, although AutoML tools are one path for progress. As of 2021, for most use cases, model definition and training are still done manually, even if tuning and optimizing the model are automated. In MLOps lingo, this is where feature stores come in, along with their impact on data drift and concept drift analysis. While a healthy discussion is in progress on these topics, the instrumentation for identifying and measuring data drift and concept drift varies across actual implementations. Some tool chains are ready for this change, and others just aren’t.
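Data drift analysis can take many forms. As one hedged example, the Population Stability Index (PSI) compares the binned distribution of a feature at training time with its distribution in production. The sketch below assumes equal-width bins over the training range and a small floor value for empty bins; both are simplifying choices, and other binning schemes (such as quantile bins) are common.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10, eps: float = 1e-6
) -> float:
    """PSI between a training-time sample (expected) and a production sample (actual).

    Bins are equal-width over the range of the expected sample; production values
    outside that range are clamped into the first or last bin.
    """
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for a constant feature

    def proportions(values: Sequence[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            index = min(max(int((v - lo) / width), 0), bins - 1)
            counts[index] += 1
        # eps keeps the logarithm finite when a bin is empty
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ai, ei in zip(a, e))
```

Practitioners often quote rules of thumb such as PSI below 0.1 meaning little shift and above 0.25 meaning a significant shift; these are conventions rather than statistical tests, and thresholds should be set with the domain in mind.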

More broadly, some MLOps implementations may account for these stochastic and probabilistic characteristics of ML systems, because their data scientists ask the hard questions after training and during/before deployment. On the other hand, it is likely that most MLOps implementations today treat models merely as pieces of software. The latter pattern leads to the unfolding of technical debt of various kinds later in the lifecycle of the system. This technical debt currently takes the form of additional regulatory checks, interpretability analysis, metadata logging, model performance metrics, and so on – and over time, this set of secondary considerations may grow much bigger.
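Metadata logging is one of the items of technical debt listed above that is cheap to address early. A minimal sketch, assuming a hypothetical record format (the field names are illustrative and not taken from any particular tool): fingerprint the training data and store it together with hyperparameters, metrics and a pointer to any interpretability report.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def fingerprint(rows: list[dict]) -> str:
    """SHA-256 of a canonical JSON encoding, so identical data gives identical hashes."""
    payload = json.dumps(rows, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class ModelRecord:
    model_name: str
    version: str
    training_data_sha256: str
    hyperparameters: dict
    metrics: dict
    interpretability_report: Optional[str] = None  # e.g. a path to a SHAP summary
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def to_json(self) -> str:
        """Serialize the record for an append-only log or registry."""
        return json.dumps(asdict(self), sort_keys=True)
```

Tools such as MLflow provide this kind of tracking out of the box; the sketch only shows what a minimal record needs to capture to answer “which data and settings produced this model?”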

Changing Skillsets and Roles for MLOps

Companies looking to hire top ML talent as of 2021 are pushing for a greater number of high quality data engineers with MLOps skill sets. This is in contrast to the emphasis on data science hiring in the past. Hiring pipelines for data and AI roles (I’ve seen a few different ones over the last few years) tend to emphasize programming, statistics, databases and specific technologies for data science – of late, this is largely SQL and Python, with a smattering of distributed frameworks and tools, and skill sets in deep learning, tabular data analysis and the associated frameworks and tools for solving problems in this space. For data engineering roles, over the years I’ve seen skill requirements specifying systems programming and strongly typed languages such as Java and Scala, experience working with JVM languages, in addition to SQL, databases, and a lot of the back-end software engineering skill sets we see for application developers elsewhere. For data engineers working on big data technologies, there’s very often a need to be familiar with NoSQL databases or graph databases, depending on the role and use cases, in addition to the Hadoop-and-friends ecosystem, and cloud engineering skills such as AWS or Azure. While the data scientist’s role and skill set has come to include domain considerations, advanced statistical and ML models, cloud-native and large scale data science and deep learning, and communication/presentation of data and insights, the data engineer’s role has become broader around systems engineering and design.

Someone said (in fact, in this talk) that data engineers ought to build frameworks, and not pipelines – and this is a fair assessment of how to use this broad and useful skill set in data engineering. There has been a healthy discussion in various forums, talks and the like on ML engineering roles which combine elements of these two different skill sets. All of these conversations around skill sets are important context for where we’re heading in the data science and engineering space overall too. MLOps, unlike DevOps before it, should not be constrained by the limited value addition possible outside of data science or data engineering roles (the bulk of DevOps roles are administrative in nature). MLOps practitioners cannot be construed as, or see themselves as, configuration file engineers, for lack of a better term. In fact, their role could be much broader – as systems engineers spanning a range of capabilities in both data science and data engineering, without necessarily possessing deep expertise in any one of these (themselves diverse) areas. MLOps roles should perhaps also emphasize domain knowledge or expertise of some kind – since ultimately, the outcomes here are practical and related to business value from ML. There are many outcomes and opportunities for talent and skill sets for sure, but these stand out as being relevant. What is certain is that the data scientist’s role has changed (as has the data engineer’s), and the old and unyielding challenges faced by data scientists are taking on new definitions and manifestations – thereby requiring new mindsets, new skill sets, and new processes to come forth.

In my view, this churn in the extant data science and engineering role paradigm is a welcome development, because enterprises today want to realize value from DataOps and MLOps simultaneously. As we will discuss later in this post, while models are important, business managers will continue to derive value from analytics and reports – and perhaps there has never been a better time to build on that need than 2021. Also, the current emphasis on data engineering roles is well-founded. From practical experience as a data scientist who worked on a range of problems from relatively simple ML to complex deep learning models, I will happily acknowledge that the data engineers I have worked with were indispensable to the success of the projects that succeeded. However, leaders hiring for ML roles should not think that the role of the data scientist is no longer required. I believe this emphasis on data engineering is a passing trend as enterprises build the foundational pieces that enable value from data. The focus will then shift once again to business value from data, which means that statistical, data science and ML skills will continue to be in vogue through this shift and afterwards.

Don’t Ignore Decision-Making Culture

Organizational culture matters a lot for the success of MLOps, as much as it does for any digital transformation program. MLOps represents, in a way, a desirable end-state or the happy marriage of data science and data engineering in a given enterprise and data architecture context. However, both data science and engineering can only be valuable and effective in organizations whose leaders think about and talk about data and use the data and insights from these data for taking decisions. The latter is a cultural synthesis, and not just a technology adoption process or workflow that one can execute on demand. Being a cultural matter, it has to do with behavioural and attitudinal patterns that ultimately enable data and insights to be used for decision making.

The adoption of data driven decision making represents a shift from thinking about business processes, systems and decisions in terms of rules (“Rules are for lazy managers”, to paraphrase Simon Sinek), to an open-minded thought process around data and AI systems. When leaders stop thinking in terms of rules, and start thinking in terms of systems, they are often imagining situations of change, synthesis, formation and deformation of patterns, structures and interactions. They begin to see their role as an influencer more and as a commander less, and this shift in thinking can enable them to make subtle changes to their managerial approach, driven by data.

In an earlier post I wrote about OODA and the AI-enabled generalist, I made a point about the decision making language of organizations. Developing such a decision making language requires thinking about the enterprise’s systems, processes, and also its ML models in new ways. It requires an openness of mind in decision making to adopt models as thinking tools. In a sense, the modern AI-empowered generalist could be seen as a prototype for a supreme pragmatist. Enterprises want rational actors at their helm, at least for the functions that require data driven decision making – and such rational actors can be groomed in a culture that doesn’t shy away from challenging the current rules and norms on decision making, and is willing to look at data and models.

Data/AI Exponents as SMEs and Future Leaders

Organizations come to embrace data, ML or even MLOps so that they can ultimately derive value from data, and this cannot be done without talent that unlocks value from data. Whether this is data science talent or data engineering/architecture talent, these roles meet both a topical/functional need and carry strategic value in enterprises, and the strategic value tends to be overlooked in data strategy. This value comes from what such individuals accumulate over time: as they build data pipelines and AI/ML models, they gather a lot of knowledge about business processes, customers and the domain. When you have a data scientist in your team who has built a few different models that explain different elements of your business, processes or customer behaviour, they become invaluable assets both for developing further models and for analyzing customer, business or process behaviour. Such individuals can also become effective leaders and transition to process management roles.

MLOps and DataOps engineers in an organization can therefore themselves be considered Data/AI SME roles – and this is an important source of value that is often overlooked in organizations. A lot of organizations still see data/AI resources as just means to an end, but in fact, many of these roles can become storehouses of domain knowledge. MLOps can potentially enable the tacit knowledge of such individuals to be effectively captured for process management as well – this may be an important opportunity for value creation from MLOps. MLOps can also accelerate the development of data-driven leadership talent. When individuals are exposed to the models used to take decisions, and to the specific mechanisms of taking such decisions, their potential for process leadership improves.

In an earlier post, I discussed the importance of higher-level decision making languages, the OODA decision making loop, and how AI can enable a new generation of generalists. I would suggest that this is a useful idea to consider in the broader context of building a data-driven decision making culture.

“Data Before Models” also implies “Models After Data”

The purpose of this heading is to draw attention to the fact that the best data pipelines won’t help if we aren’t doing much with the data we prepare. We eventually have to build models of one kind or another with this data in order to actually take decisions. Many recent discussions around MLOps talk about data-centric AI, and above, we have discussed data architecture and other elements of enterprise systems and culture that contribute to MLOps success. We have also discussed the stochastic nature of data generating processes and machine learning systems. There are important implications from the core ModelOps processes as well, and we will finally discuss them here. The process of developing models, as I have discussed above, has become easier now than ever before, at least in software. The careful formulation and evaluation of model hypotheses, statistical analysis of the input data and features, and the checking of assumptions – these remain as hard, tedious and non-trivial as they were before. This underlines the importance of statistical analysis and exploratory data analysis. Without these foundational steps, ML models can be built with high bias or high variance, thereby setting up the use case for higher failure rates and lower effectiveness overall. This bears introspection and repetition, since there seem to be two schools of machine learning and data science professionals. One group believes strongly that mathematical and statistical thinking are important for doing data science. Another group of professionals and practitioners thinks otherwise: that the software elements of data science modeling can be learnt by someone without knowledge of statistics or machine learning.
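The “high bias or high variance” point above can be made precise with the standard bias-variance decomposition of expected squared error. It assumes data generated as y = f(x) + ε, with zero-mean noise of variance σ² that is independent of the fitted model f̂, and takes the expectation over training sets and noise at a fixed input x:

```latex
\mathbb{E}\left[\big(y - \hat{f}(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Skipping EDA and assumption checks tends to show up in the first two terms: a misspecified model inflates the bias term, while an over-flexible model fitted to noisy or leaky features inflates the variance term. Neither term can push the error below σ².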

In my experience, statistical analysis and EDA are fundamentally important for machine learning – they form an integral part of extracting value from the data we have, and of making sense of it, before we solve problems. A number of business situations require us to think in terms of data distributions and stochastic processes. To build things that scale within MLOps pipelines, some of us may need to keep an open mind about exploring the mathematical underpinnings of things like gradient descent, batch normalization or activation functions. This open-mindedness matters for a key reason: a lot of MLOps engineers being trained today may assume that data science is easy, or trivial, because people who don’t know statistics are building models, or because they themselves can, if they just follow a simple workflow. I know this to be patently untrue. If you want to develop a model worth anything in an enterprise, you may have to start by formulating and thinking about the business problem, move on to EDA and statistical analysis, build out tests for checking assumptions, and then experiment with different models. You have to get into probability and statistical analysis eventually, or you will be forced to rediscover the effectiveness of these mathematical and scientific methods. Even if you manage to build one or a few models, there will be situations where you’re required to explain them. ML engineers and data science engineers who can reason about the mathematics of machine learning will not only be more confident, but also better at building and scaling systems for the enterprise. Their ability to think through the implications of these models for related use cases, different deployment modes, different source data and different data quality considerations also improves. By checking assumptions on the features, they can stave off big challenges that may arise when the model is implemented in production.
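To make “checking assumptions on the features” a little more concrete, here is a minimal sketch in Python using only the standard library. The function name `check_feature_assumptions`, its thresholds and the toy columns are hypothetical choices for illustration, not a prescribed method. It flags only three common issues (missing values, near-constant features and highly correlated feature pairs); a real pipeline would add distributional and domain-specific checks on top.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when either feature is constant
    return cov / (sx * sy)


def check_feature_assumptions(features, min_variance=1e-8, max_abs_corr=0.95):
    """Return warnings for a mapping of feature name -> list of values (None = missing)."""
    warnings = []
    complete = {}  # fully observed features, used for the correlation check
    for name, values in features.items():
        missing = sum(v is None for v in values)
        if missing:
            warnings.append(f"{name}: {missing} missing value(s)")
        present = [v for v in values if v is not None]
        if len(present) > 1:
            mean = sum(present) / len(present)
            variance = sum((v - mean) ** 2 for v in present) / (len(present) - 1)
            if variance < min_variance:
                warnings.append(f"{name}: near-zero variance")
        if missing == 0:
            complete[name] = values
    # Highly correlated pairs hint at multicollinearity (unstable linear-model coefficients)
    for a, b in combinations(sorted(complete), 2):
        r = pearson(complete[a], complete[b])
        if not math.isnan(r) and abs(r) > max_abs_corr:
            warnings.append(f"{a} ~ {b}: |r| = {abs(r):.2f}, possible multicollinearity")
    return warnings


# Hypothetical toy data: age_in_months duplicates age, region_code is constant
toy = {
    "age": [25, 32, 47, 51, 62],
    "age_in_months": [300, 384, 564, 612, 744],
    "region_code": [1, 1, 1, 1, 1],
    "income": [40.0, None, 61.5, 70.2, 52.3],
}
for warning in check_feature_assumptions(toy):
    print(warning)
```

On the toy data this reports the constant `region_code`, the missing `income` value and the redundant `age`/`age_in_months` pair, each of which could quietly degrade a model once it reaches production.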

Statistical analysis and machine learning model development have been, and will remain, core to data science, regardless of the peripheral engineering required to realize value. Data engineering and MLOps, as allied fields, help realize this value at enterprise scale. It is the process of data science and model development that ultimately converts data into insights – and insights are the primary reason for investing in enterprise data and AI projects and programs in the first place. These skills will therefore continue to be a good bet for practitioners in the future, as long as they realize that those skills alone cannot take them over the finish line.

Concluding Remarks

I hope you’ve benefited from reading this rather lengthy post on MLOps and Enterprise AI. If nothing else, it allowed me to explore my own experiences, document a few patterns I see in the development of truly enterprise-ready AI, MLOps toolchains and capabilities, and explore sources of value from MLOps for enterprises. If you have questions or ideas, please leave a comment or tweet to @aiexplorations.

Emphasizing the Basics: Structured Data Science Mentoring

Data science, machine learning and AI are constantly growing fields, with research bursting at the seams in sheer volume. Every day, I receive numerous references to interesting papers on my Twitter feed, thanks to accounts like arXiv daily. I also see papers explained with code, and references to ML products and systems in numerous contexts. Beyond a point, this becomes overwhelming for a professional who doesn’t have a specific focus area. Pragmatically speaking, the tree of knowledge is always bound to be vast, and every human endeavour exhibits this kind of complexity as we spend more and more time exploring, farming ideas and understanding new possibilities.

Data scientists are going to be at different levels of competence and may be differently placed to take on challenges they are asked to face – the role of the mentor (regardless of the type) is to systematically challenge the data scientist to discover new innate potential and develop such potential to increase their overall capabilities and effectiveness.

The Data Science Mentoring Challenge

Mentoring can be a hard task for this reason: a lot of people (understandably) gravitate towards complex models meant for specific purposes, without fully understanding the details and exact mechanisms behind simpler machine learning and statistical modeling methods. The problem with this is twofold: a) reliance on libraries and frameworks with existing implementations, and b) an inability to characterize, apply and explain common, simpler techniques in the context of actual real-world problem statements. Part of the problem is the sensationalization of research in the media. Open research without borders is important and pivotal for speedy progress in technology areas like ML. But we’re also seeing a lot of misinformation, including sensationalized accounts of advanced ML techniques, and when some of it gets parroted by professionals (some of whom may become hiring managers), the problem proliferates into the world of work as well. I’ve interviewed my fair share of individuals who understand, say, an LSTM unit’s different gates but aren’t comfortable explaining autocorrelation techniques or ARMA models. This gap probably stems from gaps in mentoring and coaching, which ideally should emphasize basics first.
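As an example of the kind of basic technique worth being able to explain, here is a minimal from-scratch sketch of the sample autocorrelation function (ACF), a standard first diagnostic when identifying ARMA models. It assumes an evenly spaced numeric series; the function name and the simulated AR(1) series are illustrative, not taken from any particular library. For an AR(1) process x_t = φ·x_{t-1} + ε_t, the theoretical autocorrelation at lag k is φ^k, so the estimates should decay roughly geometrically.

```python
import random


def autocorrelation(series, max_lag):
    """Sample ACF: r_k = sum_t (x_t - m)(x_{t+k} - m) / sum_t (x_t - m)^2, k = 0..max_lag."""
    n = len(series)
    if not 0 <= max_lag < n:
        raise ValueError("max_lag must satisfy 0 <= max_lag < len(series)")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(max_lag + 1)
    ]


# Simulated AR(1) series with phi = 0.8 (illustrative; fixed seed for reproducibility)
rng = random.Random(0)
x = [0.0]
for _ in range(4999):
    x.append(0.8 * x[-1] + rng.gauss(0.0, 1.0))

# Estimates should be close to 0.8 ** k, i.e. roughly 1.0, 0.8, 0.64, 0.51
print([round(r, 2) for r in autocorrelation(x, 3)])
```

A slowly decaying ACF like this one points towards an autoregressive component, whereas an ACF that cuts off sharply after a few lags is more suggestive of a moving-average component.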

I’d posit that the role of the mentor in data science has changed over the last several years, and most significantly in the last two. In the future, data and AI mentoring will look different from what it looked like in the past five years, because the nature of the data scientist’s (or AI/ML engineer’s) job has also changed. Despite developments in Automated Machine Learning, we’re inundated with real-world situations that require human expertise to get through data science and machine learning problems. This human expertise manifests in three processes: problem characterisation, problem formulation, and problem solving. We need real, human data scientists (not just an AutoML tool) to look beyond obvious automations such as hyperparameter or architecture searches, to reason about the nuts and bolts of problems, to interpret the problem domain, and to reason about different kinds of hypotheses and whether they make sense.

This makes mentoring for data science different from what it was, in certain specific ways. For one thing, mentors today shape the field of problems and opportunities that will exist tomorrow. As can be expected, data scientists today experience information overload from many sources. From arXiv and Springer papers and articles, to new research and code, new books, and new frameworks and algorithms, there are plenty of things to learn on a daily basis. However, the broader skill set of the data scientist can still be characterized in four key areas: basic, business, functional and frontier skills.

Broad Characterization of Skills for Modern Data Science
  1. Basic technical skills: There’s the need for a strong foundation that enables general effectiveness in a data science role. This includes good skills in statistical analysis fundamentals, leading into the key principles that enable statistical learning models to be built, and a sound understanding of the mechanisms behind common algorithms such as regression, tree-based algorithms, and search and optimization methods.
  2. Business skills: There is a strong need for data scientists who can reason about business processes and systems, and understand how data may be generated, how it may flow, and what insights may be required of it. Not only is this a key skill for fruitful interactions with clients and stakeholders, it is also important for judging the right level of depth a job calls for, in terms of both satisfaction and effectiveness.
  3. Functional skills: There’s the need for effectiveness on the job at a functional level, which includes not only technical competence at the statistical, mathematical and code levels, but also competence in processes and good practices such as clean code, change management and reproducible research. One could also see more advanced machine learning and feature engineering techniques as part of the functional skill set.
  4. Frontier skills: There’s research that’s expanding at multiple frontiers, which is hard for even experienced data scientists to keep up with, if they’re really interested in furthering their career beyond the obvious and evident challenges of day-to-day work.
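As an illustration of what a sound understanding of the mechanism behind tree algorithms can look like in practice, here is a minimal sketch of the core step of a CART-style decision tree: choosing a split threshold on one numeric feature by minimizing size-weighted Gini impurity. It is a single split (a decision stump), not a full tree, and the function names are hypothetical.

```python
def gini(labels):
    """Gini impurity 1 - sum_k p_k^2, where p_k are the class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(xs, ys):
    """Threshold on one numeric feature minimizing size-weighted Gini impurity.

    Returns (threshold, impurity); samples with x <= threshold go to the left
    child. threshold is None if no split improves on the unsplit node.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    best_threshold, best_impurity = None, gini(ys)
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # identical feature values cannot be separated
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if impurity < best_impurity:
            # Use the midpoint between neighbouring feature values as the threshold
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_impurity = impurity
    return best_threshold, best_impurity


print(best_split([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))  # (6.5, 0.0)
```

A full tree applies this search recursively to each child node and across every feature, which is why understanding this one step explains much of how tree-based models behave.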

Mentors: Different Levels

The role of mentorship has also become specialized in the last two years, which is, in my view, one of the changes most representative of the maturation of the field of data science. Mentors today can be at different levels of skill and still add value to different kinds of data science and analytics roles. For the sake of this discussion, I’d classify mentors today into two kinds – the “breadth mentor” and the “depth mentor”. While both kinds of mentors possess certain common skills, especially on the interpersonal communication front, they may have different approaches to technical, functional and research level mentoring.

The breadth mentor is an individual with plenty of experience in data science, perhaps in a consulting setting, who can provide generally sound advice to data scientists on developing broad skill sets, ranging from basic statistical analysis to advanced algorithms. The focus of mentoring here is on developing a well-rounded data scientist, rather than an expert in a specific field.

The depth mentor, by contrast, is someone with deep experience in a specific area of industry or technology, and in bringing this field together with data science. Examples of this kind of mentor would be an NLP researcher or a researcher in robotics, both of whom may be expert practitioners of data science methods in their specific areas, but without broader knowledge of consultative data science methods.

Depending on the needs of the business and the data scientists in question, the appropriate kind of mentor has to be chosen – and this shouldn’t be done lightly. For example, bringing a breadth mentor to an AI product firm may have some advantages, but if the firm is solving problems in a specific space, this may not work out so well. Similarly, bringing a depth mentor to a consulting firm can help grow a specific practice (or a new one) but may not benefit the broader data science efforts across different business domains there.

Structuring Mentorship in Data Science

Mentors (and hiring managers) in general should emphasize the importance of the basic skills listed above. In my view, when a data science candidate has the correct understanding of the essential basic statistical ideas and common algorithms, it becomes a lot easier for them to grasp more advanced ideas such as in deep learning, when this is required. Mentors can build better basic skills in data scientists by challenging their technical acumen.

Mentors should also emphasize business skills where relevant, and where the emphasis is on research, they should emphasize some of the frontier skills as well. Mentors in this context are expected to challenge the data scientist with relevant questions, and encourage a habit of systematically breaking down problems and asking the right questions. These business skills are important all the way up to solution architect roles and management, when crucial decisions have to be taken and hard questions will need to be asked often. Mentors can build better business skills in data scientists by challenging their problem understanding and characterisation.

Functional skills are important for effectiveness on the job. It is not okay for data scientists to theoretically understand a specific subject area, only to find themselves handicapped when asked to build a machine learning pipeline. Therefore functional skill mentoring is about challenging the data scientist on problem solving effectiveness.

Finally, frontier skills development depends on both the organizational or research context and the data scientist’s interests. Mentors can provide helpful markers to enable the exploration of ideas, while emphasizing value from the research and asking questions that keep the data science researcher on track. The challenge the mentor can pose here is one of differentiated solution value and originality.

The Importance of Emphasizing the Basics

This brings me to the importance of emphasizing the basics. I see numerous individuals getting into data science and machine learning who are interested in getting right to the latest and greatest algorithms. For a while, and this has been a trend on LinkedIn and Twitter, budding data science aspirants have posted work involving simple scripts or programs for computer vision, translation and similar problem statements, giving much of their audience the impression that they are skilled not only at the techniques demonstrated, but at other kinds of data science problem statements as well. My own suggestion to data science aspirants is that they will be under pressure to demonstrate more involved skills: not merely the ability to solve problems with pre-built libraries, but a grasp of statistical learning deep enough to build such algorithms and systems from scratch. This kind of deeper skill is what separates the wheat from the chaff in data science.
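A small example of building an algorithm from scratch, assuming one-feature linear regression as the target: the sketch below fits y = w*x + b with batch gradient descent on mean squared error and checks it against the closed-form least-squares solution. The learning rate, epoch count and toy data are illustrative values chosen for this example, not general recommendations.

```python
def fit_closed_form(xs, ys):
    """Ordinary least squares for y = w*x + b: w = Sxy / Sxx, b = mean(y) - w*mean(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    w = sxy / sxx
    return w, my - w * mx


def fit_gradient_descent(xs, ys, lr=0.01, epochs=5000):
    """Batch gradient descent on MSE = (1/n) * sum (w*x + b - y)^2."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        residuals = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))  # dMSE/dw
        grad_b = 2.0 / n * sum(residuals)                           # dMSE/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]  # toy data lying exactly on y = 2x + 1
print(fit_closed_form(xs, ys))       # (2.0, 1.0)
print(fit_gradient_descent(xs, ys))  # converges to approximately (2.0, 1.0)
```

The closed form is exact here; gradient descent reaches the same answer iteratively, and it is that iterative principle which carries over to models that have no closed-form solution.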

Concluding Remarks

In conclusion, I believe the mentor’s role in data science has changed. Mentors today have their work cut out when it comes to building deeper skills in their data scientists: they should emphasise technical acumen first and foremost, problem understanding and characterisation next, and problem-solving effectiveness after this. This builds a well-layered skill set in which technical, business and functional skills work in harmony, resulting in true value in the data science market.

The New Post-Covid19 World Order: Remote Work, Data, Cloud, AI, IoT, Governance, Autarky and Relevance

I’m almost certain that those of us reading this blog post have already experienced some of the disruption that Covid-19 has caused on a huge scale across the world. The crisis the world finds itself in as of this writing, in April 2020, has been brewing for almost six months. Going by recent research, the first cases of Covid-19 were identified in Wuhan, China, around 17th November 2019. It has been a long and gruesome six months, marked by global disruption, tens of thousands dead to date, and more than a million and a half infected around the world. Most of us woke up each day of these few months to hear of more and more people affected by, and dying of, Covid-19 in countries across the world. Some of us were not as fortunate, having experienced the disease or its effects first hand. In this post, I want to imagine what the world may look like – to use that oft-used expression these days – “when this is all over”.

Covid19: Health and Economic Impact

With no ready treatment of any kind, owing to the virus’ novelty, valuable time was lost in the initial weeks and the spread could not be curbed in Wuhan. The Chinese authorities in Wuhan and the WHO have both been blamed for the predicament the world finds itself in today, perhaps rightly, although the governments of other countries with significant numbers of cases are also to blame for their management of what was clearly known to be a highly infectious disease. Experimental drugs and antivirals are being tested, and as of this writing, more than 400,000 people have recovered from this disease. However, the impact of this virus is likely to last a long time. It has been seen as a defining point in history, marking the beginning of a new kind of social, economic and political world order, because the virus has far-reaching consequences.

The US seems particularly badly hit as of this writing, with European countries such as Italy, Spain and the UK also badly affected. Some countries have had more success than others in fighting Covid-19, South Korea being one example. In Asia, Iran has been affected quite badly, with tens of thousands of cases and thousands of deaths. China has been reporting minuscule numbers since late February, and while we’re led to believe, in good faith, that it has defeated the virus’ spread, the numbers it reported to the world earlier didn’t add up, according to additional research estimating the actual death and case counts from Covid-19 in China. In South Asia, we’re likely to see a rapid growth in cases, and I hope that India will manage to keep infection and death rates as low as possible. As of this writing, India has more than 8000 confirmed cases and has seen more than 200 deaths from the novel coronavirus. In summary, this was a bolt from the blue for the world, not only in terms of the impact on health and medical systems, but also economically.

Staring Down the Barrel of a Global Recession

In the first week of April, millions of people in America filed for unemployment, a historic high of 6 million or so. Chinese firms are going back to work in and around Wuhan after the lockdown there was lifted. I am unsure what the future holds there: if some Chinese test kits are as inaccurate as claimed in some reports (30-40% accuracy), we are likely to see a relapse in many of those affected, and without the social distancing and lockdown protocols that seem to be required to curtail the spread of this disease, we may see a resurgence of cases in China, as we are seeing elsewhere. This can only be bad for the world’s current economic condition. In fact, as of today, it has been declared that we’re in a global recession.

Politicians, policy makers and companies the world over have been criticized for not acting fast enough, and even the WHO has not been spared: its initial advice against recommending masks is now widely seen as a problematic piece of advice which led to untold misery in countries like Italy and now the US, because it contradicted the correct practice for curtailing the virus’ spread. Economically speaking, most economists and economic policy makers have indicated that the world economy is already in recession as of 11th April 2020, and that we’ve seen a significant erosion of value in all the world’s economies, with the possible exception of China and India, which may recover from this recession better than most. Even so, as a services-oriented economy with a manufacturing base that is only satisfactory, and underwhelming compared to the scale of manufacturing in China, it is hard to imagine India bouncing back strongly from this shock. China controls a lot of the supply chains of global manufacturers across a diverse variety of goods, and therefore has the potential to bounce back better than India on that count. India’s tech-savvy IT businesses and startups may buck the trend and do well, but sectors like agriculture and manufacturing will suffer because supply chains everywhere are being hit. Even within Indian IT, though, demand is probably going to be hard to come by, and we may see very bad tidings for the Indian economy in general.

In India, where we’re seeing a large number of cases (nearly 7000 as of this writing, and almost 250 deaths), the lockdown has caused huge disruption, especially for the non-salaried class. India’s economy has a large unorganized sector, where artisanship, daily wage labor and other such occupations account for a large percentage of the workforce. These jobs are responsible for a vast amount of India’s economic leverage, and provide a viable income for the masses who don’t possess advanced degrees or skill sets beyond the basic skills required for most jobs. With Covid-19 requiring social distancing, many in the unskilled workforce may end up contributing to the supply chains that will run our socially distanced, remote workforce. Without this option, they’re likely to be set back significantly, economically. Already, we see how companies like BigBasket, Swiggy, Dunzo and other delivery-centric and supply-chain-centric firms (in India, and their equivalents elsewhere) are doing really well in this period of crisis and adding significant value for their customer base. We’ve also seen telecommunications firms expand their value as a consequence of increased demand at this time of crisis. This is important in the long term, for reasons that I will explain below.

Modes and Impact of Remote Work: Technology Sector and Other Sectors

With Covid-19 pushing organizations to follow lockdown protocols and social distancing measures, plenty of Indian organizations (as elsewhere) have adopted remote work as a viable alternative to office-based employment. The important thing about this trend, however, isn’t the fact that the organizations have adopted the few changes necessary to enable remote work – it is the fact that the very nature of these organizations will change, thanks to Covid-19, even after this crisis has been relegated to the history books. Why is this the case? For one thing, operations managers and COOs will realize that remote work enables higher productivity and lower costs for knowledge work. They will understand the benefits of having employees manage their time at home, juggling responsibilities at home and work, while completing the tasks and meetings required for achieving their goals. Remote working also obviates the need for large offices. The modern glass-paned concrete-jungle city is a consequence of an old school of thought: centralized, synchronized teams, communicating face-to-face and using this face time to build relationships.

Now, in the post-Covid-19 world, companies will have to amend their cultures and working styles to perform all of these functions – from sourcing to the delivery of value, to the collaboration required for sustainable organizations – completely online. This necessitates the use of telecommunications networks first and foremost, and on top of these networks, the use of communications and collaboration technologies such as audio and video conferencing. As an example, if you’re a software developer, you may start your day with video calls, manage your features and tickets on a tool like JIRA, rely on documentation developed asynchronously by a global, distributed team, and manage your code on git with good engineering and code management practices. If you’re a manager, expect to jump on a number of video conferencing calls, and expect to build relationships remotely. If you’re an executive, you will have to cultivate the ability to write, inspire and influence your stakeholders and team across distances, with very little possibility of direct face-to-face interaction.

Manufacturing, energy sector and other organizations will rely on a combination of process automation for manually intensive tasks, implement a sanitized workplace for those who are required at site by default, and enable remote work options for knowledge workers in those industries so that they may collaborate and add value remotely. In such organizations that range in their scope from industrial equipment production to chemical and oil and gas supply, there is likely to be significant disruption of the standard work practices that were implemented and perfected over the years, because of the unique challenges faced by their employees and customers, in the post-Covid-19 world. Companies across industries will realize and come to value good measurement systems for processes at many layers of the enterprise (technical and process level metrics, and also functional and organizational level metrics), and strive to implement effective and reliable measurement and management systems, because their decisions will be asynchronous and decision making remote, and both processes will be based on such data. Metrics to manage and provide feedback on employee performance and reward or penalize such performance will have to take a different route from what was done in the time of face-to-face communication. Many organizations will have to adopt a process of management by metrics in addition to management by objectives, with old styles of direct and micromanagement going out of fashion.

Technologies such as augmented and virtual reality, which have become popular in recent years for product demos, entertainment, games, simulation and so forth, hold a great deal of promise for companies wanting to bring immersive collaborative experiences to their workforce. While a VR/AR meeting may seem a simplistic addition to the video conference experience, there may be many opportunities for richer interaction in this space. By and large, the impersonal world of online conferences and meetings has suffered from attention deficits and low engagement. This seems to be true even for some video conferences, where many elements of the interaction are voluntary. The subtle non-verbal cues that humans pick up and communicate with in face-to-face conversations play a big role in meetings and trust building, and this impacts credibility and consequently productivity. Naturally, face-to-face conversations set the bar for interpersonal communication far higher than virtual alternatives (text, audio, video and VR/AR), and there is a need to over-communicate, verbally and with gestures, on calls of any nature. This has consequences for team management and dynamics, and can come to define the culture of the organization itself. Case in point: Basecamp and their CEO Jason Fried.

What we can perhaps hope to see from the fusion of data and ML/AI with conferencing are listening and immersion aids, and meeting statistics. In some hypothetical future meeting, such immersion aids could improve the meeting experience. Given the direction some of these innovations may take, there is an opportunity for new hardware to provide immersive experiences for those at work in different settings, especially since, for some of us, the immersion we experience at work is more like that of a musician playing an instrument than that of someone watching a movie: some tasks at work require a visceral degree of immersion to be done effectively. Direct brain stimulation is the next step in communication beyond the audio-visual domain in which we’ve operated for all of human history, and there’s work being done in this space. If recent advances in AR and VR are any guide, some of this hardware is full of promise, given that its makers are experimenting with experience-creating technologies such as direct brain stimulation (1, 2).

The Impact of Data and AI in the Post-Covid-19 Enterprise

Measurement systems and data will become increasingly important for asynchronous, global and distributed enterprises. The cloud enables large-scale data storage in remote, managed services models. It also enables enterprises to convert capital expenditure into operational expenditure, giving them the flexibility to manage costs for teams, equipment and project funding separately from IT infrastructure. Serverless and cloud-based IT applications, which are very much in vogue at the moment, will simplify nearly every aspect of the technology-enabled enterprise, from sourcing, hiring, engineering, development and delivery to quality and customer experience management, and metrics will drive team performance, goals and agility in projects. For instance, there is no excuse for a modern enterprise (whether a startup or a truly large business) not to prefer the cloud for its website. Sure, it could maintain a backup server with its site, but adopting cloud technologies for certain use cases is a no-brainer: the risks and costs of starting from scratch don’t make a good business case for most enterprises.

For cloud scale serverless architectures to be effective, they need economies of scale, among other constraints such as tooling and testing, on the adoption side of things. This is purely by design – cloud based serverless architectures are products rolled out by the big cloud firms, that depend on such scale to keep the costs low. Security and scalability issues currently persist but are far less frequent than those with on-premise infrastructure. One hopes that with the tailwind strengthened by Covid-19 related pressure, many companies seeking to go cloud-native instead of building their own IT infrastructure will use these capabilities going forward.

Companies outside of technology, across the spectrum from manufacturing, retail and telecommunications to oil and gas and energy, will likely use the cloud a great deal in the future. Covid-19 or not, many had already begun this journey. As a consultant working with clients on big data, data science and similar initiatives, I’ve seen many take on new technologies to stay ahead of the game (by gaining competitive advantage), and to do so cost-effectively. Manufacturing organizations can do more on the cloud today than they thought possible. Network speeds are fast enough for highly responsive thin-client CAD applications, for example, and cloud-based servers could be used to run analyses such as finite element or CFD computations that may be required in large-scale manufacturing settings. In general, therefore, the virtualization and digitization capabilities that the cloud brings can cut team sizes significantly and help manage costs and consumption by moving to a pay-per-need model. Such an economic model can greatly benefit manufacturers in developing economies, if the benefits of scale elsewhere are made available to them.

Data, Cloud and Digital Transformation: Pre- and Post-Covid-19

Collecting and managing data from diverse source systems has been one of the many victories of data ingestion tools that have come into prominence and widespread use in the last decade. These ingestion tools, along with scalable data storage and processing systems that use distributed computing, have become the staple of big data and cloud-based AI/ML initiatives for numerous enterprises. I’ve written about these capabilities extensively on my blog earlier, as building blocks of enterprise scale data lakes and AI/ML systems. Such tools will come to see greater relevance and importance in the digitized enterprise transformed by Covid-19 risks.

With the need for large-scale analytics and insights to drive efficient decision making in a remote work setting, where the individual is far removed from the business process, there is likely to be a greater demand on suppliers of the data for such insights, and on those who can deliver the insights themselves. This will in turn necessitate machine learning and data science, and overall, this paradigm is not unlike what we have seen earlier. The drivers of the earlier data and ML/AI revolution were competitive advantage, data-driven decision making to capture the upsides in transactions, and the need for low-cost, agile and lean operations. Now, however, Covid-19 related risks have produced a completely different set of motivations for digital transformation. For one thing, enterprises with high structural costs of business are now using the Covid-19-induced drop in demand as a rationale for restructuring and reinvigorating lean and agility initiatives, adopting remote work, contract employees and distributed teams to save costs. In the short term, these trends will result in reduced operating expenses for existing facilities, and in the long term, they will translate into reduced capital expenses and reduced investment in new facilities. Additionally, data and AI adoption will grow for the reasons mentioned above: greater adoption of automation, cognitive/smart automation driven by machine learning, and productivity drivers will enable new kinds of value delivery in the enterprise.

New AI Capabilities and their Adoption

As a practitioner in and an observer of the AI and machine learning space, I see a number of new techniques in natural language processing and generative modeling that have become research frontiers for those in machine learning today. Many of these techniques, from transformer models for natural language like BERT (link, explanation) to generative adversarial networks (GANs), have been tried out on a wide range of applications, from language translation to face generation. With the rise of remote work and remote teams, there are many upsides to adopting such techniques. The contexts and problem statements around the use of machine learning in the post-Covid-19 world are still being revealed, and many enterprises are discovering such points of value. But in this time of distributed teams, cross-cultural and cross-language communication, digital team building and real-time translation, all while preserving the personal touch, are important for effective remote work across distances, regions and time zones. These capabilities, along with virtual avatars for bots and virtual intelligent agents, are just some of the use cases that will see enterprise AI adoption at large scale, especially of ML methods for richer data such as text, audio and video.

There is another, underlying layer of technology that will enable collaboration in the post-Covid-19 world: the telecommunications network. Large-scale data transmission has become nearly ubiquitous, with fiber optic technology becoming mainstream in the past decade. In the coming years, Covid-19 and the risks that follow it will make businesses increasingly reliant on higher-speed, near-real-time interactions that enable complex automation tasks, such as surgeries executed entirely remotely. While there is no substitute for in-person diagnosis and surgery for many patients, for many surgeries there is a gap between the expertise available and the needs of patients around the world, and robotic surgery tools could be the frontline equipment in these battles. The enabler of such technologies is 5G communications technology, which in turn comprises a number of enabling capabilities, such as virtualized, software-defined networks. Physical hardware (copper, optic fibre and the like) and the connectivity it provides have driven today’s large-scale, high-speed direct-to-home fiber internet revolution. In future, virtualized networks of all kinds built on top of such physical networks will play an important role in transmitting voice, video and sensor data. The management of these networks, including their scalability, security and capacity, could become entirely automated using machine learning techniques. The big telecommunications technology firms of the world are already solving these problems as they deploy scalable, software-defined networks.

Virtualization and container-based environments for running AI and ML applications became an important capability in 2019 and 2020. We have seen large-scale acceleration of machine learning deployment using container development and orchestration/management frameworks such as Docker and Kubernetes. We’ve also seen the development of purpose-built machine learning deployment frameworks such as MLflow. These capabilities now form a new area of data and AI capability termed Machine Learning Operations (MLOps). They are more likely to be taken up by organizations that already use machine learning beyond the stage of prototypes and proofs of concept. Mainstream technology firms and firms in the manufacturing, energy and retail sectors may find less direct use for these technologies in the immediate future, unless they’re already building machine learning at scale. Containerized and similarly managed machine learning applications matter for organizational agility: they make it easier to deploy ML capabilities to production and to respond quickly to production-scale model performance issues, such as model drift. I’ll discuss this topic further in a future post, since it gets a bit technical from this point on.

Sensors as our Saviours: Measured Surroundings

It goes without saying that Covid-19 has put a lot of focus back on China, the nation where it all started. A lot has been written on the subject, from examinations of the conditions that could have led to the cross-species transfer of the virus from bats to humans via pangolins, to broader examinations of policy and of the social and cultural norms around Chinese food habits. The fact remains, though, that this is a rare, exceptional event. Short of broadly calling out the diets and social habits of the Chinese, any scientifically minded root cause analysis has to start with the underlying conditions that lead to such transmissions, and that is perhaps a pathologist’s or an epidemiologist’s project.

Beyond simplistic monitoring of the conditions for the formation and transmission of such diseases, there are other direct applications for sensor based systems that can monitor and track environments where humans work – some of these measures over time could improve sanitation processes, especially in high risk zones that have historically been prone to infection.

Those in the IoT space should note the extensive need for no-touch systems brought on by this pandemic. Many objects in our surroundings, and many household and public-use items we rely on daily, require direct physical contact. These range from the ordinary smartphone or tablet screen to the simple devices in our homes, such as switches, door knobs, taps and faucets. Clearly, a new design philosophy could benefit us all in such times. Smart sensor-based systems that open doors automatically, dispense soap automatically, or otherwise sanitize bathrooms and other public spaces could be a shot in the arm. These systems aren’t exactly breakthrough innovations for most companies. Their widespread adoption hasn’t happened because alternatives are relatively cheap and adoption and maintenance are costly. Once this entry barrier is broken, either by government mandates and policies or by increased public awareness, large-scale IoT solutions like these could take on additional layers of sophistication. These could range from gesture recognition to automated sickness detection, automated reporting of sick or needy people in public spaces, or, in sophisticated cases, automated interventions.

Sensors as our Saviours: The Measured Human

Another important theme for data and AI in the post-Covid-19 world stems directly from sensor technologies that are mature today but have not been adopted at large scale for reasons of ethics, cost and other considerations. The instrumented human, or the measured human, is a concept that is at once interesting and fraught with danger. That is probably because each of us has a deep-seated fear of being reduced to something far simpler than we really are. More accurately, we are afraid of being manipulated by those who hold data on us. My own contention is that the measured human is not just plausible in the post-Covid-19 world, but a strong possibility. Let me explain.

Social media is an external barometer that gives the powers that be a periscope into the individual, while the sensors of the future, embedded in our bodies, could become internal barometers of health. Today, we see people sounding off on social media (me included) on issues that affect us. These messages represent our thoughts and feelings on the subjects at hand, and they also indicate our propensities and proclivities on issues entirely unrelated to those we’ve expressed ourselves on. In a sense, we’re putting ourselves out there every day, and for no good reason. That data has been weaponized before. Politicians, media houses and technology firms have repeatedly used the data we volunteer, or otherwise allow them to collect, to manipulate us. They have pushed us into buying new products, clicking on ads (if anyone does indeed click on ads anymore), and even voting for one political entity or another. In the age of the measured human, we may see sensors measuring everything from our body temperature (at different locations on our bodies) and blood pH to the antibody and platelet counts in our blood.

When this wealth of information is available to an algorithm, let alone a doctor, we could identify propensities for specific conditions and administer preventive medicine. Equally, such data could be misused, just as personal data today is used to exclude individuals from opportunities and from credit. For example, health insurance providers could misuse personal medical metrics, especially where applicants have pre-existing conditions. There are no technological solutions to such sub-problems, however. The solutions are likely to come from good processes and a reflective, empathetic design process for these systems, rather than one that prizes short-term gains for the insurer or other enterprise in question.

The Short Term: Innovation vis-a-vis Big Government and Coverups

The Covid-19 crisis has revealed two sides, at two scales, of the global community’s response to the problem. One is the innovative side, exemplified by Italian doctors who repurposed scuba diving gear to treat patients in the face of ventilator shortages. The other side of this tragedy is the massive coverup, which we are nevertheless told never happened. The innovative side has been more prominent in individual and community responses to Covid-19. The other, more pernicious side has been seen more often in big tech and big government.

It has been easy (and rather cheap) to speculate on innovations that could solve some of the problems we face in the Covid-19 world. Here’s one interesting example of a thought/ideation experiment around masks. Masks have become a contentious topic, thanks to the WHO bungling its advice to the world at large, and also a source of the next wave of tech innovations. The example comes from Balaji Srinivasan, whose Twitter posts on Covid-19 I have really come to respect:

Closer to home in Bangalore, India, startups are coming up with sanitization equipment such as this Corona Oven, which allows a wide range of accessories and objects to be sanitized:

Product innovations like these will solve some of the problems we may face in the post-Covid-19 world. They may help us adjust to the new rhythms of life and work. They may also help us get the bottom layers of Maslow’s hierarchy out of the way, by enabling us to secure food, shelter, security and safety for ourselves and our families. They also provide the opportunity to build new product features on top of them. Kano’s model of product evolution comes to mind, and is relevant. The smartphone evolved from a pure-play voice communication device into an avenue for media consumption, and became a platform for new value. In the same way, prosthetic devices such as masks could enable us in new and unforeseen ways. That wouldn’t have been possible without the specific adversity this crisis brought to industrial design and engineering teams.

From the frontiers of machine learning, numerous computer vision innovations have been brought to bear on Covid-19 X-ray and other data. They are used to detect and prevent certain conditions that arise during the respiratory illness Covid-19 patients experience. Other techniques rely on proxies to enable prevention, as seen in this research below:

China’s first response was to cover up the scale of its Covid-19 infections and under-report the number of deaths. A casual glance at the Covid-19 curves for China vis-a-vis other nations suffering from this crisis makes one wonder whether we’re seeing reliable numbers. Many have spoken about the impact of Covid-19 on governments. Strengthening governments by centralizing policy can rarely happen in democracies, one reason being the stabilizing power of vocal opposition parties. With the Covid-19 shock, the government’s instruments will be brought to bear on curbing personal freedoms where that is required to prevent cascading infections. Nearly all nations around the world have taken steps to curb freedoms and impose lockdowns. As long as this is done in good spirit and with an intent to return the state to normalcy, we can come out of Covid-19 with rights and freedoms intact. This doesn’t apply to some countries, though. China is a key example, because by definition its ruling party limits the freedoms of its people, and it essentially fields the biggest army of any political party on earth. Generally speaking, the coverup China managed could not have been managed in any other country in existence. This in itself was a potent vector for the transmission of Covid-19, especially given China’s important place as the world’s manufacturing powerhouse. Which brings me to the next disruption from Covid-19 to the world’s economic system: the world’s supply chain and manufacturing industries.

More generally, popular world leaders have talked about minimizing the government footprint for decades in the post-Cold-War era. In recent years, we’ve heard slogans such as “Government has no business being in business” and “Minimum government, maximum governance”. These slogans concern government intervention in industry and in value delivery to consumers nationwide, or even across nations. The Covid-19 pandemic has ignited an interesting dichotomy in government and politics: should governments mandatorily run institutions such as hospitals and healthcare? If the answer is not an absolute no, to what extent should they? Perhaps a response has to consider how big the nation is and how many caregivers are required. It also has to consider what means the government has to ensure quality of service to patients while operating at large scale. These are relevant questions in the age of Covid-19. Generally (and one could say historically), times of crisis such as war, famine or epidemics embolden governments to take strong countermeasures and revoke freedoms while enabling government officials to move faster. Given the scale of democracy in large countries like the US and India, we probably need big government to enable leaders to serve the greater good. The other side of the coin, of course, is that very powerful governments don’t tend to part with that power easily. Excessive concentration of power is a slippery slope leading to further mismanagement of countries.

Data and AI can be exceptional tools for data-driven governance. The usual tendency is to extend control from the government to the grass roots and lock things down from the top. If we look beyond that, we could enable citizens to make informed decisions by educating them about phenomena that could affect them and about the consequences for them and for society at large. We could then urge them toward right action. Transparency, in other words, is the weapon of strong and sustainable data-driven democracies, because such democracies rely on facts and information to make decisions, rather than on presumptions about behavior.

The onus of disseminating data to enable data-driven governance should fall squarely on governments. Governments often leave the interpretation and transmission of vital information to the media, and this model is fraught with problems. Sensationalist news stories, erroneous reporting, and important stories hidden behind paywalls and “cookie clutter” screens have made internet news an unmitigated mess. That mess is accelerating towards becoming a train wreck for the consumer. It didn’t have to be this way. Platforms like Twitter and Facebook have been accused of fomenting unrest and of political and other kinds of bias. Despite these reputations, they’re platforms on which important news has to be disseminated and consumed. These platforms are also simplistic and don’t lend themselves well to data-driven journalism. This isn’t a business or data/AI problem so much as a policy problem. If access to the internet is increasingly important, so is access to authentic information on it. Given the excesses of various world governments and media houses, the post-Covid-19 world will likely see a metamorphosis of the status quo.

Other consequences of the Covid-19 crisis may include the accelerated adoption of electronic payments the world over, enabled again by telecommunications. We may also see the growth and acceleration of blockchain technologies for veracity in news and transaction reporting.

Supply Chains and Manufacturing: Globalization to Autarky?

The proverbial elephant in the room for manufacturers in the post-Covid-19 world is the global supply chain. Specifically, it is how fragile their businesses have become due to over-reliance on China and the goods and components it produces for the world. From cheap toys and car parts to computer chips and smartphone screens, there are few things China is incapable of producing at large scale today. This excessive concentration of supplier bargaining power (to use Michael E. Porter’s phrase) is purely due to the perils of excessive capitalism. I say this as a bit of a capitalist myself; after all, anyone who has benefited from India’s economic liberalization is a bit of a capitalist. Beyond the fact that this is a case of capitalism’s excess, something more important is at play. The global strategy for sourcing supply chains across manufacturing industries has followed a groupthink, producing a daftly simplistic and unstrategic winner-takes-all effect. In other words, the problem isn’t capitalism itself. It is the limited set of strategic sourcing options that the West, which has controlled the world economy for decades, has had.

So, in the post-Covid-19 world, what sourcing options do we have? If supply chains continue to run out of China, we risk a continued concentration of power with Chinese firms, and continued exposure to China’s political and economic leverage. China’s One Belt One Road initiative and the encirclement strategies it has used in the South China Sea and elsewhere leave little doubt about that nation’s interest in protecting its strategic assets. On the other hand, the US and Europe have ageing populations and economies in which low-cost manufacturing jobs have become unsustainable. In the Middle East, we see spots of bother. These include the geopolitical and geostrategic situation there, an overreliance on oil for energy and economics, and a lack of skilled engineering and manufacturing talent. There is also the impact of religious fundamentalism in societies across the region. India and some of the ASEAN nations are relative bright spots. In addition, many eastern nations (excluding Taiwan, Korea and the other rich Asian tigers) are probably the best places for maintaining competitive advantage in sourcing. We have already seen some countries incentivizing their companies to move out of China to other countries, and this sourcing game will continue.

However, even this approach only prolongs what many have been calling the Old World Order of Globalization. The new era, they claim, will see autarky to a great degree. In-sourcing or domestic sourcing and self-reliance will be the order of the day, boundaries will be drawn again, and nations will be closed off from others for years if not decades. Friends and enemies in this new world order may be defined by long-term geostrategic relationships. That contrasts with the world order we see now, which is unable to solve ordinary people’s problems (affordable healthcare, for one). Even in this world, one can foresee trade remaining an alternative for countries that lack the resources to become autarkic. The success of countries in the Middle East, for example, is due largely to their oil exports. In an autarkic world, one hopes, today’s automotive sector will transition towards electrification. That would mean greater reliance on indigenously produced energy in each country and less reliance on sources such as oil from the Middle East. However, is this idea of pure autarky a step too far? Perhaps.

Data and AI capabilities are only now being explored in the context of global supply chains. Older systems include bar code scanners and object counters that track items on conveyors. Modern ones include computer vision and prescriptive analytics for on-time supply and demand matching in large supply chains, voice-based ordering systems and no-wait checkout counters. Companies like Amazon and Walmart have adopted these at scale and are putting together compelling examples of how to run much larger supply chains at global scale. Some of these technologies will serve them well in the post-Covid-19 world. One can also imagine many products being sourced to these e-commerce platforms from places other than China, for whatever reasons. Either way, I foresee that these large, digitized, high-tech supply chains will remain important in the post-Covid-19 world. American autarky, in other words, seems a distant dream, or more accurately, a lost utopia.

Environmentally Conscious Business in the Post-Covid-19 World

It isn’t an exaggeration to say that the economics of Covid-19 are being stressed more than the root cause of the problem: infectious diseases and how they spread. A large part of the reason SARS, MERS and now Covid-19 have spread is the conditions and policies in China and elsewhere that allowed it. Specifically, Chinese policies on wet markets and on breeding wildlife for human consumption have been among the contentious underlying topics of discussion.

More broadly, it indicates the importance of environmentally conscious business practices. Far too often, we settle for what is pragmatic and benefits humans, and don’t emphasize the impact of our business actions on the environment at large. This may seem like a simplistic complaint, but it is in fact a deep and important one. The depth comes from the fact that the enterprises and businesses we run are but one set of processes in a broad chain of environmental processes that sustain the planet. When we apply simplistic policies to complex systems, we risk what Nassim Taleb calls naive interventionism (iatrogenics). We cannot be sure of the true consequences of our actions, even if we execute them “in good faith” or “to the best of our knowledge”. The cycle of value from the animals in question, bats and pangolins, is vast. They are part of an ecological system in which, by preying on lower forms of life, these animals help sustain a balance. That balance can be upset by close proximity of species that rarely meet in the wild, by selective breeding of these species, or by other means. When it is, there are likely to be shocks to that ecological balance from which we may never fully recover.

Learning and Staying Relevant in the Post-Covid-19 World

For me personally, the last several years have revealed the power of the internet to educate. Offerings range from short courses on various topics in data science, machine learning and AI to extensive programs such as post-graduate diplomas, and all of them have been adapted for large-scale skill development online. On the internet today, there are numerous free resources, especially for those in technology, computer science, software, data and analytics. These areas of contemporary advances and cutting-edge research see a surfeit of information and content that is nearly an embarrassment of riches. This is all well and good until we see a gap between supply and demand. Numerous learning opportunities have opened up specifically post-Covid-19: Google, edX and Coursera all announced a number of new courses, some of them free. If you know where to look, you can find incredible content on the internet to teach you nearly anything in machine learning, from the basics to the latest algorithms and research.

But here’s the thing: in reality, there is a supply vs. demand gap in online learning. Specifically, there is a great deal of content and courses in a few areas of technology, science and engineering, and very little in other areas. Some research in important areas is hidden behind paywalls, epidemiology among them, which is a core research domain as regards the Covid-19 crisis. This huge disparity is also a problem of curation, of the practicality of disseminating certain subjects, and of economies of scale.

The internet as a medium is not best suited for teaching certain skills. Sitting in front of a computer is not the best way to learn how to turn a component on a lathe in a machine shop, or how to play the guitar. You could argue about the latter, since I have learnt a lot of guitar myself just by looking things up online and practicing on my own. The limitations of this medium in disseminating certain kinds of knowledge are well known and well attested. Even so, there are attempts to move entire courses, and even master’s degrees, online. While this was initially viewed with skepticism, we can assume that online learning will gain momentum in the post-Covid-19 world. If my experiences have taught me anything, it is that with the right tools and interactions, you can learn a surprising amount online.

Staying relevant in the post-Covid-19 world is a harder task than just learning in this increasingly socially isolated and digitized world. Learning is just the acquisition of skill, whereas relevance is a consequence of being the right person in the right environment. Relevance therefore depends as much on our own explorations of the conditions we’re in, and our (entrepreneurial or other) responses to them, as on our skills. These explorations and responses in turn influence how we use the skills we have acquired. For instance, we know that in the post-Covid-19 world there are likely to be sea changes in which industries are relevant and which aren’t. Putting our feet in the right places, and bringing value to the new interactions we become part of, can make all the difference. It can decide whether we’re relevant in the future and creating some of the history, or just another casualty of history.

Concluding Remarks

The world post-Covid-19 is a time of change, indicating a complex new reality. There are economic shocks that will impact us for years, if not decades, to come. We’re in a position of incredible opportunity, but also one that poses incredible challenges. Enterprises as we knew them will change forever, adopting new styles of work and learning. Professionals will awaken to a new age of online learning and, in some cases, a protracted search for relevance and professional meaning. Smart governments will adopt data and communicate and govern based on facts, even as others use these opportunities to grow large-scale government influence. Indeed, questions of governmental oversight of essential services, including public health, will be debated for years to come. Data and AI adoption will accelerate in enterprises and will enable new kinds of collaboration and remote work, required for these months and perhaps years of social distancing and isolation. Enterprises will accelerate their move to the cloud, benefiting from large-scale, low-cost services for data, web and other technologies. Emerging technologies such as augmented and virtual reality may become a staple of our boardrooms and classrooms. More and more learners will try to adapt to online learning, and more teachers and professors will be compelled to learn to teach in this medium. At the same time, new technology interventions will improve learning experiences. Many governments around the world will rush to build self-reliance and their respective versions of autarky for many essential manufactured products. As they do, the global supply chain will start looking different, and we may see a greater infusion of data and AI technologies into the businesses that control our supply chains and logistics. We may see the growth of blockchain and other trust-centric technologies for applications in medicine and the news, in addition to finance, where they find their most common use cases.
The post-Covid19 world is a clarion call to problem solvers and innovators of all kinds, as much as it is for those in policy and governance, public health and medicine. The world order has been upset, and the new world order that will manifest after this pandemic is behind us, will look to the resourceful and the inventive even as people look towards being part of sustainable, healthy and safe work and living environments in the future.


One AI Marketing Conundrum

We are now in an age when the simplest kind of intelligence built into products and services is marketed as “AI”. This is a regrettable consequence of current marketing practice, which seems to extend to individuals, products and even job postings. For instance, it isn’t unusual these days for companies to want to hire “AI developers” with certified “credentials in AI”.

As a professional in the AI and Machine Learning space, I have come across, and perhaps to an extent have been complicit in, such hype. However, with time, you gain perspective and collect feedback. Of late, the more strident the clarion calls of “AI this” and “AI that” in products, the more common it is to see ordinary consumers become dismissive of new technology. I truly think this “performance undersupply” (to use a phrase coined by Tinymagiq’s Kumaran Anandan) in AI marketing is a bit of a regression (pun intended).

For instance, tools with natural language processing are routinely labeled as “AI”. Let’s dig a little deeper:

  1. Text mining, and the extraction of information from documents, requires the mathematical representation, modeling and characterisation of corpora of text. Stemming, lemmatization and other tasks commonly seen in text mining fit into this category of tasks.
  2. Models built on top of such representations, which use them as input data, learn statistical relationships between different representations. Term frequency histograms, TF-IDF models and the like are examples of such statistical models.
  3. End-to-end deep learning systems that perform higher-order statistical modeling on such representations can learn and generate more complex patterns in text data. This is what we see with language translation models.
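To make steps 1 and 2 above concrete, here is a minimal sketch in plain Python. It uses a crude whitespace tokenizer for the representation step, with no stemming or lemmatization, then a term-frequency histogram and TF-IDF weights. The IDF here is the simple unsmoothed log(N / df) variant. The function names and the three toy sentences are hypothetical, chosen for illustration. Libraries such as scikit-learn use smoothed IDF by default, so their numbers will differ. Nothing here remembers, reasons or acts: it is counting and arithmetic.

```python
import math
from collections import Counter

# Characters stripped from the ends of each token (a deliberately crude choice).
PUNCTUATION = ".,;:!?\"'()"

def tokenize(text):
    """Step 1 (representation): lowercase, whitespace-split words with edge punctuation removed.
    No stemming or lemmatization is done in this sketch."""
    words = (word.strip(PUNCTUATION).lower() for word in text.split())
    return [word for word in words if word]

def term_frequencies(tokens):
    """Term frequency histogram: each term's count divided by the document length."""
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {term: count / len(tokens) for term, count in counts.items()}

def inverse_document_frequencies(docs_tokens):
    """Unsmoothed IDF: idf(t) = log(N / df(t)), where df(t) counts documents containing t."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term at most once per document
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def tf_idf(corpus):
    """Step 2 (statistical model): weight each term by tf * idf, one dict per document."""
    docs_tokens = [tokenize(doc) for doc in corpus]
    idf = inverse_document_frequencies(docs_tokens)
    return [
        {term: tf * idf[term] for term, tf in term_frequencies(tokens).items()}
        for tokens in docs_tokens
    ]

# Hypothetical toy corpus for illustration only.
corpus = ["The model predicts demand.", "The model drifts.", "The demand forecast"]
for weights in tf_idf(corpus):
    print(weights)
```

A word that appears in every document (here “the”) gets a weight of zero, while rarer words get higher weights. That is the whole “insight” of the model: a useful statistical representation, but an enabler rather than intelligence.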

Note that none of the above truly imply intelligence. While there is an extensive use of statistical methods and techniques to represent text data and model it mathematically and statistically, there is no memory, context capture, or knowledge base. There is no agent in question, and therefore these can at best be described as enablers of artificial intelligence.

A post on LinkedIn by Kevin Gray talks about the same problem of marketing machine learning capabilities as AI. My response to his post is below, and perhaps it provides additional context to the discussion above on NLP/NLU/NLG and how that should be considered an enabler of AI, and not AI in and of itself.

The contention here seems to be on the matter of whether something can be described (and by extension marketed) as AI or not.

Perhaps it is more helpful to think of ML algorithms/capabilities such as NLU/NLP/NLG (as with audio/image understanding, processing and generation tasks) as _enablers_ of intelligent systems, and not the intelligence itself. 

This distinction can perhaps help ensure that consciousness, memory, context understanding and other characteristics of real-world intelligent agents are not glossed over in our quest to market one specific tool in the AI toolkit.

Coming to multiple regression: clearly a “soothsayer” or a forecaster (in the trading sense, perhaps) is valued for their competence and experience, which bring context and the other benefits of real-world intelligent agents I mentioned. When a regression model makes a prediction along similar lines, it brings no such context, and is therefore not in and of itself an intelligent system. So in summary, I’d say that NLP/NLU/NLG and such capabilities are also not “AI”, just as stepwise multiple regression isn’t.

From my comment.

Coming back to the topic at hand, we all can probably acknowledge first that marketers won’t stop using the “AI” buzzwords for all things under the sun anytime soon. That said, we can rest easy because we might be able to understand, with a little effort, what the true capability of the marketed product or service in question is. Mental models like those described above might help contextualize and rationalize the hype as and when we see it.

What Could Data Scientists (And Data Science Managers) Be Doing Better in 2019?

The “data science” job description is becoming more and more common, as of early 2019.

Not only has the field garnered a great deal of interest from software developers, statisticians and machine learning exponents, but it has also attracted plenty of interest over the years from people in roles such as strategy, operations, sales and marketing. Product designers, as well as manufacturing and customer service managers, are also turning towards data science talent to help them make sense of their businesses and processes, and to find new ways to improve.

The Data Science Misinformation Challenge

The aforementioned motivations for people interested in data science aren’t inherently bad – in fact, they’re common-sense, reasonable starting points for seeking data science talent and beginning analytical programs in organizations. The problem starts with the lack of access to sound, hype-free information on data science, analytics, machine learning and AI. Thanks to the media’s fulminations around sometimes disconnected value propositions – chat bots, artificial intelligence agents, machine learning and big data – these terms have come to be lumped together with data science, purely because the notions seem similar, or because some of the skills required to build and sell such solutions overlap. Media speculation around AI doesn’t stop there – from describing automated machine learning as “Building AI that can build AI” (NYT), to mentions of killer robots and killer cars, 2018 was a year full of hype and alarmism, as I expect 2019 will also be, to some extent. I have dealt with this topic extensively in an earlier post here. What I take issue with, naturally, is that this serves to misinform business teams about what’s really important.

Managing Data Science Better

Astute business leaders build analytical programs where they don’t put the cart before the horse. By this, I mean the following things:

  1. They have not merely a data strategy, but a strategy for labelled data
  2. They start with small problems, not big, all-encompassing problems
  3. They grow data science capabilities within the team
  4. They embrace visualization methods and question black box models
  5. They check for actual business value in data science projects
  6. They look for ways to deploy models, not merely build throw-away analyses

Data science and analytics managers ought not to:

  1. Perpetuate hype and spread misinformation without research
  2. Set expectations based on such hype around data science
  3. Assume solutions are possible without due consideration
  4. Fail to budget for subject matter experts
  5. Leave their staff untrained and still expect better results

As obvious as the above may sound, they’re all too common in the industry. Then there is the problem of consultants who sometimes perpetuate the hype train, thereby reinforcing some of these behaviors.

Doing Data Science Better

Now let’s look at some of the things data scientists themselves could be doing better. Some of the points I make here have to do with the state of talent, while others have to do with the tools and infrastructure provided to data scientists in companies. Some have to do with preferences, while others have to do with processes. I find many common practices among data science professionals problematic. Some of these are:

  1. Incorrect assumption checking – for significance tests, for machine learning models and for other kinds of modeling in general
  2. Not being aware of how some of the details of algorithms work – and not bothering to learn this even after several projects where their shortcomings are highlighted
  3. Not bothering to perform basic or exploratory data analysis (EDA) before taking up any serious mathematical modeling
  4. Not visualizing data before attempting to build models from them
  5. Assuming things about the problem solving approach they should take, without basing this on EDA results
  6. Not differentiating between the unique characteristics that make certain algorithms or frameworks more computationally, statistically or otherwise efficient, compared to others

Some of these can be sorted out by asking critical questions such as the ones below (which may overlap to some extent with the activities listed above):

  1. Where the data came from
  2. How the data was measured
  3. Whether the data was tampered with in any way, and if so, how
  4. How the insights will be consumed
  5. What user experience the analytics consumer requires
  6. Whether the solution has to scale
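As a small illustration of the first and third problems listed above, here is a sketch in plain Python (standard library only) that summarizes two samples before any testing, and then computes Welch's t-statistic, which drops the equal-variance assumption of Student's t-test. The data, the `describe` and `welch_t` helpers and the variance-ratio rule of thumb are hypothetical choices for illustration, not a prescribed procedure.

```python
import math
import statistics


def describe(sample):
    """Basic exploratory summary to look at before choosing a test."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "stdev": statistics.stdev(sample),
        "min": min(sample),
        "max": max(sample),
    }


def welch_t(a, b):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom.

    Unlike Student's t-test, this does not assume the two groups share a variance.
    """
    n1, n2 = len(a), len(b)
    se1 = statistics.variance(a) / n1
    se2 = statistics.variance(b) / n2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical measurements from two process variants
a = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2]
b = [13.5, 10.2, 14.8, 9.9, 15.1, 11.0]

for name, sample in (("a", a), ("b", b)):
    print(name, describe(sample))

# Rule of thumb (an assumption, not a hard cutoff): a variance ratio well above ~2
# makes the equal-variance assumption doubtful, so prefer Welch's test.
ratio = statistics.variance(b) / statistics.variance(a)
print("variance ratio:", round(ratio, 1))

t, df = welch_t(a, b)
print(f"Welch t = {t:.3f}, df = {df:.1f}")
```

This computes only the statistic and degrees of freedom; for a p-value, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs Welch's test directly. The point is the order of operations: the summaries show that `b` is far more spread out than `a`, which is exactly what should steer the choice of test.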

This is just a select list, and I’m sure that contextually, there are many other problems, both technical and process-specific. Either way, there is a need to exercise caution before jumping headlong into data science initiatives (as a manager) and to plan and structure data science work (as a data scientist).

Pragmatic Business Transformation with AI

I interact with numerous data scientists and people in the data science space on LinkedIn on a daily basis. A certain alarmism about the automation of business processes accompanies every discussion on artificial intelligence, and with good reason. Many of the people I interact with have insightful things to say about how data and artificial intelligence are transforming the business landscape. One of these is Vin Vashishta, whose posts often address pressing challenges in data and AI. Here is a recent post by Vin and my comment. This blog post was originally on Medium, and is an expansion of the ideas represented by the comment.

Traditional Thinking Couches

Traditional thinking about how work gets done is based on scientific reductionism and paradigms such as linearity. In truth, this thinking has allowed us to come very far. The division of labour is the very basis of capitalism, for instance, and modern capitalism thrives on specialization and the management of work in this form. In general, traditional thinking about work has the following elements:

  1. Linearity: The tendency to think of all work as ultimately reducible into linearly scalable chunks. Less work requires fewer resources, whereas more work requires more resources. To be fair, this kind of thinking has been around for millennia, since at least the time of human settlement and the neolithic age.
  2. Reducibility: This is a tendency to think of work as infinitely reducible, in such a way that if we complete each sub-task of a job in a certain sequence, we have the end result of completing the whole job. Systems engineers know better, and understand holism and reductionism in systems as analogues to the traditional view of reducibility, and how it might affect the way we see work today.
  3. Value-based Work and Tangibility: Another element of what seems to define work traditionally is the presence of tangible objectives, such as items shipped, or certain unambiguously measurable criteria met. In this world, giving customers a good experience when they shop, or enabling customers and partners to be served better (or to serve us better), isn’t seen as value, but as a non-value-added activity. For a long time, approaches to business transformation focused on eliminating non-value-add activities from business processes, with the view that this would improve process efficiency.

When we think about how businesses will take up AI and machine learning capabilities, we’re compelled to look through the same lenses described above. They’re comfortable couches that we cannot get out of, and as a result, they possess and dominate our thinking about AI deployment in enterprises.

AI-Specific Cognitive Biases

Some dangers of thinking driven by the above principles are as follows:

  1. Zero-sum automation: The belief that there is a fixed pie of opportunity, and that when we give human jobs to machines, we deprive humans of opportunities. Naturally, this is not true, because general, self-organizing intelligences such as humans are more than capable of discovering and finding new opportunities. Fixed-pie thinking is probably one of the key reasons behind AI alarmism. I would additionally argue that at some level, AI alarmism is also the result of bogeyman thinking, a paradigm in which a strawman such as AI is assigned blame for large-scale change. In the past, a lot of technological progress and change happened without such bogeymen, even as other changes were being prevented because of such thinking. Another element of bogeyman thinking is the tendency to ignore complementarity, including situations where humans and AI tools could work alongside each other, resulting in higher process effectiveness.
  2. Value bias: While there is truth to the notion that processes have value-add steps and non-value-add steps, it is typical of reductionism to assume that we don’t need the non-value-add steps at all, when they may be serving a real purpose. For instance, all manufacturing processes that transform raw material into product have ended up requiring quality checks and assurance. As a feature of the evolution of industrial production processes, quality assurance and control have become part of nearly all manufacturing processes that operate at scale. QA and QC represent a non-linearity in the production system, or a feedback loop which provides downstream process performance information to upstream processes.
  3. Exclusivity: A flip side of bogeyman thinking, combined with value bias, is the phenomenon of exclusivity. For example, the interpretation of emotional expressions on a human face has long been a task that humans are great at; for a long time, we didn’t know of any higher animals, let alone technologies, that had this level of sophistication. Now, there’s a lot going on in the ML/AI space that has to do with the so-called soft aspects of human life (judging people’s expressions and understanding them, learning about their behavioural patterns, etc.), and these capabilities are steadily maturing within AI systems. This contradicts traditional notions of human-exclusive capabilities in many areas. Naturally, this is seen as a threat rather than a capability enhancer. The truth is that exclusivity should also be considered a logical fallacy when discussing the development of AI systems.

It is common to fear someone who seems able to do everything one can do, until that person becomes a friend. I’d say that the jury is still out on what AI cannot do yet, and as a result, our approach to business transformation (as with transformation in other areas) should be humans + AI, and not AI in lieu of humans. This synergy is already visible in the manufacturing world, and perhaps we will see it make its way to other spheres as well. Fixed-pie thinking won’t get us anywhere when we have capability amplifiers like AI to assist humans.

Concluding Remarks

A key element of future human productivity is the discovery and exploitation of new opportunities in new frontiers. My suggestion to business leaders thinking about AI adoption for automation and process improvement is to expand the pie first, by creating new opportunities to do more as a business, and enabling your employees to take up and contribute more to your business. When you then enable them with AI, the resulting humans+AI combination will take your organization to new heights.

Contextualizing “AI Alarmism” in Business Process Automation

Alarmist speculations about Artificial Intelligence are everywhere these days. Business managers in labour-intensive markets such as India and China have, in recent months, come to fear data-driven process automation, often unfairly and unnecessarily. In this post, I wish to discuss some of the AI alarmism we see among the general public – ranging from well-founded speculation to the truly ridiculous. I will also present two mental models that may illustrate the usefulness of AI in process automation, before we arrive at how to contextualize AI-based automation.

Some Contours of AI Alarmism

In the last several months, the media has been awash with articles about data-driven process automation made possible by artificial intelligence, that is said to be doing any of the following things (listed in order of increasing speculation craziness):

  1. Taking away our jobs and rendering vast sections of human society jobless
  2. Doing things better than humans do them, and thereby obviating the need for humans in certain very human activities
  3. Serving as a panacea for all kinds of faults and frailties that make us human, and therefore representing the post-human world
  4. Killer robots that will wrest power from all of human society, thereby resulting in the standard-issue-technology-apocalypse that is the staple of Hollywood movies

It is important to assess the sources of these fears and speculations, if only to debunk some of this AI alarmism. It is also important to understand the true challenges, where they exist, and the threats in that context.

A Process View of AI-driven Automation

In the past several decades, we have seen numerous technology revolutions and their socio-cultural impact on human society. Whether in the rise of computerised and robotic manufacturing processes, which led to the digitization of manufacturing, or in the evolution of automation methods in the knowledge work space over the last decade or so, the fundamental drivers have been twofold – improved process performance, and increased process flexibility:

  1. A better process for delivering value
    1. Improved process quality and reduced variation
    2. Reduced process time and opportunities to continually improve
    3. Reduced process cost and opportunities to spread value within processes
  2. A more scalable and predictable process for delivering value

Given this broader process-based view of organizational excellence, and of how managers look at new technology from an operational effectiveness standpoint, can we see automation driven by artificial intelligence in a new light? For instance: how can we understand what AI specifically offers to the process automation ambit, and what this means for businesses? To understand this, let’s take a look at what AI solutions currently allow businesses to do:

  1. Automate embarrassingly simple tasks within business processes that have true scale, that are based on well-defined rules, but which are subject to variation – and do so cost-effectively
  2. Automate somewhat complex tasks which require some human intervention but are not mission-critical, and do so in business processes that have true scale

Now, let’s look at what AI based automation is not capable of accomplishing in its current state:

  1. Truly domain aware decision making, as an expert system that is aware of business context, and which can make holistic recommendations only possible by highly skilled experts
  2. Truly complex decision making that considers multiple factors in a non-formulaic or dynamic manner
  3. Tasks of moderate to high complexity to be performed in a business environment where the scale of the business isn’t large

[Figure: Automation and its effectiveness with business scale]

[Figure: Automation and its effectiveness with changing process complexity]

As you can see above, cost-effective process automation is constrained by its business case and by how applicable it is at different business scales. This calls for a careful cost-benefit analysis. AI-based process automation in businesses is most effective when there is true business scale, and when the processes in question are either simple or moderately complex.
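The cost-benefit point can be made concrete with a break-even calculation. Suppose, hypothetically, that automating a task carries a fixed cost F (build, integration, maintenance) and a per-transaction cost c_a, while manual handling costs c_m per transaction. Automation then pays for itself only above a volume of N* = F / (c_m - c_a). The function names and all figures below are made up for illustration; real cost structures are rarely this tidy.

```python
import math


def break_even_volume(fixed_cost, manual_unit_cost, automated_unit_cost):
    """Volume above which automation is cheaper than manual processing.

    Manual total cost:    N * manual_unit_cost
    Automated total cost: fixed_cost + N * automated_unit_cost
    """
    saving_per_unit = manual_unit_cost - automated_unit_cost
    if saving_per_unit <= 0:
        return math.inf  # automation never pays off on cost alone
    return fixed_cost / saving_per_unit


def net_saving(volume, fixed_cost, manual_unit_cost, automated_unit_cost):
    """Net saving at a given volume, assuming the fixed cost falls within the same period."""
    return volume * (manual_unit_cost - automated_unit_cost) - fixed_cost


# Hypothetical figures: a simple, rule-based task versus a more complex one whose
# automation costs more to build and still routes some cases to people.
scenarios = {
    "simple task": dict(fixed_cost=200_000, manual_unit_cost=2.5, automated_unit_cost=0.5),
    "complex task": dict(fixed_cost=800_000, manual_unit_cost=2.5, automated_unit_cost=1.5),
}

for name, c in scenarios.items():
    print(f"{name}: break-even at {break_even_volume(**c):,.0f} transactions")
    for volume in (20_000, 500_000):
        print(f"  volume {volume:>7,}: net saving {net_saving(volume, **c):>12,.0f}")
```

With these assumed numbers, the small-volume business loses money automating either task, while the large-volume business gains on the simple task but not on the complex one, which mirrors the scale and complexity argument above.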

Data and AI-Based Automation

There is yet another factor that could potentially affect how effective automation might be – and this is the availability of data from processes. The importance of data can be characterised in some key ways:

  1. A core enabler for artificially intelligent systems and applications is learning from data. Being able to learn from data implies that there is a need to use statistical techniques. This implies machine learning, statistical inference, time series modeling of data in real time, etc.
  2. Building domain-specific context and awareness within the application implies needing to use knowledge models, which are representations of the system’s domain, in the form of entities and relationships.
  3. A key consideration for an intelligent system is not only being able to learn from data in the domain, but also the ability to act on the domain. These domain actions can take many forms – from the machining and welding processes we see in robotic manufacturing systems, to computer programs that can generate instructions for writing other programs or instructions in data-intensive systems.
  4. A subset or enabling capability in this context, therefore, is the ability to collect and manage data of various kinds in scalable ways, and in real time.

Reasons for Alarmist Speculation

Given these mental models of process-centric and data-centric views of AI-driven automation, let’s take a step back and look at what is fueling this speculation:

  1. Misunderstanding about what artificial intelligence is and what capabilities it entails, on the business process side or on the data analysis side
  2. The lack of an objective scale for measuring or understanding AI progress
  3. Oversimplification of even simple, old and established human-in-loop systems
  4. Gross oversimplification of complex, human-engineered, industrial systems
  5. Mass media speculation that rides on the latest and greatest technologies, and importantly,
  6. The unceasing tendency of tech reporters and media to both liken the future to science fiction, and to jump to visions of utterly glorious or utterly ghastly futures, rather than evaluating technologies and their impact realistically

Concluding Remarks: Contextualizing AI-based Automation

First off, it is important to recognize that not all AI-centric speculation is unfounded. I wish to call out not those who have legitimately raised alarms about the policy, economic or ethical implications of AI-based process automation, but those who stretch the speculation to the realm of fantasy. It is near-impossible to replace humans in certain kinds of tasks, such as those described above that are highly complex and mission-critical for businesses. It is also important to consider the true scale and business realities of enterprises when speculating on AI. To this end, we may have to ask whether and how a firm may use AI, and whether it has a sufficiently strong business case. Not only should speculators, consultants and pundits use such rules of thumb, but it also behooves business leaders and managers to similarly understand their own businesses.

Key Data and AI trends in 2017

This year, 2017, has been quite a busy year for artificial intelligence and data science professionals. In some ways, this is the year when AI truly began to be debated and discussed, from frameworks and technologies to ethics and morality. This is the year when opportunities for AI-driven improvement in businesses began to be examined critically by diverse industry professionals and academicians. With good reason, machine learning and deep learning came to be placed at the top of Gartner’s hype cycle. We’re really at the peak of inflated expectations when it comes to ML/DL – with opportunities to shorten the time we take to reach measurable and direct consumer value.

[Figure: Gartner Hype Cycle for 2017]

Overall, in my experience, three key trends that enterprises welcomed in 2017 include:

  1. Simplification of cloud and data infrastructure services
  2. Improved and democratized scalable machine learning and deep learning
  3. Automation in key AI, ML and data analysis tasks

Improving Cloud and Data Infrastructure

Perhaps the foundational enabler for the data strategy of many enterprises that I have seen and worked with in 2017 is the availability of an easily operated and managed, scalable cloud infrastructure. This promise of a high-performance, low-cost and (arbitrarily) scalable cloud infrastructure was made as early as 2014, but has taken a few years to materialize as a truly viable, commercially feasible offering from stable, top-tier technology firms. Prominent cloud vendors such as Google Cloud, Microsoft Azure and Amazon’s AWS have upped the ante, while veterans like Hortonworks and Cloudera continue to hold sway. In my view, this space where the cloud vendors are competing is ripe for consolidation, although we can expect to see converging architectures before any consolidation that isn’t entirely wasteful can happen.

Other notable developments on the cloud infrastructure side were serverless compute (which enterprises are definitely warming up to, as the Gartner Hype Cycle shows), production-ready pre-built models for common tasks exposed as APIs (a trend that continues to inspire software/AI application architecture) and performant streaming and real-time data processing frameworks. By combining these capabilities in their platforms, cloud providers have really upped their offerings in 2017 and now provide formidable capabilities, which in my view haven’t been explored by businesses nearly as much as they should be.

Despite the availability of such production-ready, cost-effective and scalable data management systems in the cloud, cloud infrastructure nevertheless came under scrutiny in 2017 for massive security lapses and downtime. To speak of specific examples, we had the biggest-impact events in cloud reliability and data security history in the Equifax data breach and the massive AWS outage, to say nothing of the numerous smaller-scale data security episodes attributable to hacktivism, such as the Panama Papers.

As a counter to some of these incidents and the rise of the GDPR and other data protection regulations, numerous cloud providers have been offering “private cloud” solutions, along with region-specific hosting options for banks and other organizations that deal with regulation-sensitive data.

Additionally, it would be unfair not to point out how much containerization has helped cloud providers in 2017. Large-scale adoption of containerization using Docker and Kubernetes has enabled virtual environments to be set up and managed for complex, data-intensive development and deployment tasks.

Spark and Tensorflow

The space of scalable machine learning frameworks continues to be dominated by Apache Spark, which has found many friends among data engineers and scientists in production, especially since the 2.0 release, given the comparable performance of its DataFrame APIs across languages. So, whether you program in Python, R, or Scala, you can expect similarly high performance from Spark these days. Spark ML has expanded on the capabilities of Spark MLlib, and in its recent releases, Spark has also polished and unified the interfaces for streaming data analysis with Spark Streaming and graph analysis via GraphX. Having seen teams use Spark for different purposes, and having built frameworks on it in 2017, I find the differences between versions 1.6 and below, and 2.0 and above, significant: the newer versions are more polished and consistent in their behaviour.

Tensorflow received a lot of hype but only lackluster adoption in late 2016 and early 2017; over the last several months, however, it has made a strong case for itself, and adoption has grown significantly. As developers have warmed up to the framework, and as more language interfaces have been developed for Tensorflow, its popularity has soared, especially in the latter half of 2017. Another factor in the development and adoption of Tensorflow is the widespread use of GPU-based deep learning. The core Tensorflow development team’s additions to 1.0 (as explained by Jeff Dean here) have made it a mature deep learning development package and perhaps the most widely used and sought-after deep learning framework. While Torch makes an impression and is widely loved (especially in its PyTorch form), Tensorflow is hard to beat for the speed and dynamism of its high-quality open source contributors. At Strata Singapore 2016, I sat through a tutorial on Tensorflow 0.8, and what I saw then contrasts with what I see in versions 1.0 and higher. My recent brushes with Tensorflow have made me more convinced that this is the framework for deep learning developers to learn at the moment. The presence of wrappers and higher-level interfaces, such as Keras, has made Tensorflow very easy to use for entry-level and intermediate programmers and data scientists.

Automation in ML, DL and Data Science

Without a doubt, the development of techniques to automate parts of ML and DL development is one of the biggest and most important directions within the field of Artificial Intelligence in 2017. Taking after Leo Breiman’s random forests (an ensemble of “weak learners” resulting in a machine learning model with high performance) and various advancements in deep learning and machine vision (especially convolutional neural networks, which essentially encode complex features using simpler features in computer vision problems), the automation of hyperparameter optimization was probably the first step in the general direction of automated machine learning.
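Random search is about the simplest way to automate hyperparameter tuning, and a useful mental model for what more sophisticated tools do. The sketch below is plain Python; `validation_error` is a hypothetical toy stand-in for training a model and scoring it on held-out data, and the parameter names and ranges are arbitrary assumptions.

```python
import math
import random


def validation_error(learning_rate: float, max_depth: int) -> float:
    """Hypothetical stand-in for 'train a model, score it on held-out data'.

    A smooth toy surface with its minimum at learning_rate=0.1, max_depth=6.
    """
    return (math.log10(learning_rate) + 1) ** 2 + 0.05 * (max_depth - 6) ** 2


def random_search(objective, n_trials: int = 50, seed: int = 0):
    """Sample configurations at random and keep the best-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, math.inf
    for _ in range(n_trials):
        params = {
            # Log-uniform sampling suits parameters spanning orders of magnitude
            "learning_rate": 10 ** rng.uniform(-4, 0),
            "max_depth": rng.randint(2, 12),  # inclusive on both ends
        }
        score = objective(**params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


best, score = random_search(validation_error, n_trials=200)
print(best, round(score, 4))
```

In practice the objective would train and validate a real model; scikit-learn's `RandomizedSearchCV`, for example, wraps the same sample, evaluate and keep-the-best loop around cross-validation.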

Frameworks like AutoML (see the talk by Andreas Mueller above) have been the cynosure of this kind of research, and companies small and large have begun attempting different approaches to solving the context modeling problems that arise from the need to automate data science. While most approaches to machine learning have been classical, finding computational ways to learn more and more from data, some have taken non-traditional approaches, combining ideas from expert systems, rule-based inference engines and other approaches. A novel development in machine learning has been the invention of generative adversarial networks (GANs), which could lead to hitherto unseen improvements in the use of computationally generated data as a starting point for understanding the best representations of a given dataset. Although GANs were invented in 2014, it is in 2017 that their implementations became popular and came to be considered a viable neural network architecture for computer vision and other kinds of machine learning problems.

Other noteworthy trends within the data and AI space include the rise and improved performance of chat bots and conversational natural-language enabled APIs, the amazing improvements to translation and image tagging made possible by deep learning, and the important question of AI ethics – starting from that now-famous question of “should your self-driving car kill a pedestrian in order to save your life”, to ethical conundrums and alarmist remarks from tech luminaries such as Elon Musk.

Concluding Remarks

So, what does 2018 hold in store? That seems to be the question on everyone’s lips in the data and AI world, and it is also what data and AI enthusiasts in different industry roles are looking to understand. While it is not possible to clearly say which trend will dictate progress in 2018 and beyond, it is clear that the above three developments will form key cornerstones on top of which future capabilities for AI and enterprise scale data management and data science will be built. Hope you enjoyed reading this. Do leave a comment or a note if you would like to share more.

Andrew Ng’s DeepLearning.AI (Coursera) Certification

One of the more interesting mental models of machine learning I’ve come to understand in the last month or so is the “five tribes of machine learning” model popularized in “The Master Algorithm” by Pedro Domingos. To summarize in a phrase, the master algorithm is that approach which can uncover all possible insight from data – and Prof. Domingos hypothesises that there are five distinct such “master algorithms”, one for each of these tribes. One of these “tribes” is the connectionists, whose master algorithm is, in fact, backpropagation, which is central to the design and operation of neural networks.

A Connectionist Tour Guide

In a sense, the deep neural network has become synonymous with artificial intelligence today. There are numerous other algorithms which could lend a sense of intelligence to machines – whether by communicating in natural language as a conversationalist (starting from rudimentary bots like ELIZA through Pootwattle and Smedley (of U Chicago fame), to modern chatbots), or by learning to differentiate different kinds of faces, or identify emotions of specific kinds. The deep neural network has successfully been applied to numerous such real world problems, and therefore stands out as being promising on this account. For the other tribes, we don’t yet have algorithms such as “advanced induction inference machines”, or “higher dimensional kernel machines” – whatever these may indicate (really or apocryphally). So it behooves us to pay attention to stories such as this one, which discuss the “unreasonable effectiveness” of neural networks.


DeepLearning.AI’s Course

There’s definitely a skills gap in the advanced machine learning and artificial intelligence space. Businesses are as yet unable to see value beyond the hype. Unsurprisingly, the skills gap has to be addressed at the very root – the fundamentals, where the ability to model problems, computationally solve them, and build systems out of such solutions intersect. Andrew Ng has, also unsurprisingly, taken a stab at the deep learning space, if his “AI is the new electricity” talk is anything to go by.



Over the last few weeks, I’ve had the opportunity to spend some time on Andrew Ng’s Deep Learning course from DeepLearning.AI. For me, this is like a tour guide to the world of the connectionists. The reality is that neural networks don’t work like the human brain apart from superficial similarities – as Ng himself explains in the course – but the term has stuck, since the motivations of early pioneers who also knew some neuroscience led to the moniker.

The Coursera certification is organized into five different courses, and the first of these lays the mathematical and programmatic foundation for implementing neural networks. This first course, titled Neural Networks and Deep Learning, has well-orchestrated exercises within Coursera’s integrated Jupyter notebook interface, and you can use the algorithms on your own data to evaluate their performance. I’m currently some way through the second course, having finished the first one – and I have to say that the videos, programming exercises and other course aspects create a true learning feedback loop, which is effective in teaching the basics really well. I’m very impressed with the way the course has been put together and made accessible to those with a little bit of machine learning knowledge who are starting out on neural networks and deep learning.

Course Experience

In the section below, I’ll outline my key learnings from the first course in the certification. If you are an ML and AI enthusiast or a young professional (or even an experienced one) interested in working on deep learning, I hope that you take the course.

  1. The course introduced the most fundamental ideas of neural networks at the very start, with extensive coverage of how to implement a logistic regression model for classifying data. This initial discussion was built up rather nicely into a discussion of deep learning.
  2. As an intermediate course, it assumes some knowledge of linear algebra and differential calculus. As someone who works with machine learning models, I was able to grasp the intuitions with one repetition. If it has been a while since you worked through linear algebra and differential calculus (or thought through equations, at the very least), expect to take a while to find your feet.
  3. Some of the intuitions around gradient descent, the values of derivatives, and so on, were introduced very handily – and were reinforced through the exercises.
  4. The importance of vectorization and its central use in numpy (which is used extensively – nay, almost exclusively – throughout the course) was well brought out. Numpy is a powerful library and, surprisingly, received its first funding only in 2017, after having been used to develop numerous algorithms and tools. Some of its quirks, such as vectors of shape (n,) that are neither row nor column vectors, were especially interesting and useful to learn about. Overall, though this isn’t a numpy tutorial by any stretch, numpy is referenced extensively.
  5. During weeks 2 and 3, the logistic regression algorithm is taught in a different context: it is likened to a single neuron in a deep net, and the details of activation functions are discussed. This, to me, was the meat of the course.
  6. In weeks 2 and 3, a consistent methodology and notation were used both to discuss and to implement forward and backward propagation, two of the key mechanisms in any neural network. This was done entirely within numpy, which makes for great hands-on lessons. Stochastic gradient descent was also explained and implemented.
  7. Finally, in week 4, deep neural networks were covered, and parametrization of the neural network topology was introduced. Related ideas, such as hyperparameter optimization, were also discussed. Additionally, in both videos and assignments, Andrew Ng provided practical advice on getting the dimensions right for weight matrices and bias vectors. Without this and the consistent notation, the programming implementations of DNNs could get very hairy, so I personally felt that this was very well handled.
  8. A cat classifier deep neural network in Week 4 – because who doesn’t like cats?
  9. Throughout the course, there are optional video lectures and interviews with well-known researchers. One of them is with Geoff Hinton, and it was definitely instructive.
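Item 4’s points about vectorization and vectors of shape (n,) can be made concrete with a short numpy sketch. The function names `dot_loop` and `dot_vectorized` are illustrative and not taken from the course; timings vary by machine, so treat the printed numbers as indicative only.

```python
import time

import numpy as np

rng = np.random.default_rng(0)
n = 100_000
a = rng.standard_normal(n)
b = rng.standard_normal(n)


def dot_loop(x, y):
    """Dot product with an explicit Python loop: one multiply-add per interpreter step."""
    total = 0.0
    for i in range(len(x)):
        total += x[i] * y[i]
    return total


def dot_vectorized(x, y):
    """Same result from a single numpy call that runs in compiled code."""
    return float(np.dot(x, y))


start = time.perf_counter()
loop_result = dot_loop(a, b)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
vec_result = dot_vectorized(a, b)
vec_seconds = time.perf_counter() - start

print(f"loop: {loop_seconds:.4f}s, vectorized: {vec_seconds:.6f}s")
assert np.isclose(loop_result, vec_result)  # same value, different summation order

# The shape (n,) quirk: a 1-D array is neither a row nor a column vector.
v = rng.standard_normal(5)
print(v.shape, v.T.shape)        # prints (5,) (5,): transposing a 1-D array changes nothing
print(np.dot(v, v.T).shape)      # prints (): an inner product (scalar), not a 5x5 matrix

col = v.reshape(5, 1)            # make the column vector explicit
outer = col @ col.T
assert outer.shape == (5, 5)     # now the outer product has the expected shape
```

Reshaping to an explicit (n, 1) column and asserting shapes is a cheap way to avoid the silent broadcasting surprises that 1-D arrays can cause.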
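Items 1, 3 and 6 describe logistic regression as the starting point, trained with gradient descent via forward and backward passes. A minimal numpy sketch of that idea follows, assuming one example per column (X of shape (n_x, m), labels Y of shape (1, m)) and plain batch gradient descent; the function names, learning rate, iteration count and toy data are illustrative choices, not the course’s assignment code.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def propagate(w, b, X, Y):
    """One forward and one backward pass of logistic regression.

    X has shape (n_x, m) with one example per column; Y has shape (1, m) with labels in {0, 1}.
    """
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)                  # forward: predicted probabilities, shape (1, m)
    eps = 1e-12                               # keeps log() away from zero
    cost = -np.mean(Y * np.log(A + eps) + (1 - Y) * np.log(1 - A + eps))
    dZ = A - Y                                # backward: dL/dZ for sigmoid + cross-entropy
    dw = (X @ dZ.T) / m                       # shape (n_x, 1), same as w
    db = float(np.sum(dZ) / m)
    return dw, db, float(cost)


def train(X, Y, num_iterations=2000, learning_rate=0.1):
    """Batch gradient descent starting from zero weights."""
    w = np.zeros((X.shape[0], 1))
    b = 0.0
    costs = []
    for _ in range(num_iterations):
        dw, db, cost = propagate(w, b, X, Y)
        w = w - learning_rate * dw            # step against the gradient
        b = b - learning_rate * db
        costs.append(cost)
    return w, b, costs


def predict(w, b, X):
    """Label 1 where the predicted probability exceeds 0.5."""
    return (sigmoid(w.T @ X + b) > 0.5).astype(int)


# Toy, linearly separable data: label 1 when x1 + x2 > 0
rng = np.random.default_rng(1)
X = rng.standard_normal((2, 200))
Y = (X[0:1, :] + X[1:2, :] > 0).astype(int)
w, b, costs = train(X, Y)
accuracy = float(np.mean(predict(w, b, X) == Y))
print(f"cost: {costs[0]:.3f} -> {costs[-1]:.3f}, training accuracy: {accuracy:.2f}")
```

The backward step reduces to `dZ = A - Y` because, for a sigmoid output with cross-entropy loss, the derivative of the sigmoid cancels out of the chain rule.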
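Item 5 mentions activation functions, and item 3 the values of derivatives. The sketch below defines three common activations (sigmoid, tanh and ReLU) with their derivatives, and checks each derivative against a central finite difference. The helper names are hypothetical; the sample points deliberately avoid z = 0, where ReLU is not differentiable.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)              # g'(z) = g(z) * (1 - g(z))


def tanh_prime(z):
    return 1.0 - np.tanh(z) ** 2      # g'(z) = 1 - tanh(z)^2


def relu(z):
    return np.maximum(0.0, z)


def relu_prime(z):
    return (z > 0).astype(float)      # slope 1 for z > 0, 0 for z < 0


# 12 evenly spaced points in [-3, 3]; none of them is exactly 0
z = np.linspace(-3.0, 3.0, 12)
h = 1e-5
max_errors = {}
for name, g, g_prime in [("sigmoid", sigmoid, sigmoid_prime),
                         ("tanh", np.tanh, tanh_prime),
                         ("relu", relu, relu_prime)]:
    numeric = (g(z + h) - g(z - h)) / (2.0 * h)   # central difference approximation
    max_errors[name] = float(np.max(np.abs(numeric - g_prime(z))))
    print(f"{name}: max |analytic - numeric| = {max_errors[name]:.2e}")
```

The same comparison of analytic and numerical derivatives is a handy sanity check whenever a backward pass is written by hand.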
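Item 7 credits the course for its advice on matrix dimensions. A minimal sketch of that bookkeeping for an L-layer network follows, assuming the convention that W[l] has shape (n[l], n[l-1]) and b[l] has shape (n[l], 1), with ReLU hidden layers and a sigmoid output. The layer sizes, the 0.01 weight scale and the function names are illustrative assumptions.

```python
import numpy as np


def initialize_parameters(layer_dims, seed=0):
    """layer_dims = [n_x, n_1, ..., n_L]; returns a dict with W1..WL and b1..bL."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        # Small random weights break symmetry; biases can start at zero.
        params[f"W{l}"] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
        params[f"b{l}"] = np.zeros((layer_dims[l], 1))
        assert params[f"W{l}"].shape == (layer_dims[l], layer_dims[l - 1])
        assert params[f"b{l}"].shape == (layer_dims[l], 1)
    return params


def forward(X, params):
    """Forward propagation with ReLU hidden layers and a sigmoid output. X is (n_x, m)."""
    L = len(params) // 2
    m = X.shape[1]
    A = X
    for l in range(1, L + 1):
        W, b = params[f"W{l}"], params[f"b{l}"]
        Z = W @ A + b                       # b of shape (n[l], 1) broadcasts over the m columns
        assert Z.shape == (W.shape[0], m)   # every layer keeps one column per example
        A = 1.0 / (1.0 + np.exp(-Z)) if l == L else np.maximum(0.0, Z)
    return A


layer_dims = [4, 5, 3, 1]                   # hypothetical: 4 inputs, two hidden layers, 1 output
params = initialize_parameters(layer_dims)
X = np.random.default_rng(1).standard_normal((4, 10))   # 10 examples as columns
AL = forward(X, params)
for l in range(1, len(layer_dims)):
    print(f"W{l}: {params[f'W{l}'].shape}, b{l}: {params[f'b{l}'].shape}")
print("output shape:", AL.shape)            # one probability per example
```

Because each shape follows from `layer_dims` alone, the assertions catch most dimension mistakes at the first forward pass rather than deep inside backpropagation.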




I’m about halfway through the second course, on Improving Deep Neural Networks, and my experience there has been similar to the first course. The content builds directly on the first course, so going through them in sequence definitely has its advantages; if you start with the second course of the specialization, expect to spend some time finding your feet. So far, I wish there had been better explanations of ideas like dropout and L2 regularization, especially given the tricky quizzes in Week 1. This is a 3-week course, and I wish an additional week, or at least a few more videos, had been spent early on explaining and firming up ideas around regularization. The exploding/vanishing gradient problems could also be illustrated better with videos and so on, although I felt the course generally does a good job of explaining the essentials of these ideas.
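Since dropout and L2 regularization are singled out above, a minimal numpy sketch of both may help. It assumes inverted dropout applied to a layer’s activations during training only, and an L2 penalty of (λ/2m) times the sum of squared weights, which adds (λ/m)·W to each weight gradient. The function names, the keep probability and the λ value (`lambd` in the code, since `lambda` is a Python keyword) are illustrative assumptions, not the course’s code.

```python
import numpy as np


def dropout_forward(A, keep_prob, rng):
    """Inverted dropout (training only): zero units at random, then rescale the survivors."""
    mask = rng.random(A.shape) < keep_prob        # True with probability keep_prob
    return A * mask / keep_prob, mask             # dividing by keep_prob preserves the expected value


def dropout_backward(dA, mask, keep_prob):
    """Gradients flow only through the units that were kept in the forward pass."""
    return dA * mask / keep_prob


def l2_regularized_cost(base_cost, weights, lambd, m):
    """Adds (lambd / 2m) * sum of squared entries of every weight matrix."""
    penalty = sum(np.sum(np.square(W)) for W in weights)
    return base_cost + (lambd / (2 * m)) * penalty


def l2_regularized_grad(dW, W, lambd, m):
    """The derivative of the penalty adds (lambd / m) * W to the data gradient."""
    return dW + (lambd / m) * W


rng = np.random.default_rng(0)
A = np.ones((100, 1000))
A_drop, mask = dropout_forward(A, keep_prob=0.8, rng=rng)
print(f"fraction kept: {mask.mean():.3f}, mean activation after rescaling: {A_drop.mean():.3f}")

W = np.ones((2, 2))
print(l2_regularized_cost(0.5, [W], lambd=0.1, m=10))   # 0.5 + (0.1 / 20) * 4
print(l2_regularized_grad(np.zeros((2, 2)), W, lambd=0.1, m=10))
```

At test time the network runs without dropout; because the surviving activations were already divided by `keep_prob` during training, no extra scaling is needed then.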
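The exploding/vanishing gradient problem can be illustrated with a few lines of numpy. The sketch below pushes a random batch through a 20-layer ReLU stack and reports the spread of the final activations under three weight scales; because backpropagated gradients pass through the same weight matrices, they shrink or blow up in a similar multiplicative way. The layer width, the depth and the use of He-style scaling sqrt(2/n) as the well-behaved case are assumptions made for this illustration.

```python
import numpy as np


def final_activation_std(weight_scale, n_units=100, n_layers=20, seed=0):
    """Push a random batch through a deep ReLU stack and return the std of the last layer."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((n_units, 500))              # 500 random input examples
    for _ in range(n_layers):
        W = rng.standard_normal((n_units, n_units)) * weight_scale(n_units)
        A = np.maximum(0.0, W @ A)                       # ReLU layer, no bias for simplicity
    return float(A.std())


schemes = {
    "small (0.01)": lambda n: 0.01,
    "large (1.0)": lambda n: 1.0,
    "He, sqrt(2/n)": lambda n: np.sqrt(2.0 / n),
}
results = {name: final_activation_std(scale) for name, scale in schemes.items()}
for name, std in results.items():
    print(f"{name:>14}: std of layer-20 activations = {std:.3e}")
```

With 100 units, each ReLU layer multiplies the mean squared activation by roughly n·s²/2: about 0.005 for s = 0.01 (vanishing), about 50 for s = 1.0 (exploding), and about 1 for s = sqrt(2/n), which is why the last scheme keeps the signal at a stable scale.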

Concluding Remarks

To conclude, I’d recommend this certificate for those in the analytics, data science or machine learning space who are somewhat hands-on, can grasp linear algebra and calculus, and can work with Python. Since this is an “intermediate” specialization, neophytes will need multiple viewings of the videos to become conversant with the ideas and concepts. This still shouldn’t deter those who want to audit the course, or who want to learn its concepts for a deeper understanding to back up their direct experience in machine learning.

Related Content

  1. My Quora answer on DeepLearning.AI’s Coursera course