Questions that Data Scientists Hate Getting

This is a variation on a Quora answer.

When asked how data scientists can be effective, a few things come to mind:

  1. Skills: curiosity and sufficient skill in data analysis methods and techniques
  2. Fundamental needs: the data itself, and access to the tools and environments needed to perform the analysis
  3. Performance needs: sufficient resources, time and good-enough processes to validate or invalidate hypotheses and build models based on them
  4. Excitement needs: sufficient support and latitude to independently deploy projects based on hypotheses that have tested out and the models built from them

Note that while the criteria listed above begin with the fundamental skills required to do data science, the focus shifts in items 2, 3 and 4 to what is required for data scientists to be effective. The first of these is the fundamental needs, such as the data itself and access to the required tools, be they statistical or machine learning tools, databases, visualization libraries, or other resources. The second is the performance needs, which help data scientists do whatever it is they do a little better than they do it today; this includes processes and systems that enable data scientists to improve their own capabilities. Finally, we have excitement needs, which enable data scientists to do outstanding work; a large part of this is being able to reuse what has been built, through deployment of various kinds.

It is in this context that we can discuss how managers of data science teams can help them be effective.

If there is one behaviour in analytics managers that I wish would change, it is the one I describe in the following lines.

A lot of what data scientists do is experimental, throwaway analysis. However, it is tempting for a number of managers (many of whom have already made up their minds that some hypothesis holds true, or will work) to assume that they’re right, and that all that is required from the data scientist is the detailed model that formalizes the relationship.

This kind of assumption makes for poorly designed projects, and it does not make good use of the data scientist’s time for exploratory analysis, for evaluating different kinds of models, and for finding out what works, given the dataset.
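To make that exploratory step concrete, here is a minimal sketch of the kind of quick, throwaway check that can precede any commitment to a full project. It assumes a tabular dataset and scikit-learn; the file name and column names are placeholders rather than anything from a real project.

```python
# A quick, disposable check of a hypothesised relationship before a full build.
# "dataset.csv", "feature_a", "feature_b" and "target" are placeholder names.
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

df = pd.read_csv("dataset.csv")          # the data the hypothesis is about
X = df[["feature_a", "feature_b"]]       # predictors the hypothesis says matter
y = df["target"]                         # the outcome they are supposed to drive

# Cross-validated R^2 for a simple baseline model: if even this is close to zero,
# the hypothesis deserves more scrutiny before anyone plans a full application.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print("R^2 per fold:", scores.round(3), "mean:", round(scores.mean(), 3))
```

The specific model matters far less than the habit: a few lines of exploration tell you whether a hypothesis is worth a project at all.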

Naturally, given the time-bound nature of businesses and the poor understanding of analytics at the executive level in many organizations, such clients are commonplace, and such managers often find themselves pushing for results without the right underlying systems, data or resources. Sometimes they begin projects with data scientists who lack the specific skills to build the kinds of models required to solve the problem. Whatever the cause, the challenge many data scientists in business and consulting face is dealing with such unreasonable expectations.

In this specific context, some questions that shouldn’t be posed to data scientists might be along the following lines:

  • “Assuming that hypothesis X works, how long would it take to build a full-fledged application based on it?”
  • “The domain experts are convinced that this hypothesis X is true. Why don’t your results reflect this too?”
  • “The values of R² or precision/recall I see here don’t reflect what can be done with the data. Aren’t better results possible?”

These kinds of questions are simplistic when asked in the initial stages of a data science activity or experiment, and in some situations they can be dangerous too (although they’re innocuous mistakes that any manager new to analytics initiatives may make).
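The third question in particular is better answered with data than with argument. A small sketch of one way to ground the discussion, using a synthetic dataset purely for illustration, is to compare a candidate model’s precision and recall against a naive baseline; the gap between the two says more about what the data allows than any prior expectation does.

```python
# An illustration, not a prescription: compare a candidate model against a naive
# baseline to see what the data itself supports. The synthetic dataset below
# stands in for whatever real data is actually at hand.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=500, weights=[0.8, 0.2], random_state=0)

models = {
    "naive baseline": DummyClassifier(strategy="stratified", random_state=0),
    "candidate model": RandomForestClassifier(random_state=0),
}
for name, model in models.items():
    cv = cross_validate(model, X, y, cv=5, scoring=["precision", "recall"])
    print(f"{name}: precision={cv['test_precision'].mean():.3f}, "
          f"recall={cv['test_recall'].mean():.3f}")
```

If the candidate model clears the baseline only narrowly, the honest conclusion usually concerns the data, not the diligence of the data scientist.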

For the same reason that “a little knowledge is a dangerous thing”, such project managers may be gambling with the fortunes of the entire analytics program they serve, because they base even large projects on naive, unverified assumptions. Were they to give due consideration to exploratory data analysis, and to what the data actually says about the models and applications that can viably be built, they would put their data scientists and engineers on the path to success.

Pragmatic Business Transformation with AI

I interact with numerous data scientists and people in the data science space on LinkedIn on a daily basis. Many of them have insightful things to say about how data and artificial intelligence are transforming the business landscape. One of these is Vin Vashishta, whose posts often address pressing challenges in data and AI. There is a certain alarmism about the automation of business processes that accompanies every discussion of artificial intelligence, and with good reason. Here is a recent post by Vin and my comment. This blog post was originally on Medium, and is an expansion of the ideas represented by that comment.

Traditional Thinking Couches

Traditional thinking about how work gets done generally has the following elements. Traditional work- and time-based thinking rests on scientific reductionism and on paradigms such as linearity. In truth, this thinking has taken us very far. The division of labour is the very basis of capitalism, for instance, and modern capitalism thrives on specialization and the management of work in this form.

  1. Linearity: The tendency to think of all work as ultimately reducible into linearly scalable chunks: less work requires fewer resources, while more work requires more resources. To be fair, this kind of thinking has been around for millennia, since at least the time of human settlement and the Neolithic age.
  2. Reducibility: The tendency to think of work as infinitely divisible, such that if we complete each sub-task of a job in a certain sequence, we end up having completed the whole job. Systems engineers know better, and understand holism and reductionism in systems as analogies to this traditional view of reducibility and to how it shapes the way we see work today.
  3. Value-based Work and Tangibility: Another element of the traditional definition of work is the presence of tangible objectives, such as items shipped or unambiguously measurable criteria met. In this world, giving customers a good experience when they shop, or enabling customers and partners to be served better or to serve us better, is seen not as value but as non-value-added activity. For a long time, approaches to business transformation focused on removing non-value-add activities from business processes, on the view that this would improve process efficiency.

When we think about how businesses will take up AI and machine learning capabilities, we’re compelled to think in terms of these same lenses. They’re comfortable couches that we cannot get out of, and as a result they dominate our thinking about AI deployment in enterprises.

AI-Specific Cognitive Biases

Some dangers of thinking driven by the above principles are as follows:

  1. Zero-sum automation: The belief that there is a fixed pie of opportunity, and that when we give human jobs to machines, we deprive humans of opportunities. Naturally, this is not true, because general, self-organizing intelligences such as humans are more than capable of discovering new opportunities. Fixed-pie thinking is probably one of the key reasons behind AI alarmism. I would additionally argue that, at some level, AI alarmism is also the result of bogeyman thinking, a paradigm in which a strawman such as AI is assigned the blame for large-scale change. In the past, a lot of technological progress and change happened without such bogeymen, even as other changes were prevented by exactly this kind of thinking. Another element of bogeyman thinking is the tendency to ignore complementarity, including situations where humans and AI tools could work alongside each other, resulting in higher process effectiveness.
  2. Value bias: While there is truth to the notion that processes have value-add and non-value-add steps, it is typical of reductionism to assume that the non-value-add steps are not needed at all, when they may in fact serve a real purpose. For instance, all manufacturing processes that transform raw material into product have ended up requiring quality checks and assurance. As a feature of the evolution of industrial production, quality assurance and control have become part of nearly all manufacturing processes that operate at scale. QA and QC represent a non-linearity in the production system: a feedback loop that provides downstream process performance information to upstream processes.
  3. Exclusivity: A flip side of bogeyman thinking, combined with value bias, is the phenomenon of exclusivity. For example, interpreting the emotional expressions on a human face has long been a task that humans are great at; for a long time, we knew of no higher animals, let alone technologies, with this level of sophistication. Now, a great deal of work in the ML/AI space deals with the so-called soft aspects of human life: judging people’s expressions and understanding them, learning about their behavioural patterns, and so on. These capabilities are maturing within AI systems all the time, which contradicts traditional notions of human-exclusive capability in many areas. Naturally, this is seen as a threat rather than a capability enhancer. The truth is that exclusivity should also be considered a fallacy when discussing the development of AI systems.

It is common to fear someone who seems able to do everything one can do, until that person becomes one’s friend. I’d say the jury is still out on what AI cannot yet do, and as a result our approach to business transformation (as with transformation in other areas) should be humans + AI, not AI in lieu of humans. This synergy is already visible in the manufacturing world, and perhaps we will see it make its way into other spheres as well. Fixed-pie thinking won’t get us anywhere when we have capability amplifiers like AI to assist humans.

Concluding Remarks

A key element of future human productivity is the discovery and exploitation of new opportunities on new frontiers. My suggestion to business leaders thinking about AI adoption for automation and process improvement is to expand the pie first: create new opportunities for the business to do more, and enable your employees to take up that work and contribute more to the business. When you then enable them with AI, the resulting humans + AI combination will take your organization to new heights.