Lessons from Agile in Data Science

Over the past year and a few months, I’ve had the chance to lead a few different data science teams working on different kinds of hypotheses. Much has been written about the engineering process view that the so-called agile methodologies bring to data science teams. One’s own experience, however, tends to differ from what is written, especially when it comes to the process aspects of engineering and solution development.

Agile principles are by no means new to the technology industry. Numerous technology companies have attempted to apply principles of agility in their engineering and product development practices, since much of technology product development (whether software, hardware, or both) is systems engineering and systems building. While some have found success in these endeavours, many organizations still find agility a hard objective to accomplish. Managing requirements, the needs of engineering teams, and concerns such as delivery, quality and productivity for scalable data science is a similarly hard task. Organizational structure, team competence, communication channels and approaches, leadership styles and culture all play significant roles in the success of product development programmes, especially those centred around agility.

In the specific context of software and systems development, two talks stand out in my mind. One is from a thought leader and industry pioneer who helped formulate the agile manifesto (a term he now extensively derides, as it happens); the other is from a team at Microsoft that has become a success story in agile product development.

Here’s Pragmatic Dave (Dave Thomas, one of the original pioneers of agile software development), in his GOTO 2015 talk titled “Agile is Dead”.

I’m wary of both extreme proponents and extreme detractors of any philosophy or idea, especially one that, in practice, seems to have had some success in some quarters. While Dave Thomas takes some extreme views, he also brings a lot of pragmatic advice. His views on the “manifesto for agility” are in some sense more helpful than boilerplate Agile training programmes, especially when seen in the context of agile software and systems development.

The second talk that I mentioned, the one featuring Microsoft Scrum masters, is very much a success story. It has all the hallmarks of an organization working out what works and what doesn’t, and trying to find its own velocity, rhythm and approach, starting from the normative approach suggested in so many agile software development textbooks and by many gurus and self-proclaimed experts.

This talk by Aaron Bjork was quite instructive for me when I first saw it a few months ago. Specifically, the emphasis of agile practices on teams and interactions, rather than on “process”, stood out. Naturally, this approach raises other questions, such as how it scales, but in the specific context of data science I find that the interactions, and the process of generating and evaluating hypotheses, matter more than most other things. These are only two of the many videos and podcasts I have listened to, and they constitute only a fraction of the conversations I’ve had with team members and managers about Agile processes for data science delivery.

It is in this setting that my personal experiences with Agile were initially less than fruitful. The team struggled to both follow the process and do the data science, and the management overhead of activity and task tracking was extensive. This problem remains, and there doesn’t seem to be a clear solution for balancing the ceremony and rituals of agile practice against seemingly useless ideas such as story points. Hours are more useful than story points, so much so that Scrum practitioners typically end up equating story points to hours, or multiples of hours, at some point. The issue here lies squarely with how the practices have been written about and evangelized, rather than with the fundamental idea itself.

There’s also the issue of process versus practice, in my view one of the key concerns in project management of any kind. The divergence between process and practice in Agile methods is very high, and in my opinion the systems and software development world deserves better. Perhaps one key reason for this is the proliferation of Scrum as the de facto Agile development approach. When Agile methods were first being discussed and debated, the term “Agile development” represented a range of different approaches; this has given way, rather unfortunately, to one predominant approach, Scrum. There is an analogy in the quality management world that I am extensively familiar with: Six Sigma, and the proliferation of DMAIC as the almost exclusive way to solve “common cause” problems.

Process-versus-practice apart, there are other significant challenges in using Agile development for data science. Changing toolsets, the tendency to “build now and fix later” (although this is addressed through effective continuous deployment methods) and process overhead are some of the reasons why this approach can still be a hard fit.

What does work universally is the sprint-based approach to data science. While the sprint-based approach is only one element of the overall Scrum workflows we see in the industry, it can, by itself, become a powerful, iterative way to think about data science delivery in organizations. Combined with a task-level structure and a hypothesis model, it may be all that your data science team requires, even for complex data science. Keeping things simple process-wise may unlock the creative juices of data scientists, letting your team favour direct interactions over structured ones, so that they explore more and extract more value from the data.
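To make the “task-level structure and hypothesis model” idea a little more concrete, here is a minimal sketch in Python. The field names, statuses and the choice of dataclasses are my own illustration, not a prescribed template:

```python
from dataclasses import dataclass, field
from typing import List

# Illustrative structure only: names and statuses are assumptions, not a standard.
@dataclass
class Hypothesis:
    statement: str              # e.g. "Feature X improves churn prediction"
    metric: str                 # how the hypothesis will be judged (AUC, RMSE, lift, ...)
    success_threshold: float    # the bar the metric must clear
    status: str = "open"        # open -> testing -> accepted / rejected

@dataclass
class SprintTask:
    description: str            # a concrete, time-boxed piece of work
    hypothesis: Hypothesis      # every task traces back to a hypothesis
    estimate_hours: float       # hours, not story points

@dataclass
class Sprint:
    goal: str
    tasks: List[SprintTask] = field(default_factory=list)

    def open_hypotheses(self) -> List[Hypothesis]:
        """Hypotheses still awaiting evidence at the end of the sprint."""
        return [t.hypothesis for t in self.tasks if t.hypothesis.status == "open"]
```

The point of a structure like this is not the tooling; it is that every task in the sprint is traceable to a hypothesis and an evaluation criterion, which keeps the iteration focused on evidence rather than ceremony.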

The Expert System Anachronism in the Data Science and AI Divergence

The data science and big data buzzwords have been bandied about for years now, and artificial intelligence has been talked about for decades; the two fields are irrevocably interrelated and interdependent.

For one thing, the wide interest in data science started just as we were beginning to leverage distributed data storage and computation technologies, which allowed companies to “scale out” storage and computation rather than “scale up” computation. Companies that could buy numerous run-of-the-mill computers (rather than a smaller number of extremely expensive, high-end machines) could therefore potentially turn their data collection activities into something useful to the enterprise.

Let’s not forget, though, that the point of such exercises was to actually get some business value at the end. There’s virtually no business case for collecting and storing huge amounts of data (with or without structure) if we don’t have a plan to use that data to make better business decisions, or to somehow impact the business or its customers positively. IT managers across industries have therefore struggled to navigate the big data space: how much to invest, what to invest in, and how to make sense of it all.

Technology vendors are only too happy to sell companies the latest and greatest data science and data management frameworks and solutions, but how can companies actually use these solutions and tools to make a difference to their business? This challenge for executives isn’t going away with the advent of AI.

Artificial Intelligence (AI) has a long and hoary history, and has been the subject of debate, discussion and chronicle over several decades. Geoff Hinton, the AI pioneer, gives a fairly comprehensive description of various historical aspects of AI here. Starting from Hinton’s research, pioneering work in recent years by Yann LeCun, Andrej Karpathy and others has enabled AI to be considered seriously by organizations as a force multiplier, just as they considered data science a force multiplier for decision-making activities. The focus of all these researchers is on general-purpose machine intelligence, specifically neural networks. While the “deep learning” buzzword has caught on of late, it is fundamentally no different from a complex, many-layered neural network and what such a network can do.

That said, AI in the form of deep learning differs vastly in capability from the algorithms that data scientists and data mining engineers have used for more than a decade now. By adding many layers, constructing complex topologies in these neural networks, and iteratively training them on large amounts of data, we have progressed along multiple quantitative axes (complexity, number of layers, amount of training data, and so on) to achieve not merely quantitatively but qualitatively better AI performance. Recent studies at Google show that image captioning, often considered a hard problem for AI, is now at near-human levels of accuracy. Microsoft famously announced that its speech-to-text and translation engines have improved by an order of magnitude because of these techniques.
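To ground the “adding many layers” point, here is a minimal sketch using PyTorch (the framework is my assumption; the post names none), in which depth is simply a parameter and training is one iterative gradient step on stand-in data:

```python
import torch
import torch.nn as nn

# A plain feed-forward network whose depth is a parameter. Layer widths and
# sizes here are arbitrary illustrations, not a recommended architecture.
def make_deep_classifier(in_features: int, n_classes: int,
                         depth: int = 8, width: int = 256) -> nn.Module:
    layers = [nn.Linear(in_features, width), nn.ReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, n_classes))
    return nn.Sequential(*layers)

model = make_deep_classifier(in_features=784, n_classes=10, depth=8)

# One training step on a random, stand-in batch of data.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
optimizer.step()
```

The deep networks behind the image captioning and speech results mentioned above are of course far more elaborate (convolutional and recurrent topologies, vastly more data), but the underlying recipe of stacking layers and iterating over data is the same.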

It is this vastly improved capability of AI, and the removal of the human (who has always been present in the data science activity loop) from even the analysis and design of these neural networks (generative adversarial networks being a case in point), that makes the divergence between data science and AI vivid and distinct. AI seems to be headed in the direction of general intelligence, whereas data science approaches and methods have been human-in-the-loop approaches to making sense of data. The key value addition of the human in this data science context was “domain”, and I have discussed the importance of domain in data science at length in an earlier post. But this too is increasingly being supplanted by efficient AI, provided that the process for collecting training data, and the training and topological aspects of the networks (known as hyperparameters), are well enough defined. This supplanting of the human domain perspective by machine-learned domain features that matter is precisely what will enable AI to develop and become a key force to reckon with in industry.
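For readers unfamiliar with the term, the hyperparameters referred to above are the knobs fixed before training: the topology of the network and the settings of the training procedure. A toy sketch of such a search space, with invented names and ranges, might look like this:

```python
from itertools import product

# Illustrative only: the names and ranges below are assumptions.
search_space = {
    "depth": [4, 8, 16],            # topology: number of layers
    "width": [128, 256],            # topology: units per layer
    "learning_rate": [1e-3, 1e-4],  # training: optimizer step size
    "batch_size": [32, 128],        # training: examples per gradient step
}

def all_configurations(space: dict):
    """Enumerate every combination in the search space (naive grid search)."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))

for config in all_configurations(search_space):
    # In practice, each config would be used to build and train a network,
    # keeping the best according to a validation metric.
    pass
```

Once the data collection process and a search space like this are defined, the choice of the "best" configuration can itself be automated, which is exactly the sense in which the human is being removed from the loop.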

Therefore I venture that the “anachronism” in the title of this post is the domain-based model of intelligent systems called the expert system. Expert system design is an old problem that had its heyday and then apparently disappeared into the mists of technological obsolescence, and it is exactly this kind of design problem that AI methods will be good at solving, to the point that they can replace humans in key tasks and move towards a true general intelligence. Expert systems were how the earliest AI researchers imagined machine intelligence would be useful to humanity; their understanding, however, was limited to rule-based expert systems. While the overall idea of the expert system is still relevant in many domains, so much so that in a sense we have expert systems all around us, it is undeniable that the advent of AI will enable expert systems to develop and evolve once again, but without the rule-based approaches we have seen in the past, and with the inductive learning that is apparent in deep learning and machine learning methods.
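For contrast with that inductive, learned approach, here is a toy sketch of what a classical rule-based expert system looks like; the facts, rules and actions are invented purely for illustration:

```python
# Knowledge is hand-written as if-then rules over named facts,
# rather than learned from data.
RULES = [
    ({"temperature_high", "pressure_rising"}, "open_relief_valve"),
    ({"temperature_high", "coolant_low"},     "shut_down_reactor"),
    ({"vibration_abnormal"},                  "schedule_inspection"),
]

def infer(facts: set) -> list:
    """Forward chaining over the rule base: fire every rule whose conditions hold."""
    return [action for conditions, action in RULES if conditions <= facts]

print(infer({"temperature_high", "coolant_low", "vibration_abnormal"}))
# ['shut_down_reactor', 'schedule_inspection']
```

The brittleness of this style, where every condition must be anticipated and encoded by a domain expert, is precisely what inductive, data-driven methods promise to replace.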