A View of DevOps from the World of Data Science

Operations management as a discipline has taken many shapes and forms in different industries over the years, but there is perhaps something unique in how operations is discussed in software development, commonly referred to as DevOps. Many of these considerations around DevOps also apply to a related and increasingly interesting subset of problems: MLOps, the field of Machine Learning Operations. So, what is unique about DevOps and the discussions of software development operations in this context?

Perceptions of DevOps Today and Contrasts with Traditional Industry Operations

A tweet I came across recently, posted by a manager hiring for DevOps roles, sparked an outpouring of ideas from me, which I have expanded on below.

Popular understanding of DevOps seems to revolve around tools: tools for managing code, workflows, and applications, each helping with one thing or another encountered in software development workflows. Strangely enough, operations management in more established industries, such as manufacturing, oil and gas, energy, or telecommunications, tends to revolve around the following sets of considerations:

  1. People considerations: From the hiring and onboarding of talent for the organization, to the development of those hires into productive employees, to employee exit. Operational challenges here may include developing role definitions, establishing the right hierarchy and interactions for smooth operations, and ensuring that the right talent is attracted and retained in the organization.
  2. Process considerations: All considerations spanning the actual process of value delivery, whereby the resources available to the organization are put to use to efficiently solve day-to-day problems and meet customer requirements on an ongoing basis. Some elements of innovation and continual improvement would also fall into the ambit of the process management that’s part of Operations.
  3. Technology considerations: All considerations spanning the application of various kinds of technology ranging from the established and mundane, to the innovative and novel – all of these could be considered a part of technology management within Operations in traditional organizations.

Anyone familiar with typical product-centric or services-oriented software development organizations will observe that the above three considerations are spread out among other supporting functions of these organizations. Perhaps technically centred organizations with very specific engineering and development functions evolve this way, and perhaps there is research to support this hypothesis. However, the fact remains that what is considered development operations doesn’t normally involve the hiring and development of talent for product/solution engineering or development, or the considerations around the specific technologies used and managed by the software developers. These elements seem to be subsumed by human resources and architects, respectively.

Indeed, the diversification of roles in software development teams is so prolific that delivery managers (of the so-called Scrum teams) are rarely in charge of the development operations process. They’re usually owners of specific solution deliverables. The DevOps function has come to be seen as a combination of software development and tooling roles, with an emphasis on continuous delivery and code management. This isn’t necessarily a bad thing, and there is a need for such capabilities – arguably there is a need for specialists in these areas as well. But here’s the challenge many managers face when hiring mid-senior professionals to manage DevOps:

Cross Functional DevOps and Lean in Manufacturing Operations

When we have DevOps engineers and managers only interested in setting up pipelines for writing and managing code, rather than thinking holistically about how value is being delivered, and whether it is, we miss crucial opportunities for continuous improvement.

As someone who has worked in both manufacturing product development and software product development teams, I find that there needs to be a greater emphasis in software development organizations on cross-functional thinking and cross-functional problem solving. While a lot of issues faced by developers and engineers in the context of product or solution development are solved by technical know-how and technical excellence, there are broader organizational considerations that fit into the people, process and technology focus areas, and that are important to consider – without such considerations, wise decisions cannot be taken. A lot of these decisions have to do with managing waste in processes – whether that is wasted effort, time or creativity, technical debt we build up over time, or redundancy for that matter. The Lean toolbox, which originated in the manufacturing industry, provides us a ready reckoner for this, titled the “eight wastes in processes”: inventory, unused creativity, waiting, excess motion, transportation, overproduction, defects and overprocessing. Short of seeing all development activities through these “waste lenses”, we can use them as general guidelines for keenly observing the interactions between a developer, their tools, other developers, and code. Studying these interactions could yield numerous benefits, and perhaps such serious studies are common in some large enterprise DevOps contexts, but at least in the contexts I’ve seen, there are rarely discussions of this nature, with nuance and deep observation of processes.

In fact, manufacturing organizations see Lean in a fundamentally different way from how software development teams see it.

Manufacturing organizations heavily emphasize process mapping, process observations and process walks. And I shouldn’t tar all manufacturing organizations with the same brush, because indeed, the good and the bad ones in this respect are like chalk and cheese – poles apart in how well they understand and deploy efficient operational processes through Lean thinking. Many may claim to be doing Six Sigma and structured innovation, and in many cases such claims don’t hold water, because they’re using tools to do their thinking.

Which brings me to one of the main problems with DevOps as it is done in the software development world today – for many, many teams, the tools have become substitutes for thinking. A lot of teams don’t evaluate the process of development critically – after all, software development may be a team sport, but in a weird way, software developers can be sensitive to review and criticism of their development approaches. This is reminiscent of artisans in the days before mass production, and how they developed and practised an art in their day-to-day trade. It is less similar to what’s happening in large-scale car or even bottle manufacturing plants around the world. Perhaps there are good reasons for this too, such as the growth of complexity and the need for specialization when building complex systems like software applications, which are built but once, yet shipped innumerable times. All this still doesn’t imply, however, that tools can become substitutes for thinking about processes and code – there are many conversations in that ambit that could be valuable, eye-opening elements of any analysis of software development practices.

MLOps: What it Ought to Include

Now I’ll address machine learning operations (MLOps), a modern cousin of DevOps, relevant in the context of machine learning models being developed and deployed (generally as some kind of software service). MLOps has come to evolve in much the same way we saw DevOps evolve, but there is a set of issues here that goes beyond software-level technicalities, to the statistical and mathematical technicalities of building and deploying machine learning systems.

MLOps workflows and lifecycles appear similar to software development workflows as executed in DevOps contexts. However, there ought to be (and are) crucial differences between the workflows of these two disciplines (software engineering and machine learning engineering).

Some of the unique technicalities for MLOps include the following (a short sketch of the first two items appears after the list):

  1. The model’s absolute performance, measured by metrics such as RMSE or F1 score
  2. Model deployment performance against SLAs such as latency, load and scalability
  3. Model training and retraining performance, and scalability in that context
  4. Model explainability and interpretability
  5. Security elements of the model and its data, which is a highly domain-dependent conversation
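
To make the first two items concrete, here is a minimal sketch using scikit-learn: RMSE for a regression model, F1 for a classification model, and a crude latency probe against a hypothetical 50 ms SLA. The datasets, model choices and threshold are illustrative assumptions, not a prescription.

    import time
    import numpy as np
    from sklearn.datasets import make_classification, make_regression
    from sklearn.linear_model import LogisticRegression, Ridge
    from sklearn.metrics import f1_score, mean_squared_error
    from sklearn.model_selection import train_test_split

    # Regression: RMSE on a held-out test set.
    Xr, yr = make_regression(n_samples=2_000, n_features=20, noise=10.0, random_state=0)
    Xr_tr, Xr_te, yr_tr, yr_te = train_test_split(Xr, yr, random_state=0)
    reg = Ridge().fit(Xr_tr, yr_tr)
    rmse = np.sqrt(mean_squared_error(yr_te, reg.predict(Xr_te)))
    print(f"RMSE: {rmse:.2f}")

    # Classification: F1 on a held-out test set.
    Xc, yc = make_classification(n_samples=2_000, n_features=20, random_state=0)
    Xc_tr, Xc_te, yc_tr, yc_te = train_test_split(Xc, yc, random_state=0)
    clf = LogisticRegression(max_iter=1_000).fit(Xc_tr, yc_tr)
    print(f"F1 score: {f1_score(yc_te, clf.predict(Xc_te)):.3f}")

    # Crude single-prediction latency probe against an assumed 50 ms SLA.
    start = time.perf_counter()
    clf.predict(Xc_te[:1])
    latency_ms = (time.perf_counter() - start) * 1_000
    print(f"Latency: {latency_ms:.2f} ms (assumed SLA: 50 ms)")

In a real deployment, latency would of course be measured at the serving layer under representative load, not with a single in-process call.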

In addition to these purely technical elements of MLOps, there are, in my mind, elements of the discipline that should include people and processes:

  1. Do we have engineers with the right skills to build and deploy these models?
  2. Have we got statisticians who can evaluate the underlying assumptions of these ML models and their formulation?
  3. Do we have communication processes in the team that ensure timely implementation of specific ML model features?
  4. How do we address model drift and retraining? (A minimal drift-check sketch follows this list.)
  5. If new training data comes from a different region, can it be subject to the same security, operational and other considerations?
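
As one example of item 4, here is a minimal drift-check sketch: a two-sample Kolmogorov–Smirnov test (via SciPy) comparing a feature’s training-time distribution with fresh production data. The data and the alert threshold are illustrative assumptions.

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(7)
    train_feature = rng.normal(loc=0.0, scale=1.0, size=10_000)  # distribution seen at training time
    live_feature = rng.normal(loc=0.4, scale=1.1, size=2_000)    # shifted distribution in production

    stat, p_value = ks_2samp(train_feature, live_feature)
    print(f"KS statistic: {stat:.3f}, p-value: {p_value:.2e}")

    # An illustrative alert rule; a real system would tune this threshold
    # and monitor many features, not just one.
    if p_value < 0.01:
        print("Distribution shift detected - consider triggering retraining.")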

There may be more, and those of you reading this who have faced production-scale ML model development and deployment challenges may have more to add. MLOps should therefore see significant discussion around these elements, and these and other related discussions should happen early and often, in the context of ML model deployment and maintenance.

“Small Data” and Being Data-Driven

Being data-driven in organizations is a bigger challenge than it is made out to be. For managers to suspend judgement and make decisions that are informed by facts and data is hard, even in this age of Big Data. A set of tweets I posted spurred me to think through this subject.

Decision Making Culture

A lot of organizations have jumped into the Big Data era having bypassed widespread use of data-driven decision making in their management ranks altogether. For many organizations, this is an inconvenient truth. In many organizations, even well known ones, experienced managers have often made decisions on gut feeling, or for reasons other than the data they collected. Analytics and business intelligence hoped to change that, and in some ways they have. Many organizations and managers have changed their work styles. Examples abound of companies adopting techniques like Six Sigma in the 1980s and 1990s, a trend that continues to this day in the manufacturing industry.

Three Contrasts

With the explosion in technologies and methods that enable Big Data to be collected and stored in “data lakes”, and data to be collected in real time as streaming data using technologies like Spark and NiFi, we’re at the advent of a new era of decision making characterised by the Vs of Big Data – volume, velocity, variety and veracity – and data science at scale. A minimal streaming sketch follows.
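
For instance, a few lines of Spark Structured Streaming can turn a raw event stream into a rolling operational signal. This is a hedged sketch: the Kafka broker address and topic name are hypothetical placeholders, and it assumes the Spark–Kafka connector package is available on the cluster.

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, window

    spark = SparkSession.builder.appName("streaming-decisions").getOrCreate()

    # Read a live event stream; broker and topic are hypothetical placeholders.
    events = (
        spark.readStream
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("subscribe", "sensor-events")
        .load()
    )

    # Count events per one-minute window - the kind of near-real-time
    # operational signal a manager could watch.
    counts = events.groupBy(window(col("timestamp"), "1 minute")).count()

    query = (
        counts.writeStream
        .outputMode("complete")
        .format("console")
        .start()
    )
    query.awaitTermination()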

Here are three contrasts between old and new management decision-making styles:

  1. Spending and buying decisions (for resources, infrastructure, technology and projects) are made after data-driven competitive evaluation now more than ever. In the past, the lack of communication and analysis engines, together with limited globalization, meant that managers spent less time evaluating even critical decisions, because the options were limited. The flood of information enabled by the digital age then exposed them to more possibilities, but without the tools to do better at such competitive analysis. Spending and buying decisions make up a lot of executive decision making, and much of it is informed by small data; the new trends of connected, networked economies, data mining and data analysis are bound to impact this positively. The advent of advanced analytics will upend this paradigm, and will result in better visibility into decision alternatives.
  2. Operational excellence decisions are based on real-time data now more than ever. Operational excellence and process efficiency are a key focus area for many manufacturing organizations, and increasingly concern service-oriented organizations as well. While “small data” were collected at regular intervals to get a sense of business operations, they were not fully effective at capturing the wide range of process modes, and didn’t represent the full possibilities one could leverage with such data. The number of practitioners of advanced methods who used them in a verifiable way was also limited, and they rarely formed the management strata or informed it. The proliferation of the new classes of data scientists and data engineers, in addition to the advent of real-time analytics, will affect the way decisions are taken in future.
  3. Small data as a stepping stone to Big Data. Small data – collected as samples, whether slices of sensor information or representative samples of a larger population (the Big Data) – may increasingly be used to formulate the “cultural business case” for doing Big Data in companies. Many companies that do not have a culture of data-driven decision making in their managerial ranks are experimenting with Big Data on a grand scale. Such organizations have taken to Big Data technologies such as Hadoop and Spark, and often collect more data than they usefully analyze. There is definitely scope to evaluate the business value of such implementations. There is also an opportunity to improve the cost-effectiveness of data science initiatives in companies by using “small data” to evaluate the real need for a Big Data implementation – “small data” being data that does not meet the volume, velocity, variety and veracity criteria of what’s now accepted to be Big Data. A sketch of this idea follows the list.
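
As a hedged sketch of that third contrast: before committing to a Big Data build-out, one can check whether a small random sample already answers the business question with acceptable uncertainty. The data and sample sizes below are purely illustrative.

    import numpy as np

    rng = np.random.default_rng(42)
    # Stand-in for a full "Big Data" dataset (normally too large to scan casually).
    population = rng.exponential(scale=3.0, size=1_000_000)

    # A modest "small data" sample.
    sample = rng.choice(population, size=2_000, replace=False)

    # Bootstrap a 95% confidence interval for the mean from the small sample.
    boot_means = [rng.choice(sample, size=sample.size, replace=True).mean()
                  for _ in range(5_000)]
    lo, hi = np.percentile(boot_means, [2.5, 97.5])

    print(f"Sample mean: {sample.mean():.3f}, 95% CI: ({lo:.3f}, {hi:.3f})")
    print(f"Full-data mean (normally unknown): {population.mean():.3f}")
    # If the interval is tight enough for the decision at hand, "small data"
    # may make the case before any Big Data investment is committed.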

Data Driven Decision Making Behaviours

Decision making is strongly influenced by behaviours. Daniel Kahneman’s book Thinking, Fast and Slow provides a psychological framework for thinking about fast and slow decision making, the former being gut-driven and the latter driven by careful, plodding analysis. Humans tend towards decisiveness, especially in organizations, and executives are often rewarded for fast decision making that is also effective. Naturally, this means that fast decision making flourishes as a habit in organizations.

Such fast decision making, however, comes at a price. Decisions that aren’t well thought through can influence a large organization’s functioning, because a decision can be fundamental to the organization and relevant to all employees. Some organizations do reward behaviours in their managerial cadres that encourage looking at the data behind decisions. For the vast majority of managers, however, time spent on a decision is effectively taxed; they are rewarded instead for acting quickly and influencing a wide-ranging array of decisions.

Enabling fast decision making has obvious benefits in a market economy. The more time managers spend making, or delaying, a decision, the less competitive their companies tend to look. Data-driven decision making can be enabled by providing access to data in a quick and painless way. And this means building intelligence into our interfaces and into the machines that help us make and record decisions. It also means being able to delegate mundane tasks well and easily.

Concluding Remarks

A lot of organizations that have Big Data initiatives may not have the appropriate management or decision making culture that can fully utilize the investment in Big Data, which can sometimes be considerable. By using “Small Data” and the insights from analysis of such data, there is an opportunity to invest less and build the behaviours and organizational systems and habits that will make a Big Data implementation effective.