Some Ideas on Combining Design Thinking and Data Science

Recently, I had the opportunity to finish Stanford SCPD’s XINE 217 “Empathize and Prototype” course, part of the Stanford Innovation and Entrepreneurship Certificate, which emphasizes the use of design thinking to develop product and solution ideas. It was during this course that I wrote down a few ideas on the use of data in improving design decisions. Design thinking is a modern approach to system and product design that puts customers and their interactions at the center of the design process. The design process has been characterized over the decades by many scholars and practitioners in diverse ways, but a few aspects have remained largely unchanged. Three of these are as follows:

  1. Design processes are essentially iterative, and constantly evolve over time
  2. The design process always oversimplifies a problem – and introduces side effects into the customer-product or customer-process interactions
  3. The design process is only as good as the diversity of ideas we use for “flaring” and “focusing” (which roughly translate to “exploring many ideas” and “choosing a few out of many ideas” respectively).

Overall, the essential idea conveyed in the design thinking process as explained in XINE 217 is “Empathize and Prototype” – a phrase that conveys a sense of deep customer understanding and focus. As for integrating data into the design process, the idea is by no means new: engineers from Genichi Taguchi onwards, and perhaps a generation before him, have been building systems models of the processes and products they design. At some level, these systems models are factor-response models, because they are carried into prototypes through parameter design and tolerance design.

Statistically speaking, these are analogues of the designed experiment practice, where a set of parameters is treated as factors influencing a response, and the combinations of factor levels are arranged as orthogonal arrays. A minimal sketch of this idea follows.
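
To make this concrete, here is a minimal sketch in Python of a full-factorial designed experiment feeding a linear factor-response model. The factor names, levels, and simulated responses are all illustrative assumptions, not data from any real process:

```python
import itertools
import numpy as np

# Hypothetical factors: two process parameters, each at three levels.
temperature = [150, 175, 200]   # degrees C (assumed levels)
pressure = [1.0, 1.5, 2.0]      # bar (assumed levels)

# Each experimental run is one combination of factor levels;
# itertools.product enumerates the full factorial design.
design = np.array(list(itertools.product(temperature, pressure)))

# Simulated responses; in practice these would be measured on the
# prototype or process under study.
rng = np.random.default_rng(seed=42)
response = 2.0 * design[:, 0] - 5.0 * design[:, 1] + rng.normal(0, 1, len(design))

# Fit a linear factor-response model, y ~ b0 + b1*temperature + b2*pressure,
# by ordinary least squares.
X = np.column_stack([np.ones(len(design)), design])
coefficients, *_ = np.linalg.lstsq(X, response, rcond=None)
print("intercept and factor effects:", coefficients)
```

Fractional factorial and orthogonal array designs reduce the number of runs further when experiments are expensive; the full factorial above is simply the easiest starting point.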

Although described above in a simplified way, data-driven design approaches, grouped under the broad umbrella of “statistical engineering”, are used in one form or another to validate designs of mechanical and electrical systems in well-known manufacturing organizations. However, when you look at the design thinking process in specific ways, the benefits of data science techniques at certain stages become apparent.

The design thinking process could perhaps be summarised as follows:

  1. Observe, empathise and understand the customer’s behaviour or interaction
  2. Develop theories about their behaviour, including those that account for motivations – spoken and unspoken aspects of their behaviour, explicit and implicit needs, and the like
  3. Based on these theories, develop a slew of potential solutions that could address the problem they face (“flare”)
  4. Qualify some of these solutions based on various kinds of criteria (feasibility, scope, technology, and cost, to name a few) (“focus”)
  5. Arrive at a prototype, which can then be developed into a product idea

While this summary of the design thinking approach may appear generic and rudimentary, it is applicable to a wide range of situations, and is therefore worth considering. More involved versions of this process could take on different levels of detail, whether domain-specific or process-oriented. They could also add more fine-grained steps, to enable the designer to “flare” and “focus” better. As I’ve discussed in a post on using principles of agility in doing data science, it is also possible to iterate the “focus” and “flare” steps to get progressively better results.

Looking more closely at this five-step process, we can identify some ways in which data science tools or methods may be used in it:

  1. Observing consumer behaviour and interactions, and understanding them, has become a science unto itself, and with the advent of video instrumentation, accelerometers and behavioural analysis, a number of activities in this first step of the design thinking process can be improved merely by better instrumentation and measurement. I’ve stressed the importance of measurement on this blog before – for one, a small number of well-measured samples can be more valuable than a large volume of noisy data for building certain kinds of models. The capabilities of new sensors also make it possible to expand the kinds of data collected.
  2. Theories about behaviour (hypotheses) may be validated using various Bayesian (or even frequentist) methods of data science. As more data is collected, our understanding of the consumer’s behaviour can be updated, and Bayesian behavioural models can help us validate such hypotheses; a minimal sketch of such an update appears after this list.
  3. In steps 3 and 4 of the design thinking process outlined above, the “focusing and flaring” routine is, at one level, the core experimental design practice described by statistical pioneers including Taguchi (and sketched earlier in this post). Using some of the tools of data science, such as significance testing, effect size determination and factor-response modeling, we could come up with interesting designs and validate them based on relevant factors.
  4. Finally, the process of prototyping and development involves a verification and validation step, which tends to be data-intensive. From reliability and durability models (based on frequentist statistics and probability density and cumulative distribution functions), to key life testing and the analysis of data in that context, there are numerous tools in the data science toolbox that could improve the prototyping process; a simple reliability-model sketch also follows the list.
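
As an illustration of point 2, here is a minimal sketch of a Bayesian update for a simple behavioural hypothesis, say, the proportion of users who complete a task with a prototype. The prior, the observed counts, and the 50% threshold are all assumptions for illustration:

```python
from scipy import stats

# Hypothetical behavioural hypothesis: "most users complete the task".
# Model the completion rate with a Beta prior, updated by observations.
prior_alpha, prior_beta = 1, 1        # uniform prior over the rate

completions, failures = 27, 13        # assumed observation counts

# A Beta prior with a binomial likelihood yields a Beta posterior in
# closed form, so the update is a simple parameter addition.
posterior = stats.beta(prior_alpha + completions, prior_beta + failures)

# Posterior probability that the completion rate exceeds 50%.
print("P(rate > 0.5) =", 1 - posterior.cdf(0.5))
```

As more observations arrive, the posterior simply becomes the prior for the next update, which matches the iterative character of the design process itself.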

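For point 4, here is a minimal sketch of fitting a Weibull model to failure-time data, a common starting point for reliability and durability analysis. The failure times are simulated, not real test data:

```python
import numpy as np
from scipy import stats

# Simulated failure times (hours) for a hypothetical component; in a
# real key life test these would come from instrumented prototypes.
rng = np.random.default_rng(seed=7)
failure_times = rng.weibull(1.5, size=50) * 1000.0

# Fit a two-parameter Weibull distribution (location fixed at zero).
shape, loc, scale = stats.weibull_min.fit(failure_times, floc=0)

# The CDF gives the probability of failure by a given time; its
# complement is the reliability (survival) function.
print(f"shape = {shape:.2f}, scale = {scale:.0f} hours")
print("P(survive 500 hours) =",
      1 - stats.weibull_min.cdf(500, shape, loc, scale))
```
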
I realize that a short blog post such as this one can only begin to explore so broad an intersection between the two domains of design thinking and data science – there is also the matter of surveying work already done in this space, in research and industry. The intersection of these two spaces lends itself to much discussion, and I will cover related ideas in future posts.

Pervasive Trends in Big Data and Data Science

As of mid-2017, I’ve spent almost two years in the big data analytics and data science world, after 13 years of diverse work experience in engineering and management. What began as professional curiosity has taken a while to develop into data science and engineering skills, and longer still to hone the key skills of a data scientist. Along the way, I’ve had a chance to learn core software development methods and principles, stay in touch with the latest in the field, challenge my existing knowledge of product development methodologies and processes, and learn far more about data analysis, statistics and machine learning than I knew starting out in 2015. Along with the constant learning, I’ve observed a few pervasive trends in the big data and analytics worlds, which I wish to share here.

  1. Cloud infrastructure penetration: Undoubtedly the biggest beneficiaries of the data and analytics revolution have been cloud service providers. They’re also stretched thin, with falling prices, massive competition, and the need to offer value-added services of various kinds (big compute and API support alongside big storage, for instance) on top of the core cloud offerings that companies are adopting for their data management needs. Security concerns continue to exist – some of the year’s most publicized security lapses involved data exposed through misconfigured storage on Amazon Web Services, the leading US cloud provider. Despite this, many industries, even those that consider data security paramount, wish to adopt cloud infrastructure, because of the reduced cost of operation and the scalability inherent in cloud platforms.
  2. Deep learning adoption: Generalized learning algorithms based on neural networks have taken the machine learning world by storm, and given the proliferation of big compute and big data storage platforms, it has become far easier to train deep learning models than in the past. The ecosystem of frameworks – Caffe, Keras, and TensorFlow (which has become more user-friendly and better integrated with numerous systems programming languages and frameworks) – keeps producing better, more approachable tools as it evolves; a minimal Keras sketch follows this list. This trend will continue, with several tested and published deep learning APIs available for integration into application software of various kinds.
  3. API-based data product deployment: Data science operationalization has begun to happen through APIs and platforms. Organizations developing data product strategies are increasingly taking platform views and integrating APIs for managing data, or for scoring incoming data against machine learning models; a sketch of such a scoring service also follows the list. With the availability of such APIs for general use, it has become possible to compose many microservice APIs into data products with very specific and diverse capabilities.
  4. A focus on value from data: Companies are looking past the big data hype more often these days, and are looking at what value they can get from the data. They’re focusing on better data collection and measurement processes, improved instrumentation and qualifying their data management infrastructure investments. They’re also seeking to enable their data science teams with the right approaches, tools and methods, so that they can get from data to insight faster. Several startups are also doing pioneering work in governing the data science process, by integrating principles of agility, continuous integration and continuous deployment into software solutions developed by data science teams.
  5. Automated data science and machine learning: Finally (and in many ways, most importantly), automated data science and machine learning is a relatively new area of work that is gaining ground significantly. Numerous startups and established organizations, The Data Team among them, are evaluating methods to automate key parts of the data science workflow itself. Such automation is a trend that I foresee gaining ground for some time yet, before it becomes an integral part of many data science workflows and development approaches. While a number of applications that straddle this space are referred to as AI, the jury is still out on what is AI and what isn’t, as far as many of my colleagues and I are concerned.
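
To illustrate how approachable these frameworks have become (point 2 above), here is a minimal Keras sketch of a small binary classifier. The synthetic data and layer sizes are assumptions for illustration, not a recommended architecture:

```python
import numpy as np
from tensorflow import keras

# Synthetic data standing in for a real dataset: 1,000 samples with
# 20 features each, and binary labels derived from two of the features.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(1000, 20)).astype("float32")
y = (X[:, 0] + X[:, 1] > 0).astype("float32")

# A small feed-forward network: a few lines suffice to define,
# compile, and train it.
model = keras.Sequential([
    keras.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)

print("training accuracy:", model.evaluate(X, y, verbose=0)[1])
```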

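And as a sketch of point 3, a microservice that scores incoming data against a trained model might look like the following Flask example; the endpoint, payload shape, and model file are hypothetical:

```python
from flask import Flask, jsonify, request
import joblib

app = Flask(__name__)

# Hypothetical pre-trained scikit-learn model, serialized at training
# time with joblib.dump(model, "model.joblib").
model = joblib.load("model.joblib")

@app.route("/score", methods=["POST"])
def score():
    # Expect a JSON payload like {"features": [[...], [...]]}.
    features = request.get_json()["features"]
    predictions = model.predict(features).tolist()
    return jsonify({"predictions": predictions})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```

Behind an API gateway, several such services can be composed into a larger data product, which is the platform view described above.
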
These are just some of the trends I’ve observed, of course, and from where you sit as a data scientist, you may be seeing many more. One thing is for sure – those who keep their knowledge and skills relevant in this fast-changing space will continue to be rewarded with interesting work and new opportunities.