Some Ideas on Combining Design Thinking and Data Science

Recently, I had the opportunity to complete Stanford SCPD's XINE 217 "Empathize and Prototype" course, part of the Stanford Innovation and Entrepreneurship Certificate, which emphasizes the use of design thinking to develop product and solution ideas. It was during this course that I wrote down a few ideas on using data to improve design decisions. Design thinking is a modern approach to system and product design that puts customers and their interactions at the center of the design process. The design process has been characterized over decades by many scholars and practitioners in diverse ways, but a few aspects remain largely unchanged. Three of these are as follows:

  1. Design processes are essentially iterative and evolve constantly over time
  2. The design process always oversimplifies the problem, introducing side effects into customer-product or customer-process interactions
  3. The design process is only as good as the diversity of ideas we use for "flaring" and "focusing" (which roughly translate to "exploring ideas" and "choosing a few out of many ideas" respectively).

Overall, the essential idea conveyed in the design thinking process, as explained in XINE 217, is "Empathize and Prototype", a phrase that conveys deep customer understanding and focus. As for integrating data into the design process, the idea is by no means new: engineers from Genichi Taguchi onwards, and perhaps even a generation before him, have been developing systems models of the processes or products they design. At some level, these systems models are factor-response models, because they are converted into prototypes via parameter design and tolerance design processes.

Statistically speaking, these are analogues of designed experiments, in which a range of parameters is treated as factors influencing a response, and the factor combinations to be tested are laid out as orthogonal arrays. There's more detail here.
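To make the factor-response idea concrete, here is a minimal sketch, entirely my own illustration rather than anything from the course or from Taguchi's methods: a two-level full factorial design for three hypothetical factors, with main effects estimated by ordinary least squares.

```python
# A minimal sketch (hypothetical factors and responses, not from the post):
# a two-level full factorial design with a linear factor-response model.
import itertools
import numpy as np

# Coded factor levels (-1 = low, +1 = high) for three hypothetical design factors.
design = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)

# Hypothetical measured responses, one per experimental run.
response = np.array([3.1, 4.0, 2.8, 5.2, 3.5, 4.4, 3.0, 5.9])

# Fit main effects: response ~ intercept + b1*x1 + b2*x2 + b3*x3.
X = np.column_stack([np.ones(len(design)), design])
coeffs, *_ = np.linalg.lstsq(X, response, rcond=None)

print("intercept and main effects:", coeffs)
```

In a real parameter design exercise, an orthogonal array would cover the factor space with far fewer runs than a full factorial, but the factor-response fit works the same way.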

Although described above in a simplified way, data-driven design approaches, grouped under the broad umbrella of "statistical engineering", are used in one form or another to validate the designs of mechanical and electrical systems in well-known manufacturing organizations. However, when you look at the design thinking process step by step, the benefits of data science techniques at certain stages become apparent.

The design thinking process could perhaps be summarised as follows:

  1. Observe, empathise and understand the customer’s behaviour or interaction
  2. Develop theories about their behaviour, including those that account for motivations – spoken and unspoken aspects of their behaviour, explicit and implicit needs, and the like
  3. Based on these theories, develop a slew of potential solutions that could address the problem they face (“flare”)
  4. Qualify some of these solutions based on various kinds of criteria (feasibility, scope, technology, cost, to name some) (“focus”)
  5. Arrive at a prototype, which can then be developed into a product idea

While this summary of the design thinking approach may appear generic and rudimentary, it is applicable to a wide range of situations, and is therefore worth considering. More involved versions of this same process could take on different levels of detail, whether domain-specific or process-specific, and could add more fine-grained steps to help the designer "flare" and "focus" better. As I've discussed in a post on using principles of agility in doing data science, it is also possible to iterate the "flare" and "focus" steps to get progressively better results.

Looking more closely at this five-step process, we can identify some ways in which data science tools or methods may be used in it:

  1. Observing consumer behaviour and interactions, and understanding them, has become a science unto itself. With the advent of video instrumentation, accelerometers and behavioural analysis, a number of activities in this first step of the design thinking process can be improved simply through better instrumentation and measurement. I've stressed the importance of measurement on this blog before; for one thing, a small number of well-measured, relevant samples can be more valuable than a large volume of noisy data for building certain kinds of models. The capabilities of new sensors also expand the kinds of data that can be collected.
  2. Theories of behaviour (hypotheses) may be validated using various Bayesian (or even Frequentist) methods of data science. As more data is collected, our understanding of the consumer's behaviour can be updated, and Bayesian behavioural models can help us validate such hypotheses as a result (a sketch of this kind of updating appears after this list).
  3. In steps 3 and 4 of the design thinking process I've outlined above, the "flaring and focusing" routine is, at one level, the core experimental design practice described by statistical pioneers including Taguchi. Using tools of data science such as significance testing, effect size estimation and factor-response modeling, we could generate interesting candidate designs and validate them against the relevant factors.
  4. Finally, prototyping and development involve a verification and validation step, which tends to be data-intensive. From reliability and durability models (based on Frequentist statistics and probability density and cumulative distribution functions) to key life testing and the analysis of data in that context, there are numerous tools in the data science toolbox that could improve the prototyping process.
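As a concrete example of point 2 above, here is a minimal sketch, with hypothetical numbers of my own rather than anything from the post: Beta-Binomial updating of a simple behavioural hypothesis, such as "at least 60% of users complete the prototype's checkout flow".

```python
# A minimal sketch (hypothetical prior and observations): Bayesian updating
# of a behavioural hypothesis about a completion rate.
from scipy import stats

prior_a, prior_b = 2, 2          # weakly informative Beta prior on the completion rate
completions, attempts = 46, 70   # hypothetical observations from a prototype test

post_a = prior_a + completions
post_b = prior_b + (attempts - completions)
posterior = stats.beta(post_a, post_b)

# Posterior probability that the completion rate exceeds 60%.
print("P(rate > 0.6 | data) =", 1 - posterior.cdf(0.6))
print("95% credible interval:", posterior.interval(0.95))
```

Each new round of observations simply updates the posterior, which suits the iterative nature of the design process: the hypothesis is refined rather than re-tested from scratch.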

I realize that a short blog post such as this one can only begin to explore so broad an intersection between design thinking and data science; there is also the matter of surveying work already done in this space, in research and in industry. The intersection of these two fields lends itself to much discussion, and I will cover related ideas in future posts.

Hypothesis Generation: A Key Data Science Challenge

Data scientists are new age explorers. Their field of exploration is rife with data from various sources. Their methods are mathematics, linear algebra, computational sciences, statistics and data visualisation. Their tools are programming languages, frameworks, libraries and statistical analysis tools. And their rewards are stepping stones, better understanding and insights.

The data science process for many teams starts with data summaries, visualisation and data analysis, and ends with the interpretation of analysis results. However, in today's world of rapid data science cycles, it is possible to do much more if we take a hypothesis-centred approach to data science.

Theories for New Age Raconteurs

Data scientists work with data sets small and large, and are tellers of stories. These stories have entities, properties and relationships, all described by data. Their apparatus and methods give data scientists opportunities to identify, consolidate and validate hypotheses with data, and to use these hypotheses as starting points for their data narratives. Hypothesis generation is a key challenge for data scientists; hypothesis generation, and by extension hypothesis refinement, constitute the very purpose of data analysis and data science.

Hypothesis generation for a data scientist can take numerous forms, such as:

  1. They may be interested in the properties of a certain stream of data or a certain measurement. These properties, and their default or exceptional values, may form the basis of a hypothesis.
  2. They may be keen on understanding how a certain measure has evolved over time. In trying to understand the evolution of a system's metric or a person's behaviour, they could rely on a mathematical model as the hypothesis (see the sketch after this list).
  3. They could consider the impact of some properties on the states of systems, interactions and people. In trying to understand such relationships between different measures and properties, they could construct machine learning models of different kinds.
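As a minimal sketch of the second form above, here is one way a trend hypothesis might be expressed in code; the data and the linear-growth assumption are hypothetical illustrations of mine, not anything from the post.

```python
# A minimal sketch (hypothetical data): treating a linear trend as the
# hypothesis for how a daily metric evolves over time, fitted with numpy.
import numpy as np

days = np.arange(30)                                   # hypothetical 30-day window
metric = 100 + 0.8 * days + np.random.default_rng(0).normal(0, 3, size=30)

# Hypothesis: the metric grows linearly over time.
slope, intercept = np.polyfit(days, metric, deg=1)
residuals = metric - (intercept + slope * days)

print(f"estimated daily growth: {slope:.2f}")
print(f"residual std deviation: {residuals.std():.2f}")
```

The fitted slope and the residual spread give the data scientist something concrete to accept, refine or reject, which is exactly the role a hypothesis plays in the narrative.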

Ultimately, the purpose of such hypothesis generation is to simplify some aspect of system behaviour and represent it in a tangible, tractable manner based on simple, explicable rules. This makes story-telling easier for data scientists when they become new-age raconteurs, straddling data visualisations, dashboards of data summaries and machine learning models.

Developing Nuanced Understanding

The importance of hypothesis generation in data science teams is manifold:

  1. Hypothesis generation allows the team to experiment with theories about the data
  2. Hypothesis generation can allow the team to take a systems-thinking approach to the problem to be solved
  3. Hypothesis generation allows the team to build more sophisticated models based on prior hypotheses and understanding

When data science teams approach complex projects, some of them may be wont to dive right into building complex systems based on the available resources, libraries and software. By taking a hypothesis-centred view of the data science problem, they can build up complexity and nuanced understanding in a natural way, developing hypotheses and ideas as they go.

Data Perspectives: “Orbiting The Giant Hairball”

This may sound strange, but one sure way to lose perspective on the business in an innovative and constantly changing industry is to bury yourself in regular work. That is the meaning of the title, which comes from Gordon MacKenzie's book of the same name.

By regular work, I mean work in which you execute tasks with a view to minimizing variability and producing standard results. This is as opposed to innovative work, which, as Bob Sutton explains in his lectures, is characterised by increasing variability, even to the point of failure. Failure and validated learning are essential aspects of the learning experience in any job, to extend a metaphor from Eric Ries' book The Lean Startup.

Data science and data engineering are the truly cross-functional and cross-industry work areas within the analytics revolution now under way. Many business perspectives that are relevant in one industry can also be applied to another; indeed, work in some industries can closely anticipate the needs of another.

Data scientists should keep one eye on the business or, to be true to the metaphor here, should occasionally "dive into the hairball" of business and routine work to get a glimpse of what's happening in the world of work. The data perspectives they bring to that conversation then become as important as the perspectives they develop from such experiences. Seasoned professionals and consultants in the data analytics industry may have developed their cross-functional and cross-industry experience, consciously or unconsciously, over years. But it is probably fitting for younger data professionals, and there are many of them out there, to occasionally "dive into the hairball from orbit" and understand the challenges of data for those in various walks of business.