Achieving Explainability and Simplicity in Data Science Work

This post stems from a few tweets I authored recently (over at @rexplorations) on deep learning, data science, and the other skills that data scientists ought to learn. Naturally, this is by no means a short list of skills, given the increasingly pivotal role that data scientists play in organizations.

Here’s a summary of the tweet-stream I’d put out, with some additional ponderings.

  1. Domain knowledge is ignored on the data science road to perdition. Doing data analysis, or building models from data, without understanding the domain and the relevance of the data and factors used in those models is akin to “data science suicide”. It is a sure-fire road to perdition as a data scientist. Domain knowledge is also hard to acquire, especially for data scientists who work as consultants and apply their skills in short-term engagements. For instance, I have more than a decade of experience in the manufacturing industry, and I still find myself learning new things when I encounter a new engineering setup or a new firm. A data scientist is nothing if not capable of learning new things, and domain knowledge is something they need to keep building constantly, alongside their analytical skills.
  2. Get coached on your communication skills, if needed. Communication skills are extremely important for data scientists when they interact with domain experts and subject matter experts. I have frequently seen data scientists suffer from “impostor syndrome”, not only about data analysis methods and techniques, but also about their understanding of the domain.
  3. Empathise, and take notes when speaking to subject matter experts. It is for this reason that the following things are extremely important for new data scientists interacting with subject matter experts:
    1. Humility about one’s own knowledge of a specific industry area
    2. The ability to empathise with the problems of different stakeholders
    3. The ability to take notes, including but not limited to mind maps, to organize ideas and thoughts in data science projects
  4. Strive for useful models, not more complex ones. Data scientists ignore the hypotheses that come out of discussions with subject matter experts at their own peril. Hypotheses are the lifeblood of useful data science and analysis. As George E. P. Box said, “All models are wrong, but some are useful”, and this couldn’t be more true than when dealing with models built from hypotheses. It is such models that become really useful.
  5. Simpler models are easier to manage in a data ethics context. In product companies that use machine learning and data science to add value to customers, a debate constantly exists on the effective and ethical use of customer data. While having more data at one’s disposal is helpful for building lots of features, callous use of customer data can present a huge risk. Simpler models are easier to explain – and are arrived at when we accumulate sufficient domain knowledge, and test enough hypotheses. With simpler models, it is easier to explain what data to collect, and this can also help win the customer’s trust.
  6. Feature engineering done carefully under human supervision may be more effective and scrupulous than automated feature engineering. We live in a world where AutoML and RoboticDataScience are often discussed in the context of machine intelligence and speeding up the process of insight generation from data. However, for some applications, it may be a better idea in the short term to ensure that the feature engineering happens through human hands. By erring on the side of caution, such careful feature engineering may give organizations that use sensitive data a leg up as a longer-term strategy.
  7. Deep learning isn’t the end of the road for data scientists. Deep learning has (justifiably) seen a great deal of hype in the recent past. However, it cannot be seen as a panacea for all data analysis. The end goal of working with data is the generation of value, be it for a customer or for society at large. There are many ways to do this, and deep learning is just one approach.

I’m not discussing the many technical aspects of building explainable models here. For one, these aspects are contextual and depend on the situation; additionally, the tone of this post and the tweets is lighter, to encourage discussion and to welcome beginner data scientists to it. Hence my omission of these (important) topics.

If you like something in this post, or want to share any related insights, do drop a comment, tweet to me at @rexplorations, or message me on LinkedIn.