Emphasizing the Basics: Structured Data Science Mentoring

Data science, machine learning and AI are constantly growing and burgeoning fields, with research that’s spilling over at the seams in terms of the sheer volume of it all. Every day, I receive numerous references to interesting papers on my Twitter feed, thanks to Arxiv daily and such accounts there. I also see papers explained with code, and references to ML products and systems in numerous contexts. This is all overwhelming beyond a point for a professional who doesn’t have a specific focus area. Speaking pragmatically, and from the tree of knowledge (which is always bound to be vast), it is a feature of every single human endeavour to exhibit this kind of complexity as we spend more and more time exploring things, farming ideas and understanding new possibilities in these areas.

Data scientists are going to be at different levels of competence and may be differently placed to take on challenges they are asked to face – the role of the mentor (regardless of the type) is to systematically challenge the data scientist to discover new innate potential and develop such potential to increase their overall capabilities and effectiveness.

The Data Science Mentoring Challenge

Mentoring can be a hard task for this reason – a lot of people (understandably) gravitate towards complex models that are meant for specific purposes, without fully understanding the details and the exact mechanisms behind simpler machine learning and statistical modeling methods. The problem with this is two fold – a) reliance on libraries and frameworks with implementations that already exist, and b) inability to characterize, apply and explain common and simpler techniques to actual real world problem statements. Part of the problem here is the sensationalization of research in the media. Open research without borders is important and pivotal for speedy progress in technology areas like ML. But we’re also seeing a lot of misinformation including sensationalization of advanced ML techniques and when some of gets parroted by professionals (some of who may become hiring managers) we see the problem proliferating into the world of work as well. I’ve interviewed my fair share of individuals who understand, say, an LSTM unit’s different gates but aren’t comfortable explaining autocorrelation techniques or ARMA models. This gap probably stems from gaps in mentoring and coaching, which ideally should emphasize basics first.

I’d posit that the role of the mentor has changed, in data science, over the last several years, and I would say it has changed most significantly in the last two years. In the future, data and AI mentoring will look different from what it looked like in the past five years. This is because the nature of the job of a data scientist (or alternatively an AI/ML engineer) has also changed. Despite developments in Automated Machine Learning, we’re inundated with situations in the real world, where we require human expertise to get through data science and machine learning problems. This human expertise manifests in three processes: problem characterisation, problem formulation, and problem solving. We need real, human data scientists (not just an AutoML tool) to look beyond the obvious automations such as hyperparameter or architectural searches, to reason about the nuts and bolts of problems, interpret the problem domain and reason about different kinds of hypotheses and how they make sense.

This makes the process of mentoring for data science different than it was, in certain specific ways. For one thing, mentors today create the field of problems or opportunities that will exist tomorrow. Data scientists today experience an overload of information as can be expected, from different sources. From Arxiv and Springer papers and articles, to new research and code, new books and new frameworks and algorithms, there are plenty of things to learn on a daily basis. However, the broader skill set of the data scientist even today can be characterized into four key areas: basic, business, functional and frontier skills.

Broad Characterization of Skills for Modern Data Science
  1. Technical skills: There’s the need for a strong foundation that enables general effectiveness in a data science role. This includes good skills across statistical analysis fundamentals, leading into the key principles that enable statistical learning models to be built, and a sound understanding of the mechanism behind common algorithms such as regression, tree algorithms, search and optimization methods
  2. Business skills: There is a strong need for data scientists who can reason about business processes and systems, and understand how data may be generated, how it may flow, and what insights may be required of it. Not only is this is a key skill to have fruitful interactions with clients and stakeholders, but it is also important to narrow down to the right level of depth for the job in terms of satisfaction and effectiveness.
  3. Functional skills: There’s the need for effectiveness on the job, at a functional level, which not only includes technical competence at the statistical, mathematical and code levels, but at the level of processes and good practices such as clean code, change management and reproducible research. One could also see more advanced machine learning and feature engineering techniques as being part of the functional skill set.
  4. Frontier skills: There’s research that’s expanding at multiple frontiers, which is hard for even experienced data scientists to keep up with, if they’re really interested in furthering their career beyond the obvious and evident challenges of day-to-day work.

Mentors: Different Levels

The role of mentorship has also become specialized in the last two years, which is, in my view, one of the changes most representative of the maturation of the field of data science. Mentors today can be at different levels of skill and still add value to different kinds of data science and analytics roles. For the sake of this discussion, I’d classify mentors today into two kinds – the “breadth mentor” and the “depth mentor”. While both kinds of mentors possess certain common skills, especially on the interpersonal communication front, they may have different approaches to technical, functional and research level mentoring.

The breadth mentor is an individual with plenty of experience in data science, perhaps in a consulting setting, that can provide generally correct advice to data scientists with the development of broad skill sets, ranging from basic statistical analysis, to advanced algorithms. The nature of the mentoring here is on developing a well-rounded data scientist, rather than an expert in a specific field.

The depth mentor by contrast, is someone who has deep experience in a specific area of industry or technology and has deep experience in bringing this field together with data science. Examples of this kind of data scientist would be an NLP researcher, or a researcher in the field of robotics, both of who may be expert practitioners of data science methods in their specific areas, but without the broader knowledge of consultative data science methods.

Depending on the needs of the business and the data scientists in question, the appropriate kind of mentor has to be chosen – and this shouldn’t be done lightly. For example, bringing a breadth mentor to an AI product firm may have some advantages, but if the firm is solving problems in a specific space, this may not work out so well. Similarly, bringing a depth mentor to a consulting firm can help grow a specific practice (or a new one) but may not benefit the broader data science efforts across different business domains there.

Structuring Mentorship in Data Science

Mentors (and hiring managers) in general should emphasize the importance of the basic skills listed above. In my view, when a data science candidate has the correct understanding of the essential basic statistical ideas and common algorithms, it becomes a lot easier for them to grasp more advanced ideas such as in deep learning, when this is required. Mentors can build better basic skills in data scientists by challenging their technical acumen.

Mentors should also emphasize business skills where relevant, and where the emphasis is on research, they should emphasize some of the frontier skills as well. Mentors in this context are expected to challenge the data scientist with relevant questions, and encourage a habit of systematically breaking down problems and asking the right questions. These business skills are important all the way up to solution architect roles and management, when crucial decisions have to be taken and hard questions will need to be asked often. Mentors can build better business skills in data scientists by challenging their problem understanding and characterisation.

Functional skills are important for effectiveness on the job. It is not okay for data scientists to theoretically understand a specific subject area, only to find themselves handicapped when asked to build a machine learning pipeline. Therefore functional skill mentoring is about challenging the data scientist on problem solving effectiveness.

Finally, frontier skills development depends on both the organizational or research context, and the data scientist’s interests. Mentors can provide helpful markers to enable the exploration of ideas, while emphasizing value from the research, and asking questions that keeps the data science researcher on track. The challenge the mentor can pose here is differentiated solution value and originality.

The Importance of Emphasizing the Basics

This brings me to the importance of emphasizing the basics. I see numerous individuals out there who are getting into data science and machine learning that are interested in getting right to the latest and greatest algorithms. For a while – and this has been a trend on LinkedIn and Twitter – budding data science aspirants post some of their work, where it involves the development of simple scripts or programs around computer vision, translation and such problem statements, thereby delivering an impression to a lot of their audience, that not only are they skilled at those techniques demonstrated, but that they are skilled at different kinds of data science problem statements as well. My own suggestion to data science aspirants is that they will be under pressure to demonstrate some of their more involved skills, not merely the ability to use pre-built libraries to solve problems using one’s own basic skill sets in statistical learning, but, perhaps be able to build such algorithms and systems from scratch. This kind of deeper skill is what differentiates the wheat from the chaff in data science.

Concluding Remarks

In conclusion, I believe the mentor’s role in data science has changed – mentors today have their tasks cut out, when it comes to building deeper skill in their data scientists – they should emphasise technical acumen first and foremost, problem understanding and characterisation next, and problem solving effectiveness after this. This builds up a well-layered skill-set where technical skills can perform a harmonious dance in amalgam, resulting in true value to the data science market.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s