The Future for Data Scientists

This post began as an answer on Quora. It is an interesting topic to discuss, given the rapid pace of change in the intersecting fields of data science, analytics, AI and machine learning.

As of early 2018, I see the evolution of data science roles in industry across three time frames: a short-term time frame covering the next one or two years, a longer-term time frame between two and five years out, and the period beyond five years. In about five years, I expect the data scientist role to be so infused into management and business that it will no longer be called out separately. So, let’s look at the timeline and what it could mean for data scientists.

One-to-Two-Year Time Horizon

  1. The number of data scientists in the market will increase greatly, although their quality and readiness for projects will continue to be largely poor. As I’ve said in an earlier answer, a static skill set will get you nowhere in the competitive world of data science. This will become even more the case in the next two years. Knowledge of older tools and frameworks will keep becoming obsolete as new ones emerge and have to be learned.
  2. There will be a great deal more emphasis on productionising and operationalising data science in the near term. This means that data scientists will be expected to leverage APIs, microservices architectures and similar approaches to ensure that data science results in applications, and not merely in analyses with an expiration date.
  3. The increasingly high-level nature of data science tools and frameworks will continue to democratise data science. This will bring people who aren’t strictly data scientists into the fold. As a data science professional who spent significant time in engineering and quality management before this, I see this as an enabler for smart, knowledgeable and analytically minded professionals from different walks of life to embrace data. Deployment-friendly data architectures and data infrastructure, such as those on the cloud, will enable this transformation too.
  4. Certain kinds of statistical analysis and machine learning systems will become commonplace and productionised. This is due to the increasing popularity of probabilistic programming frameworks and deep learning frameworks. Solutions to common problems such as face or other biometric data analysis, and specialized data-centric systems such as autonomous vehicles, will become more mature. This will cause probabilistic modeling and deep learning to become a de facto part of the data science skill set.
  5. Data scientists will be expected to be competent data engineers at some level, and at least partly application developers as well. Many of the changes and improvements to data science education will happen along these lines.

Three-to-Five-Year Time Horizon

Prediction is very difficult, especially if it's about the future. - Niels Bohr

I love this quote by Niels Bohr, and I suppose it subtly sums up the challenge posed by the original Quora question I tried to answer. That said, I do have some views about what could potentially unfold. I would love to know if you think anything else belongs here, or if anything should be different.

  1. The ethics of data science and artificial intelligence will become a serious topic. Although this is being discussed extensively by academics, industry stalwarts and intellectuals in 2018, the debate will have real implications a few years down the line, and I expect that data scientists will be held ethically accountable for their work in a few years’ time. Regulations on data use and algorithm use will begin to appear, just as regulations on industrial systems or weapons exist today. The GDPR is a defensive regulation on data protection; I expect to see strategically impactful regulations in future.
  2. Data science education will come to include artificial intelligence, and will improve vastly, becoming a standard part of college curricula. This expanded curriculum will include a deeper focus on computational engineering, statistical engineering and large-scale, data-based simulations. Data scientists in the future could be drawn from different backgrounds and experiences.
  3. I expect a revival of interest in the use of large-scale simulation tools and methods. Stochastic simulation of systems at a large scale has been around for a while, on the sidelines. I expect that truly large-scale simulations of real-world systems are more feasible now than they were before, and that this will become the de facto way of engineering many kinds of systems.
  4. The value of domain understanding and domain modeling in data science will be emphasised for data scientists. Ontology models will become more common in data science. Data scientists who cannot build domain-aware systems may even come to be regarded the way data scientists who don’t understand simple data algorithms are regarded today.
  5. Fully automated data science systems will be able to serve a large number of common use cases. Towards the end of the next half decade, this kind of automation will allow data scientists to flit from one high-level task to another. Furthermore, this kind of capability may mean that organizations no longer need data scientists per se, but rather analytical staff who can straddle the different systems and the commonly handled use cases.

Beyond five years: I hope that the term “data scientist” becomes outdated five years hence. If it doesn’t, it may mean that we haven’t been able to sufficiently leverage the technologies and frameworks available to operationalise and productionise data science, or to build any truly intelligent data-driven systems.