With the AI boom of the past decade, followed by the very (too) broad explosion of Deep Learning and Big Data, many new data-related roles have been defined. Demand for Data [Analysts | Architects | Governors | Engineers | Scientists] has emerged in response to the growing need to identify the skills and skill chains necessary to apply the newest techniques for handling and analyzing complex and voluminous data in real (i.e. business) environments.
Until recently, these techniques were reserved either for the public research sector or for an omnipresent, attractive and innovative technological elite, namely the GAFAM, NATU & BATX¹. However, it is now much easier for the majority of private companies to generate real value from this technical and theoretical knowledge.
Following this transition, the position of Data Scientist has attracted all the expectations and hopes of CxOs. The Data Scientist position alone has become exponentially more attractive, while others, such as Business Analyst or Statistician, have been somewhat neglected. This craze is unfortunately most often unfair, and unjustified, both for these data experts and for the companies that believed they had found THE profile that would finally meet all their expectations.
What Data Science expectations have we identified in our clients?
Most of the genuine Data Science needs we observe today come down to supporting CxOs by providing them with valuable and unique elements of truth found in large, multimodal and extremely complex data.
This means engaging in a real journey of exploration through a jungle of complex and imperfect data², uncovering the holy Causalities³ and bringing a new and unique vision that will finally reveal new pieces of truth about the business.
However, due to some timidity on the one hand, and to a (relative) lack of resources on the other, companies do not always dare to build a team dedicated to the advanced analysis of their data. As a result, these companies often find themselves facing their own Data Warehouse / Lake, full of supposedly valuable data, with a terrible question: what should we do with all of this?!
This is when, among all the data-related profiles available, the hype around Data Scientists promising the almost impossible (Can we predict what the fashionable color of pants will be in Berlin in 2054?) biases their search, pushing businesses to hire profiles in the hope that they will be able to:
- install Big Data & Cloud components to manage the data and ensure robust access to it
- analyze the data
- implement or create from scratch Machine/Deep Learning algorithms
- test them, optimize them, industrialize them
- create powerful visual representations of the resulting analysis
- develop APIs and dashboards so that everyone concerned has robust access to either the data or the results of the analysis
Each of these steps represents a significant workload in itself but, first and foremost, requires dedicated expertise! For these reasons, the Data Scientist position, when defined as above, is more and more often described as a “unicorn”, impossible to find in the real world (except in Scotland⁴).
But the opposite also happens: companies own data that might not require the involvement of a Data Scientist, yet expect, through a simple misunderstanding or bad prior advice, that this profile can magically reveal truths hidden in the data. Data Scientists then get stuck and end up doing the job of Business Analysts or Statisticians. And this “miscasting” affects everyone:
- Data Scientists end up being disappointed because they cannot put their real expertise into practice
- CxOs end up being disappointed because the expected ROI increase might not happen
- Business/Data Analysts end up being disappointed because they are left out despite their tangible and valuable skills
- and the global market is impacted at the end of the line, because the revolution promised by Data Science is being slowed down.
This has created global distrust and fear around Data Science and Data Scientists, as well as a devaluation of Business/Data Analysts and Statisticians, who remain essential pillars of any company’s business data analytics activities.
This inadequate definition of the Data Scientist profile, sometimes combined with unclear business needs, has left everyone losing out.
How to solve this? And what to really expect from a Data Scientist?
On your side: perfectly identify your needs and know your business.
On our side: master the different areas of expertise well enough to offer you the best support with the profiles you really need.
When we work with our clients, the precise definition of their need (a step that seems obvious but is unfortunately not always easy, nor always done well) is, in our opinion, particularly critical since it structures all the work that follows. For example:
- What is your use case?
- What are your hopes & expectations?
- What business bottlenecks do you need to overcome?
- What is the volume of your data?
- What is the “expiration date” of your data?
- How flexible are you in your post-analysis activities⁵?
- What is your planned deadline?
- From what sources do your data come?
- What is your existing data management architecture?
Based on the answers to these questions (and others), it will then be possible to determine whether you need a Statistician⁶, a Business/Data Analyst, a Data Engineer, a Machine Learning Engineer or a Data Scientist.
We consider that a Data Scientist brings definite and precious added value when you are dealing with large amounts of complex data coming from many different sources but:
- you do not know what to do with it
- you don’t know what truths lie in it
- you don’t know how to discover these truths
And this is a major part of the Data Science activity: exploring. This covers the data itself, the different statistical analysis techniques, the numerous Machine and/or Deep Learning models, and all the learning algorithms needed to improve the training of those models. There are a lot of techniques, models and algorithms. Data Scientists must therefore have a good dose of open-mindedness, of adaptability and, most of all, of agnosticism: any method can be the right one, since everything depends on the context, the characteristics of the data environment and the problem to be solved.
This critical exploration step requires someone with a global understanding of the “universe of data” who is capable of producing convincing, solid and rigorous Proofs of Concept (PoCs). These PoCs will rely on a sophisticated chain of expertise⁷ that provides the visibility you need to support your strategy for sustainable and rapid business growth.
If you want to undertake an innovative approach based on the deep exploration of the data you have: call on a Data Scientist. They will provide you with the valuable insights you need to determine the future direction of your business.
When you want to move towards industrialization, call on the right experts: Developers for solid back-ends & user-friendly front-ends, Data Architects/Governors to build robust data acquisition pipelines and ensure efficient data management (which is a critical point), Business/Data Analysts for business data analytics and their visualization, and finally ML Engineers to optimize and refine the global learning architecture.
¹ GAFAM: Google, Apple, Facebook, Amazon and Microsoft (also known as the Big 5); NATU: Netflix, Airbnb, Tesla and Uber; BATX: Baidu, Alibaba, Tencent and Xiaomi.
² Mostly: missing data, skewed/imbalanced categories, or multiple and potentially incompatible formats.
³ In the absence of causalities, which are extremely difficult to truly prove, we will most of the time identify the most relevant and robust correlations.
⁵ Please note that the subsequent latitude to make your business evolve following the delivered analysis is critical: a company that really wants to be data-driven must accept being truly driven by the conclusions drawn from the advanced data analysis (we will soon come back to this in a dedicated article).
⁶ By the way: almost all statisticians are now fully capable of coding in Python/R/Matlab/Java or other languages.
⁷ Including: mathematics, statistical analysis, handling large datasets, knowledge of best practices, scientifically oriented programming, algorithmics, basics in code optimization, machine learning, deep learning, scientific reasoning, data & results visualization.