In a world where data changes every day, how can we possibly afford to wait 18 months between the moment a predictive model is designed and the moment it is actually implemented? Welcome to the DataOps era, which stems from DevOps, Agile and Lean methods and aims to promote fast yet efficient exploitation of data.
In its 2019 predictions, Gartner states that by 2022, 80% of the insights produced by Data Analytics strategies will not have led to any concrete operational results. The picture might seem pessimistic, but it only emphasizes the challenges involved in developing a data analytics application.
Not Only DevOps
Not only do DataOps teams have to tackle all the usual software development issues addressed by Agile and DevOps methodologies, but they also have to perform relevant and efficient Data Analytics on the data.
A decision-making tool can only be effective if it is continuously adjusted and updated to match both the input data and the needs of the business user.
These inherent constraints of the Data Analytics world have a direct impact on development: for example, they lead to regular changes in the code and its availability. They also require the implementation of tests to track the evolution of quality throughout development.
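To give a concrete idea, such quality tests can start as automated checks run against each new batch of data before it reaches the model. A minimal sketch in Python, where the field names and rules are purely illustrative:

```python
# Minimal data-quality checks run against each new batch of records.
# The fields ("customer_id", "amount") and rules are hypothetical examples.
def check_batch(rows):
    """Return a list of quality issues found in a batch of records."""
    issues = []
    if not rows:
        issues.append("batch is empty")
        return issues
    for i, row in enumerate(rows):
        # Completeness: every record must carry the fields the model expects.
        for field in ("customer_id", "amount"):
            if row.get(field) is None:
                issues.append(f"row {i}: missing {field}")
        # Validity: negative amounts are flagged for review.
        amount = row.get("amount")
        if amount is not None and amount < 0:
            issues.append(f"row {i}: negative amount {amount}")
    return issues

batch = [{"customer_id": 1, "amount": 42.0},
         {"customer_id": None, "amount": -5.0}]
print(check_batch(batch))
```

Running such checks on every delivery turns data quality into something tracked over time rather than discovered in production.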
The DataOps culture brings together all the tools and methods designed to address these issues. Like the DevOps culture from which it emerges, DataOps aims to improve delivery performance and application quality, but it is not limited to the development, integration, testing and deployment phases: it also covers data orchestration and results monitoring.
The three aspects of DataOps
Firstly, DataOps draws on agile development methods. Teams apply practices such as version control and shared working repositories, enabling a more dynamic management of the project based on business priorities and user feedback. Migration from the development environment to production happens on a regular basis, in an iterative way, just as with traditional application code.
It then adopts DevOps' collaborative approach. In a DataOps team, the data scientists who design the models work directly with the data engineers who bring them to life, but also with the teams in charge of production. Moreover, DataOps constantly interacts with the users who ultimately validate the relevance and efficiency of the product. On the technical side, it includes all the traditional levers of continuous delivery, from the creation of preconfigured environments to automated deployment.
Statistical Process Control (SPC) is the third aspect of DataOps. Inspired by lean manufacturing, SPC covers all the techniques used to assess the quality of the results provided by the application. It enables constant monitoring and triggers alerts as soon as an anomaly is detected.
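In its simplest form, SPC monitoring of a model metric amounts to a control chart with three-sigma limits derived from a baseline window: any new observation outside the limits raises an alert. A sketch of that idea in Python, where the metric (daily model accuracy) and values are illustrative:

```python
import statistics

def control_limits(baseline):
    """Derive three-sigma control limits from a baseline window of a metric."""
    mean = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    return mean - 3 * sigma, mean + 3 * sigma

def detect_anomalies(baseline, new_values):
    """Flag new observations that fall outside the control limits."""
    low, high = control_limits(baseline)
    return [v for v in new_values if not (low <= v <= high)]

# Illustrative daily accuracy of a deployed model: one week of baseline,
# then three new observations, the last of which is a clear anomaly.
baseline = [0.91, 0.92, 0.90, 0.93, 0.91, 0.92, 0.90]
print(detect_anomalies(baseline, [0.91, 0.89, 0.55]))  # → [0.55]
```

Production setups replace the print with an alerting channel, but the principle is the same: the process is left alone while it stays inside its control limits.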
On the way to a data-driven culture
These three aspects contribute to the creation of a genuine data flow from the data lake to the end user. Orchestration and automation tools support this collaboration, ensuring both delivery and the performance of the analytics models. As a result, DataOps becomes a pillar of real data governance at the enterprise level.
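Stripped to its essentials, such an orchestrated flow is a sequence of named steps where each step's output feeds the next. A toy sketch in Python, with purely illustrative step bodies (real deployments would use a dedicated orchestrator such as Apache Airflow):

```python
# Toy orchestration of a data flow: extract from a "lake", transform,
# then publish to the end user. Step names and bodies are illustrative.
def extract():
    return [3.0, 4.5, 6.0]          # stand-in for reading from a data lake

def transform(values):
    return [v * 2 for v in values]  # stand-in for cleaning/feature engineering

def publish(values):
    return {"served": values}       # stand-in for delivery to the end user

def run_pipeline(steps):
    """Run the steps in order, passing each step's output to the next."""
    data = None
    for name, step in steps:
        data = step() if data is None else step(data)
        print(f"step '{name}' done")
    return data

result = run_pipeline([("extract", extract),
                       ("transform", transform),
                       ("publish", publish)])
print(result)  # → {'served': [6.0, 9.0, 12.0]}
```

An orchestrator adds scheduling, retries and monitoring on top of this chain, which is precisely where the SPC alerts described above plug in.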