Data-Ops (DevOps + Agile)
There are four key software components of a DataOps Platform: data pipeline orchestration, testing and production quality, deployment automation, and data science model deployment / sandbox management.
1. Data Pipeline Orchestration: DataOps needs a directed-graph workflow that contains all the data access, integration, model, and visualization steps in the data analytic production process (first sketch after this list).
2. Automated Testing, Production Quality, and Alerts: DataOps automatically tests and monitors the production quality of all data and artifacts in the data analytic production process, and it also tests code changes during deployment (second sketch after this list).
3. Deployment Automation and Development Sandbox Creation: DataOps continuously moves code and configuration from development environments into production (third sketch after this list).
4. Data Science Model Deployment: DataOps-driven data science teams create reproducible development environments and move models into production; some have called this “MLOps” or “ModelOps” (fourth sketch after this list).
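As an illustration of item 1, the sketch below wires three steps into a daily directed-graph pipeline. It assumes Apache Airflow 2.x as the orchestrator; the DAG name, task ids, and the extract/transform/publish bodies are hypothetical placeholders rather than part of any particular product.

    # Minimal directed-graph pipeline sketch, assuming Apache Airflow 2.x.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        print("pull source data")        # e.g. query a warehouse or an API

    def transform():
        print("clean and join tables")   # e.g. run SQL or pandas logic

    def publish():
        print("refresh the dashboard")   # e.g. push results to a BI tool

    with DAG(
        dag_id="analytics_pipeline",     # hypothetical pipeline name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_publish = PythonOperator(task_id="publish", python_callable=publish)

        # The >> operator draws the directed edges: extract -> transform -> publish.
        t_extract >> t_transform >> t_publish

Because every access, integration, model, and visualization step is a node in the graph, the orchestrator can rerun, monitor, and audit the whole production process end to end.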
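For item 2, the sketch below shows one way automated quality checks can gate a table before it reaches production. The table name, columns, and thresholds are hypothetical examples; in a real deployment the failure branch would raise an alert rather than just exit.

    # Minimal data-quality check sketch using pandas; names and thresholds are illustrative.
    import pandas as pd

    def check_orders(df: pd.DataFrame) -> list:
        """Return a list of human-readable failures; an empty list means the data passed."""
        failures = []
        if df.empty:
            failures.append("orders table is empty")
        if df["order_id"].duplicated().any():
            failures.append("duplicate order_id values found")
        null_rate = df["customer_id"].isna().mean()
        if null_rate > 0.01:             # tolerate at most 1% missing customer keys
            failures.append("customer_id null rate too high: {:.2%}".format(null_rate))
        return failures

    if __name__ == "__main__":
        orders = pd.DataFrame({"order_id": [1, 2, 2], "customer_id": [10, None, 12]})
        problems = check_orders(orders)
        if problems:
            # In production this would page someone or block the downstream tasks.
            raise SystemExit("DATA QUALITY ALERT:\n" + "\n".join(problems))

The same checks can run in two places: against production data on every pipeline run, and against sample data whenever code changes are promoted.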
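For item 3, promotion from development into production is typically handled by a CI/CD system; the sketch below is a stripped-down stand-in that gates a push on passing tests. The pytest suite, git remote, and "production" branch name are hypothetical assumptions.

    # Minimal promotion-script sketch: run tests, then push the reviewed code to production.
    import subprocess
    import sys

    # Run the automated test suite against the development sandbox.
    tests = subprocess.run(["pytest", "tests/"])
    if tests.returncode != 0:
        sys.exit("tests failed; not promoting to production")

    # Promote the current code to the production branch/environment.
    subprocess.run(["git", "push", "origin", "HEAD:production"], check=True)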
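For item 4, a reproducible hand-off from sandbox to production can be as simple as pinning a versioned model artifact. The sketch below assumes scikit-learn and joblib; the training data, feature, and file name are hypothetical.

    # Minimal model hand-off sketch, assuming scikit-learn and joblib.
    import joblib
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # In the development sandbox: train and serialize a versioned artifact.
    X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
    y_train = np.array([0, 0, 1, 1])
    model = LogisticRegression().fit(X_train, y_train)
    joblib.dump(model, "churn_model_v1.joblib")

    # In production: load the pinned artifact and score new records.
    production_model = joblib.load("churn_model_v1.joblib")
    print(production_model.predict(np.array([[2.5]])))    # expected output: [1]

Real MLOps/ModelOps setups layer model registries, environment pinning, and monitoring on top of this basic pattern.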
DataOps Supporting Functions
In addition to the foundational tools above, there are many software components that play a critical supporting role in the DataOps ecosystem.
A. Code and artifact storage (e.g., Git, Docker Hub)
B. Parametrization and secure key storage (e.g., Vault, Jinja; see the sketch after this list)
C. Distributed computing (e.g., Mesos, Kubernetes)
D. DataSecOps, Versioning, or Test Data
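As a concrete example of supporting function B, the sketch below renders an environment-specific connection string from a Jinja template and pulls the password from Vault via the hvac client. The Vault path, secret keys, and template values are hypothetical, and hvac is just one common way to talk to Vault.

    # Parameterization + secret-retrieval sketch; paths, keys, and hosts are hypothetical.
    import os

    import hvac
    from jinja2 import Template

    # Fetch the database password from Vault instead of hard-coding it.
    client = hvac.Client(url=os.environ["VAULT_ADDR"], token=os.environ["VAULT_TOKEN"])
    secret = client.secrets.kv.v2.read_secret_version(path="analytics/db")
    db_password = secret["data"]["data"]["password"]

    # Render an environment-specific connection string from a template.
    conn_template = Template("postgresql://{{ user }}:{{ password }}@{{ host }}/{{ db }}")
    conn_string = conn_template.render(
        user="etl_user", password=db_password, host="prod-db.internal", db="analytics"
    )

Keeping parameters in templates and credentials in a secret store lets the same pipeline code run unchanged across development, test, and production.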
2. Big Data Performance Management.