When it was founded in 2013, Databricks set out to democratize access to big data processing. To that end, it developed the Databricks Data Platform, a proprietary SaaS service deployed on top of the major public clouds, which became available in 2015. The platform was later enhanced with machine learning capabilities via MLflow, a data catalog with Unity Catalog, and declarative pipeline orchestration with Delta Live Tables.
Before Databricks, data engineers had to configure their cluster by hand, whether Hadoop or another system, before they could use Spark. Nor could they explore their data directly with Spark; that required additional tools such as Apache Zeppelin. With Databricks, data engineers can implement Spark transformations without worrying about infrastructure management: once the platform is installed and pre-configured clusters are set up, they can run their transformations autonomously. Databricks Notebooks also let them explore their data directly within the platform.
Additionally, Unity Catalog enables data documentation and access control. Databricks is ideal for companies handling large data volumes and for facilitating collaboration between data engineers and data scientists. It is also a turnkey data platform that simplifies maintenance, making it a smart choice when integrating new tools into an existing IT system is complex.
However, Databricks is heavily tied to Spark and only fully reveals its potential when the data volumes processed are large enough to benefit from parallel computing.
THEODO’S POINT OF VIEW
At Theodo, we recommend Databricks Data Platform, especially for processing large-scale data: it is a mature, high-performance, and comprehensive technology.