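The medallion architecture is a framework introduced by Databricks to structure data flows in data lakes and to separate the successive stages of data quality more cleanly. It consists of three successive transformation layers: bronze (raw data as ingested), silver (cleaned and filtered data), and gold (data ready for consumption).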
This simple concept significantly improves the quality of data transformation pipelines, whether within a data platform or even in a large Excel file.
By keeping raw data (bronze) clearly separated from consumed data (gold), it allows data flows to scale well. The medallion architecture also encourages giving each table a narrow responsibility, which makes calculation rules easier to understand and modify, and makes migrating a data source simpler.
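The layer separation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the sample records, the field names (`order_id`, `country`, `amount`) and the helper functions `to_silver` and `to_gold` are all hypothetical, and a real platform would typically use tables in a data lake rather than in-memory lists.

```python
from collections import defaultdict

# Bronze: raw records kept exactly as ingested (hypothetical sample data).
BRONZE_ORDERS = [
    {"order_id": "1", "country": "fr", "amount": "10.50"},
    {"order_id": "2", "country": "FR", "amount": "4.50"},
    {"order_id": "2", "country": "FR", "amount": "4.50"},  # duplicate row
    {"order_id": "3", "country": "de", "amount": "oops"},  # invalid amount
    {"order_id": "4", "country": "DE", "amount": "20.00"},
]


def to_silver(bronze_rows):
    """Silver: cast types, normalize values, drop invalid and duplicate rows."""
    seen_ids = set()
    silver_rows = []
    for row in bronze_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        if row["order_id"] in seen_ids:
            continue  # skip rows already seen for this order_id
        seen_ids.add(row["order_id"])
        silver_rows.append(
            {
                "order_id": int(row["order_id"]),
                "country": row["country"].upper(),
                "amount": amount,
            }
        )
    return silver_rows


def to_gold(silver_rows):
    """Gold: aggregate cleaned rows into a consumption-ready view."""
    revenue_by_country = defaultdict(float)
    for row in silver_rows:
        revenue_by_country[row["country"]] += row["amount"]
    return dict(revenue_by_country)


silver_orders = to_silver(BRONZE_ORDERS)
gold_revenue_by_country = to_gold(silver_orders)
print(gold_revenue_by_country)
```

Each function has a single responsibility: changing a cleaning rule only touches `to_silver`, and changing a business aggregate only touches `to_gold`. The bronze data is never modified, so both downstream layers can be rebuilt from it at any time.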
However, a medallion architecture often duplicates data (raw, cleaned, filtered…), which can lead to significant costs for organizations that already store large volumes of data. Some pipelines may be less efficient than a single-step process, and the growing number of tables and dependencies can lengthen workflow execution times. That said, these costs are generally offset by the time saved on manual work.
Ultimately, this framework has become an industry standard, much like the staging/intermediate/datamart model promoted by dbt.
Theodo’s point of view
We strongly recommend using the medallion architecture in your data projects to ensure scalability and facilitate collaboration. At Theodo, we also adapt this framework by further segmenting each layer into multiple quality levels to maximize its benefits.