Over the past few years, Snowflake has introduced several key features to orchestrate data transformation workflows directly within the platform. With the introduction of Tasks in 2019 and DAGs in 2022, it is now possible to create pipelines of orchestrated SQL commands natively in Snowflake. This eliminates the need for an external orchestrator, which can be costly and can require additional configuration to access data within Snowflake.
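To make this concrete, here is a minimal sketch of a two-step Task DAG, created through Snowpark's session.sql; the warehouse, stage, and table names (my_wh, orders_stage, raw.orders, analytics.daily_orders) are hypothetical placeholders, not a production setup.

```python
from snowflake.snowpark import Session

# Connection parameters are placeholders for a real Snowflake account.
session = Session.builder.configs({
    "account": "<account>",
    "user": "<user>",
    "password": "<password>",
    "warehouse": "my_wh",  # hypothetical warehouse
}).create()

# Root task: runs on a schedule and loads raw data.
session.sql("""
    CREATE OR REPLACE TASK load_raw
      WAREHOUSE = my_wh
      SCHEDULE = '60 MINUTE'
    AS
      COPY INTO raw.orders FROM @orders_stage
""").collect()

# Child task: the AFTER clause chains it to the root, forming a DAG.
session.sql("""
    CREATE OR REPLACE TASK transform_orders
      WAREHOUSE = my_wh
      AFTER load_raw
    AS
      INSERT INTO analytics.daily_orders
      SELECT order_date, COUNT(*) AS nb_orders
      FROM raw.orders
      GROUP BY order_date
""").collect()

# Tasks are created suspended; resume the child before the root.
session.sql("ALTER TASK transform_orders RESUME").collect()
session.sql("ALTER TASK load_raw RESUME").collect()
```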
In 2022, Snowflake also launched Snowpark, an API that allows large-scale data processing within Snowflake using Python, Java, and Scala. With an interface similar to Spark, Snowpark overcomes SQL’s limitations while benefiting from distributed computing on Snowflake’s infrastructure. Thanks to Stored Procedures in Python, Java, or Scala, these languages can also be integrated into Snowflake DAGs.
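As an illustration of that Spark-like interface, the sketch below expresses a transformation with Snowpark's DataFrame API (reusing the session from above and the hypothetical raw.orders table); the operations are compiled to SQL and executed on Snowflake's warehouses, not on the client.

```python
from snowflake.snowpark.functions import col, sum as sum_

# Lazily reference a table; no data is pulled to the client.
orders = session.table("raw.orders")

# Spark-like transformations, pushed down to Snowflake's engine.
revenue = (
    orders
    .filter(col("status") == "shipped")
    .group_by("customer_id")
    .agg(sum_(col("amount")).alias("total_amount"))
)

# Materialize the result as a table inside Snowflake.
revenue.write.save_as_table("analytics.customer_revenue", mode="overwrite")
```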
In 2023, the introduction of Logging and Tracing for Stored Procedures enabled pipeline monitoring, a crucial feature for ensuring the stability of a production environment.
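For example, a Python stored procedure can log through the standard logging module; assuming an event table has been associated with the account and an appropriate LOG_LEVEL is set, the records are captured there and can be queried for monitoring. The handler below is a hypothetical sketch.

```python
import logging
from snowflake.snowpark import Session

logger = logging.getLogger("etl.orders")  # hypothetical logger name

def run(session: Session) -> str:
    """Handler for a Python stored procedure. With logging enabled,
    these records land in the account's event table."""
    logger.info("Starting orders transformation")
    row_count = session.table("raw.orders").count()  # hypothetical table
    logger.info("Processed %d rows", row_count)
    return f"OK: {row_count} rows"
```

Once such a procedure is registered and scheduled by a Task, its log records can be retrieved from the event table, for instance by filtering on the logger name.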
However, developing an ETL pipeline natively on Snowflake still has limitations, particularly when compared to more mature tooling such as dbt for transformation or Airflow for orchestration.
Despite these challenges, the Snowflake ecosystem is evolving rapidly and could become a robust Data Engineering service in the coming years.
Theodo’s point of view
Today, we recommend using Snowflake primarily as a powerful SQL engine while orchestrating transformations with more mature external services (such as a dbt/Airflow stack). However, when production-readiness requirements are modest, or when maintaining a single, unified platform is a priority, Snowflake's native ETL tooling can be sufficient.