
Snowflake Data Engineering

January 2025

Assess

Over the past few years, Snowflake has introduced several key features to orchestrate data transformation workflows directly within the platform. With the introduction of Tasks in 2019 and DAGs in 2022, it is now possible to create pipelines of orchestrated SQL commands directly in Snowflake. This eliminates the need for an external orchestrator, which can be costly and require additional configuration to access data within Snowflake.
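
As an illustration, the sketch below (Snowpark Python; warehouse, table, and task names are hypothetical) builds a minimal two-node pipeline: a root Task that runs on a schedule and a child Task triggered after it, forming a small DAG orchestrated entirely inside Snowflake.

    from snowflake.snowpark import Session

    # Connection details are placeholders; in practice they come from a config file or a secrets manager.
    connection_parameters = {
        "account": "<account>", "user": "<user>", "password": "<password>",
        "role": "<role>", "warehouse": "transform_wh", "database": "<database>",
    }
    session = Session.builder.configs(connection_parameters).create()

    # Root task: runs every morning and loads raw data.
    session.sql("""
        CREATE OR REPLACE TASK load_raw_orders
          WAREHOUSE = transform_wh
          SCHEDULE = 'USING CRON 0 6 * * * UTC'
        AS
          COPY INTO raw.orders FROM @orders_stage
    """).collect()

    # Child task: runs only after the root task succeeds.
    session.sql("""
        CREATE OR REPLACE TASK build_orders_mart
          WAREHOUSE = transform_wh
          AFTER load_raw_orders
        AS
          INSERT INTO mart.daily_orders
          SELECT order_date, COUNT(*) AS nb_orders FROM raw.orders GROUP BY order_date
    """).collect()

    # Tasks are created suspended: resume the child first, then the root.
    session.sql("ALTER TASK build_orders_mart RESUME").collect()
    session.sql("ALTER TASK load_raw_orders RESUME").collect()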

In 2022, Snowflake also launched Snowpark, an API that allows large-scale data processing within Snowflake using Python, Java, and Scala. With an interface similar to Spark, Snowpark overcomes SQL’s limitations while benefiting from distributed computing on Snowflake’s infrastructure. Thanks to Stored Procedures in Python, Java, or Scala, these languages can also be integrated into Snowflake DAGs.
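
For example, a transformation can be expressed with the Snowpark DataFrame API as in the hedged sketch below (table and column names are hypothetical; session is a Snowpark Session created as in the previous sketch). The query is pushed down and executed on Snowflake's warehouses rather than on the client. Wrapped in a Python Stored Procedure, such code can then be referenced by a Task and thus integrated into a DAG.

    from snowflake.snowpark.functions import col, sum as sum_

    # Aggregate shipped orders per customer; the whole chain compiles to SQL
    # and runs on the warehouse, Spark-style.
    (
        session.table("raw.orders")
        .filter(col("status") == "SHIPPED")
        .group_by("customer_id")
        .agg(sum_("amount").alias("total_amount"))
        .write.save_as_table("mart.customer_totals", mode="overwrite")
    )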

In 2023, the introduction of Logging and Tracing for Stored Procedures enabled pipeline monitoring, a crucial feature for ensuring the stability of a production environment.
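
As a hedged sketch (handler and table names are hypothetical), a Python Stored Procedure can emit records through the standard logging module; once an event table is set up on the account, Snowflake captures these messages so pipeline runs can be queried and alerted on.

    import logging
    from snowflake.snowpark import Session

    logger = logging.getLogger("orders_pipeline")

    # Handler of a Python Stored Procedure: log records emitted here are routed
    # to the account's event table.
    def run_daily_load(session: Session) -> str:
        logger.info("Starting daily orders load")
        row_count = session.table("raw.orders").count()
        logger.info("raw.orders now contains %s rows", row_count)
        return f"processed {row_count} rows"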

However, developing an ETL pipeline natively on Snowflake presents some limitations, particularly when compared to more mature ETL tools like DBT or Airflow.

  • Technical maturity: The Snowpark API is complex to use, evolves rapidly, and its documentation remains limited. Unit or integration testing for a Snowflake ETL is difficult, and debugging Stored Procedure errors can be challenging.
  • Version control and deployment: While Git integration is now publicly available, it is still a recent addition. Additionally, the Python API for managing Snowflake objects, such as Tasks and DAGs, is still in preview.
  • External interfacing: Snowflake’s compute resources do not have direct internet access. Making an external API call requires additional configuration (Network Rule, Security Integration, External Access Integration), as shown in the sketch after this list. These requests pass through multiple network layers, making latency troubleshooting more complex.
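
As an illustration of that configuration overhead, the sketch below (object names and the target host are hypothetical; session is a Snowpark Session as in the earlier sketches) creates the objects required before a UDF or Stored Procedure can reach an external API. A Secret and a Security Integration would be added on top if the API requires authentication.

    # 1. Network Rule: authorize egress traffic to a specific host.
    session.sql("""
        CREATE OR REPLACE NETWORK RULE api_egress_rule
          MODE = EGRESS
          TYPE = HOST_PORT
          VALUE_LIST = ('api.example.com:443')
    """).collect()

    # 2. External Access Integration bundling the rule; UDFs and procedures must
    #    reference it explicitly via the EXTERNAL_ACCESS_INTEGRATIONS clause.
    session.sql("""
        CREATE OR REPLACE EXTERNAL ACCESS INTEGRATION api_access_integration
          ALLOWED_NETWORK_RULES = (api_egress_rule)
          ENABLED = TRUE
    """).collect()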

Despite these challenges, the Snowflake ecosystem is evolving rapidly and could become a robust Data Engineering service in the coming years.

 

Theodo’s point of view

Today, we recommend using Snowflake primarily as a powerful SQL engine while orchestrating transformations with more mature external services (such as a DBT/Airflow stack). However, when productionization requirements are limited or when keeping a single, unified platform is a priority, Snowflake’s native ETL tools can be sufficient.
