40

Analytics Engineers

January 2025

Adopt

When launching a machine learning project, there are two main options for the technical stack. The first is to use an end-to-end ML platform, where pre-built, tested components save time.

However, this approach comes with the typical drawbacks of managed solutions: higher costs, black-box functionalities, limited customization, restricted integration with other tools, and vendor lock-in. The second option is to use open-source tools and custom code to build a tailor-made stack, avoiding the pitfalls of managed solutions but requiring an initial investment in selecting and setting up the necessary components.

To simplify this second approach, we developed Sicarator, a project generator that allows users to quickly set up a high-quality ML project with the latest open-source technologies.

Initially created in 2022 for internal use, Sicarator was open-sourced a year later, after proving effective across more than twenty projects.

Guided by a command-line interface, users generate a project structure that follows best practices, including:

  • Continuous integration with multiple quality checks (unit tests, linting, type checking)
  • Data visualization with a Streamlit dashboard
  • Experiment and data tracking, combining DVC and Streamlit for transparency and reproducibility

The generated code includes documentation, ensuring a smooth user experience. The tool takes a code-centric approach, maximizing control for data scientists and ML engineers. It evolves to reflect best practices in the ecosystem: for example, Ruff has recently replaced Pylint and Black as the linter and formatter.
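
As a rough illustration of the quality checks listed above, the sketch below runs linting, format checking, type checking and unit tests from a single Python script. It is not taken from the code Sicarator generates: the choice of mypy and pytest, the exact command lines, and the names CHECKS, run_command and run_checks are assumptions made for this example.

```python
from __future__ import annotations

import subprocess
import sys
from typing import Callable, Sequence

# Hypothetical check commands; a generated project may wire in different
# tools or options. Only Ruff is named in the text above.
CHECKS: dict[str, list[str]] = {
    "lint": ["ruff", "check", "."],
    "format": ["ruff", "format", "--check", "."],
    "types": ["mypy", "."],  # assumption: mypy as the type checker
    "tests": ["pytest", "-q"],  # assumption: pytest as the test runner
}


def run_command(command: Sequence[str]) -> int:
    """Run one command and return its exit code (127 if the tool is missing)."""
    try:
        return subprocess.run(command, check=False).returncode
    except FileNotFoundError:
        return 127


def run_checks(
    runner: Callable[[Sequence[str]], int] = run_command,
) -> dict[str, bool]:
    """Run every check and report which ones passed (exit code 0)."""
    return {name: runner(command) == 0 for name, command in CHECKS.items()}


if __name__ == "__main__":
    results = run_checks()
    for name, passed in results.items():
        print(f"{name}: {'ok' if passed else 'FAILED'}")
    # A non-zero exit code lets a CI job fail when any check fails.
    sys.exit(0 if all(results.values()) else 1)
```

Passing the runner as a parameter keeps the script easy to test without the tools installed; in a CI pipeline the same commands would more typically be declared as separate steps so that each failure is reported on its own.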

However, Sicarator does not provide the full automation of advanced platforms, so some setup remains manual. For instance, at this stage it does not include automated deployment of training instances.

 

Theodo’s point of view

We recommend adopting this approach in environments where collaboration between technical and business teams needs to be streamlined. Adding Analytics Engineers helps improve data model quality, optimize analytical processes, and increase operational efficiency while addressing challenges in collaboration between Data Engineering and Data Analytics teams.
