Over the past two years, we have participated in numerous projects where Azure Databricks was implemented from the ground up. Each of these deployments taught us something new, let us validate earlier solutions, and ultimately helped us shape a methodology for deploying Azure Databricks in a standardized, enterprise-scale-ready manner. As a result, the newly […]
Author: Tomasz Kostyrka
Utilizing YAML Anchors in Databricks Asset Bundles
We all know what YAML is – it’s like JSON, just with indentation instead of brackets. Easier to write and read. That’s it, isn’t it? In most situations… yes. But if we look a little deeper, we’ll find features that many people have no idea exist. And let me emphasize right away: I’m not judging […]
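Since the teaser only hints at the feature, here is a minimal sketch of what YAML anchors do, independent of Databricks Asset Bundles: `&` defines a reusable node, `*` references it, and `<<:` merges it into a mapping. The bundle-style keys and values below are hypothetical; PyYAML is used only to show how the anchors resolve.

```python
import yaml  # PyYAML

# A hypothetical databricks.yml-style fragment:
# &cluster_defaults defines an anchor, <<: *cluster_defaults merges it in.
doc = """
defaults: &cluster_defaults
  spark_version: "15.4.x-scala2.12"
  node_type_id: Standard_DS3_v2

targets:
  dev:
    cluster:
      <<: *cluster_defaults
      num_workers: 1
  prod:
    cluster:
      <<: *cluster_defaults
      num_workers: 8
"""

parsed = yaml.safe_load(doc)
# Both targets inherit the anchored defaults; only num_workers differs.
assert parsed["targets"]["prod"]["cluster"]["node_type_id"] == "Standard_DS3_v2"
print(parsed["targets"]["dev"]["cluster"])
```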
Databricks: MERGE WITH SCHEMA EVOLUTION
Anyone who has ever designed an ETL process involving more than a few tables has likely encountered the need to build a metadata-driven framework. By ‘framework,’ I mean any solution that standardizes this process and allows for scaling through configuration changes. Regardless of whether it involved BIML, SSIS packages generated from C#, dynamic […]
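The feature named in the title fits such frameworks well: a single parameterized statement can upsert any table while columns that exist only in the source are added to the target automatically. A minimal sketch, assuming a Databricks runtime recent enough to support `MERGE WITH SCHEMA EVOLUTION`; the table and key names are hypothetical placeholders:

```python
# Runs in a Databricks notebook/job, where `spark` is provided by the runtime.
# Table and column names are hypothetical placeholders that a metadata-driven
# framework would supply from configuration.
target = "bronze.customers"
source = "staging.customers_batch"
key = "customer_id"

# WITH SCHEMA EVOLUTION lets columns present only in the source
# be added to the target table as part of the merge.
spark.sql(f"""
    MERGE WITH SCHEMA EVOLUTION INTO {target} AS t
    USING {source} AS s
    ON t.{key} = s.{key}
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```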
Terraforming ADF: Shared Self-Hosted Integration Runtime
In one of our previous posts, we explained what self-hosted integration runtimes are and how to fully configure them using Terraform. Today, we’ll take it a step further by discussing the sharing mechanism that allows us to reuse the same runtime across multiple Azure Data Factories.
Multiple Integration Runtimes
Let’s consider the following scenario: our […]
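The post itself drives this with Terraform; purely as a sketch of the underlying mechanism (and to keep the examples in one language), registering a linked runtime in a consuming factory via the Azure Python SDK might look roughly like the following. Every name and ID is a hypothetical placeholder, and it assumes the consuming factory’s managed identity has already been granted access to the shared runtime.

```python
# pip install azure-identity azure-mgmt-datafactory
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (
    IntegrationRuntimeResource,
    LinkedIntegrationRuntimeRbacAuthorization,
    SelfHostedIntegrationRuntime,
)

# All names and IDs below are hypothetical placeholders.
SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"
SHARED_IR_ID = (
    f"/subscriptions/{SUBSCRIPTION_ID}/resourceGroups/rg-shared/providers/"
    "Microsoft.DataFactory/factories/adf-shared/integrationruntimes/shir-main"
)

client = DataFactoryManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# In the consuming factory, create a self-hosted IR that is linked to the
# shared one via RBAC authorization, instead of installing a second runtime.
client.integration_runtimes.create_or_update(
    resource_group_name="rg-consumer",
    factory_name="adf-consumer",
    integration_runtime_name="shir-linked",
    integration_runtime=IntegrationRuntimeResource(
        properties=SelfHostedIntegrationRuntime(
            linked_info=LinkedIntegrationRuntimeRbacAuthorization(
                resource_id=SHARED_IR_ID
            )
        )
    ),
)
```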