Today at the Build conference, Microsoft announced Microsoft Fabric, a unified analytics solution. What is it? It is a real game-changer, and in today’s article I would like to share the general idea behind Fabric and how it can help with your growing analytical needs. Let’s start! To begin, please […]
Author: Adrian Chodkowski
PySpark – cheatsheet with comparison to SQL
I don’t think I need to convince you to learn PySpark. There are multiple reasons to do so. First of all, PySpark is one of the most widely used big data processing frameworks, providing support for large-scale data processing with Apache Spark, the framework that currently dominates the Big Data world. How to learn it? […]
Delta Lake 101 Part 2: Transaction Log
Let’s continue our series on Delta Lake. In the first article, I covered the basics of the Delta format itself. Today, I would like to share some essential information about the Transaction Log. As you know from our first article, Delta consists of a set of Parquet files that hold the data itself […]
Using variables in loops in Data Factory – why it’s not worth it
Loops are a well-known construct and a fundamental, necessary element of programming. This is no different in ADF, where loops play a standard role related to storing values fetched or calculated in a specific location of the data flow. The schema of Data Factory is rather well-known and involves nesting specific calls, etc. […]