I don't think I need to convince you to learn PySpark. There are multiple reasons to do so. First of all, PySpark is one of the most widely used big data processing frameworks: it provides large-scale data processing on top of Apache Spark, the framework that currently dominates the Big Data world. How to learn it? […]
Latest Posts
Delta Lake 101 Part 2: Transaction Log
Let’s continue our series on Delta Lake. In the first article, I covered the basics of the Delta format itself. Today, I would like to share with you some essential information about the Transaction Log. As you know from our first article, Delta consists of a set of Parquet files that hold the data itself […]
Using variables in loops in Data Factory – why it’s not worth it
Loops are well-known constructs and a fundamental, necessary element of programming. ADF is no different: there, loops play their standard role, which involves storing values fetched or calculated at a specific point in the data flow. The general pattern in Data Factory is fairly well known and involves nesting specific calls, etc. […]
Allow Azure Services and resources to access this server – How does it work in Azure SQL Database?
One of the first options that Azure SQL Database users encounter when creating this resource is the “Allow Azure Services and resources to access this server” option, along with other network settings related to our server. While allowing traffic from a specific IP address or range of IP addresses to our server is fairly intuitive […]