
How to determine if a Direct Lake semantic model is falling back to DirectQuery in Microsoft Fabric

When you create a Direct Lake semantic model, it typically operates in Direct Lake mode, meaning it reads the Delta tables directly from OneLake. This is the preferred scenario because queries run fast, often as fast as in import mode.

In Power BI, a semantic model in Direct Lake mode reads Delta tables directly from OneLake. However, if Direct Lake cannot be used for some reason, a DAX query may silently switch to DirectQuery. There are multiple reasons for this behavior, the most common ones being:

  • Usage of warehouse views in the model,
  • Row-level security applied in the source warehouse,
  • The model exceeds the guardrails for its capacity SKU, as specified in the table below.

 

As you can see, these limits are tied to the specific capacity SKU. One additional aspect worth mentioning is the Max memory limit. Exceeding this limit does not trigger a fallback to DirectQuery. Instead, it means data will be paged out of memory, potentially degrading performance. Let’s discuss how we can detect whether a fallback has occurred.
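To make the distinction concrete, here is a minimal Python sketch of the decision logic described above. The guardrail numbers used in the example are placeholders, not the real per-SKU values (those are in the table above); only the shape of the logic matters:

```python
from dataclasses import dataclass


@dataclass
class SkuGuardrails:
    """Illustrative guardrail limits for one capacity SKU.

    The values passed in below are placeholders; consult the official
    Direct Lake documentation for the real per-SKU numbers.
    """
    max_rows_per_table: int
    max_model_size_gb: float
    max_memory_gb: float


def check_model(rows: int, size_gb: float, memory_gb: float,
                sku: SkuGuardrails) -> str:
    # Exceeding the row-count or on-disk size guardrails triggers a
    # fallback to DirectQuery (when the model is in Automatic mode).
    if rows > sku.max_rows_per_table or size_gb > sku.max_model_size_gb:
        return "fallback to DirectQuery"
    # Exceeding max memory does NOT cause a fallback: data is paged
    # out of memory instead, which may degrade performance.
    if memory_gb > sku.max_memory_gb:
        return "paging (performance may degrade)"
    return "Direct Lake"
```

In other words, only the row-count and model-size guardrails force a fallback; the memory guardrail merely changes what stays resident in memory.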

How can you obtain information about the size of a model on disk? You don’t have to calculate it yourself, which would be difficult. Instead, you can find this information in the workspace settings, under the System storage tab:

When it comes to determining the number of files per table, it’s a simple count. Counting the row groups within the Parquet files, however, is not as straightforward, so I plan to write a dedicated article on that topic.

We have all the information we need, so let’s create a Warehouse that will serve as our data source:

With the warehouse in place, we can proceed to create some sample data. Below, you’ll find two tables:

  1. warehouse_employees: This table contains employee data with two sample rows.
  2. warehouse_sales: This table contains sales data with three transactions.

Additionally, there’s a view named v_warehouse_employees that selects all records from the warehouse_employees table.

CREATE TABLE warehouse_employees
(
    ID INT,
    name VARCHAR(50),
    surname VARCHAR(50)
);

INSERT INTO warehouse_employees
VALUES
    (1, 'John', 'Doe'),
    (2, 'Jane', 'Doe');
GO

-- CREATE VIEW must be the only statement in its batch,
-- hence the GO separators around it.
CREATE VIEW v_warehouse_employees
AS
SELECT * FROM warehouse_employees;
GO

CREATE TABLE warehouse_sales
(
    customer_id INT,
    amount DECIMAL(12,2),
    quantity INT
);

INSERT INTO warehouse_sales
VALUES
    (1, 2.00, 1),
    (1, 3.50, 2),
    (2, 3.50, 2);

The script above is quite simple, but it is enough to illustrate everything we need for today’s topic. After loading the sample data, a dedicated folder structure is created for us in OneLake. To verify this, we can connect to it directly with a tool such as Azure Storage Explorer, using the following address:

https://onelake.dfs.fabric.microsoft.com/WorkspaceName/

After connecting, you’ll notice that each table has its own folder for Delta Lake files.

Now that our warehouse structures are ready, we can navigate to the Reporting tab and create a new semantic model. Let’s proceed with creating it:

Currently, there is no way to create Direct Lake models directly in Power BI Desktop, so we’ll use the option mentioned above instead. In the new window, we can select the objects to include in the model. Let’s include all three objects we created earlier:

Once the model is created, let’s open it to configure it according to our preferences.

Firstly, let’s create the relationships shown in the screenshot below. Please note that Fabric performs no checks to verify that these relationships are correct; it relies entirely on your configuration. Therefore, double-check that everything is created with the proper cardinality:

A crucial aspect of Direct Lake and fallback to DirectQuery lies in a property of the semantic model called “Direct Lake behavior”. We have three options to choose from:

  1. Automatic: Fabric automatically chooses between Direct Lake and DirectQuery. In practice, this means Direct Lake is used whenever possible.
  2. Direct Lake Only: Direct Lake will always be used. If Direct Lake is not possible, the query fails with an error.
  3. Direct Query Only: DirectQuery will always be used.
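The three options boil down to a small decision function. The sketch below is purely pedagogical, mirroring the behavior described above; it is not an actual Fabric API:

```python
from enum import Enum


class DirectLakeBehavior(Enum):
    """The three values of the 'Direct Lake behavior' model property."""
    AUTOMATIC = "Automatic"
    DIRECT_LAKE_ONLY = "DirectLakeOnly"
    DIRECT_QUERY_ONLY = "DirectQueryOnly"


def resolve_mode(behavior: DirectLakeBehavior,
                 direct_lake_possible: bool) -> str:
    """Return the storage mode a query would use under each setting."""
    if behavior is DirectLakeBehavior.DIRECT_QUERY_ONLY:
        return "DirectQuery"
    if behavior is DirectLakeBehavior.DIRECT_LAKE_ONLY:
        if not direct_lake_possible:
            # Fallback is disabled, so the query fails instead.
            raise RuntimeError(
                "Direct Lake is not possible and fallback is disabled")
        return "Direct Lake"
    # AUTOMATIC: prefer Direct Lake, silently fall back when necessary.
    return "Direct Lake" if direct_lake_possible else "DirectQuery"
```

Note the asymmetry: with Automatic the fallback is silent, which is exactly why we need the detection techniques discussed in this article, while with Direct Lake Only the problem surfaces as a hard error.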

Now that our model is ready, let’s open Power BI Desktop and connect to it.

We can create a very simple report consisting of a plain table based on warehouse_sales, along with two slicers: one based on the warehouse_employees table and the other on the view. Let’s proceed with creating this report.

The first method to determine if Direct Query is being used is to utilize the Performance Analyzer. Let’s turn it on:

In the first scenario, let’s select a value in the slicer based on the warehouse_employees table. Examining the results recorded by the Performance Analyzer, we will observe that only the slicer built on top of the view used DirectQuery.

In the second scenario, let’s select a value in the slicer based on the view. Unfortunately, something undesirable occurred: our table visual also fell back to DirectQuery. This happened because when the filter comes from a view, the query on warehouse_sales can no longer run in pure Direct Lake mode; using a view anywhere in a query blocks Direct Lake for that entire query.
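The rule at work in both scenarios can be sketched as a tiny helper: if any object participating in a query, including objects used only to filter it, is backed by a warehouse view, the whole query falls back. The function and the table/view names below simply mirror our sample model and are illustrative only:

```python
def query_mode(objects_in_query: set[str], view_backed: set[str]) -> str:
    """Return the storage mode a single DAX query would use.

    Illustrative only: a query runs in pure Direct Lake mode only when
    none of the objects it involves (visual source, slicer sources,
    filters) is a warehouse view.
    """
    return "DirectQuery" if objects_in_query & view_backed else "Direct Lake"


VIEWS = {"v_warehouse_employees"}

# Scenario 1: the sales table is filtered by the table-based slicer.
mode_1 = query_mode({"warehouse_sales", "warehouse_employees"}, VIEWS)

# Scenario 2: the same visual is filtered through the view-based slicer,
# dragging the whole query into DirectQuery.
mode_2 = query_mode({"warehouse_sales", "v_warehouse_employees"}, VIEWS)
```

Here `mode_1` resolves to "Direct Lake" and `mode_2` to "DirectQuery", matching what the Performance Analyzer showed in the two scenarios.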

If you want to trace what is happening, Performance Analyzer is usually the right tool. Sometimes, however, it is not enough, because we would like to see more: for example, which queries are sent to the model, and which queries the model itself sends to the warehouse. In such cases, the good old SQL Server Profiler can help. We can start a new trace from this tool:

Connect to the model using Analysis Services as the server type. As the server name, provide the XMLA endpoint, e.g. powerbi://api.powerbi.com/v1.0/myorg/workspacename, and choose Microsoft Entra Password as the authentication method. Please make sure that on the Connection Properties page you enter the proper database name, which is the name of your semantic model:

In the next step, on the General tab, select Blank as the template, because we will pick the events ourselves. On the Events Selection tab, select the events that are useful for us:

Query Events:

  • Query Begin
  • Query End

Query Processing:

  • DirectQuery Begin
  • DirectQuery End
  • VertiPaq SE Query Begin
  • VertiPaq SE Query Cache Match
  • VertiPaq SE Query End

After selecting a value in the view-based slicer on the report, we can see DirectQuery events, as shown in the screenshot below:

Please be aware that you may also see DirectQuery events in a model that runs purely in Direct Lake mode. This is because Fabric checks in the warehouse whether any row-level security is applied. It is not a bad sign; it is standard behavior.

As you can see, managing Direct Lake and detecting a fallback to DirectQuery isn’t too difficult, but it is crucial to understand. In future articles, we’ll continue exploring our Microsoft Fabric journey, so stay tuned for more updates!

Official documentation: link.
