As you probably know, Power BI datasets can be refreshed using the built-in schedule in the Power BI service. Depending on whether the dataset lives on Premium or Shared capacity, we get a limited number of scheduled refreshes per day (up to 48 or 8, respectively). Of course, the schedule is not the only option when it comes to refreshing a dataset – we can also trigger this operation from various tools by sending a request to the Power BI REST API. Today, I would like to talk a bit about how to send such a request using Azure Data Factory – one of the most popular Azure cloud tools for data engineering.
Before we move on to ADF, let’s take a look at the documentation to see what a refresh request should look like (link). It turns out that the entire operation boils down to sending a request using a specified method to a specified URL – it looks like this:
POST https://api.powerbi.com/v1.0/myorg/groups/{groupId}/datasets/{datasetId}/refreshes
The first thing that catches our eye is that the entire request must be sent using the POST method, which is quite logical. The URL doesn’t look unusual either, and all we need to provide in it is the workspace (groupId) and dataset (datasetId) identifiers. This information is available, for example, from PowerShell and dedicated cmdlets for Power BI. However, if we want to simplify things a bit, we can obtain this information by accessing the properties of the dataset we have selected – then they will also be available in the URL:
In the example above, we can see that the identifier appearing after “groups” refers to the workspace, and the one appearing after “datasets” is the identifier of our dataset. So we have all the information we need to build the request – now let’s think about how we will authenticate to the Power BI service.
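In other words, the address of an open dataset in the portal has roughly this shape (with placeholders instead of the actual GUIDs):

https://app.powerbi.com/groups/{groupId}/datasets/{datasetId}/details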
To do this, we can use one of two methods:
- Master account
- Service principal
A master account is simply a regular account with permissions in a specific workspace. Such an account must, of course, have the appropriate license assigned and the permissions needed to perform the refresh operation. This approach has been used for quite some time, but it has several drawbacks. The first is the license assignment itself, which means additional cost. We can imagine security procedures under which only certain accounts have access to specific workspaces, so we would have to create N such accounts just to refresh data, which can become a significant item in our budget. It is also worth noting that an account like this must be treated in a special way: for example, it cannot have Multi-Factor Authentication enabled in Azure Active Directory, and its password should be rotated and updated regularly, which in some scenarios can be a barrier to overcome.
The alternative is a Service Principal – an application identity registered in Azure Active Directory that avoids the licensing and password issues mentioned above. Creating one in AAD is relatively simple and can be done in two different places. The first way is of course through the Azure portal, using the App registration applet:
The second way is the Power BI Registration Tool, which essentially leads us to the same process, but the data input interface is adapted to the Power BI ecosystem:
The registration process is well described in the documentation and I refer you to it (link).
After creating the Service Principal, we can navigate to the Power BI Admin portal and, in the Tenant settings section, allow service principals to use the Power BI APIs.
In addition, as the image above shows, we can point the setting at a security group and add our SP to it, which lets us manage access at a slightly more granular level. Once everything is set up on the tenant side, we can go to the selected workspace and add our principal like any regular user.
Moving on to the actual part of this article, let’s switch to Azure Data Factory, where we will create a pipeline to refresh our dataset. To do this, we will use the Web activity, which we will name ACT_GEN_WEB_RefreshPowerBIDataset. The Settings tab is crucial for the operation of this element:
We have several things to fill in here:
- URL – the request address specified in the introduction of this article,
- Method – the method by which the request should be sent,
- Headers – optional HTTP headers sent with the request,
- Body – optional request body in JSON format,
- Datasets – a reference to a dataset in ADF,
- Linked services – a reference to a Linked service in ADF,
- Integration runtime – the runtime environment on which this task is to be run,
- Disable certificate validation – the option to disable certificate validation,
- Authentication – the authentication method for the REST API; several options are available, including Service Principal,
- Tenant – the tenant identifier,
- Service principal ID – if we selected Service Principal, we need to provide its ID,
- Service principal credential type – the authentication method for the SP, which can be a certificate or a secret,
- Service principal certificate/secret – where we provide the certificate or secret for the SP (or retrieve it from Azure Key Vault),
- Resource – the identifier (audience) of the resource for which the token will be requested.
As you can see, there are quite a few options to fill out. Below are the options that I set in my case:
- In my case, the URL is an expression that combines a base address with parameters for the workspace and dataset identifiers.
@concat('https://api.powerbi.com/v1.0/myorg/groups/',pipeline().parameters.__paramWorkspaceId,'/datasets/',pipeline().parameters.__paramDataSetId,'/refreshes')
- The Method should be set to POST according to the documentation.
- For the Body, we need to pass an empty JSON object, which is {}, but for some reason, ADF didn’t accept this value, so I used the following expression instead:
@concat('{','}')
- Authentication – Service Principal
- Tenant – The tenant ID, which can be easily found in AAD on the Overview tab:
- Service Principal ID – the service principal identifier, also known as the application/client ID, can be found by searching for the App registrations applet:
Then, locate the Service Principal that we created or want to use for refreshing and there we can easily find the identifier we are interested in:
- Next, we need to provide the authentication method for our SP. Depending on our needs, we can provide either a secret or a certificate as the authentication method. Regardless of the chosen method, it’s recommended to store it in Azure Key Vault (I’ve written about how to use AKV with ADF here).
- The last option that interests us is Resource, and for every operation performed on the Power BI REST API, the value is constant: https://analysis.windows.net/powerbi/api.
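Before wiring everything into the pipeline, it can be handy to verify that the service principal actually works. Below is a minimal standalone sketch (Python with the requests library) that performs the same two steps the Web activity performs under the hood: acquiring a token for the Power BI resource via the client credentials flow and posting the refresh request. All identifiers and the secret are placeholders; in practice, the secret should come from Azure Key Vault.

```python
import requests

# Placeholders - supply your own values.
TENANT_ID = "<tenant-id>"
CLIENT_ID = "<service-principal-client-id>"
CLIENT_SECRET = "<service-principal-secret>"  # in practice: from Azure Key Vault
WORKSPACE_ID = "<workspace-id>"
DATASET_ID = "<dataset-id>"

# Step 1: acquire a token for the Power BI resource (client credentials flow).
token_response = requests.post(
    f"https://login.microsoftonline.com/{TENANT_ID}/oauth2/v2.0/token",
    data={
        "grant_type": "client_credentials",
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
        "scope": "https://analysis.windows.net/powerbi/api/.default",
    },
)
token_response.raise_for_status()
access_token = token_response.json()["access_token"]

# Step 2: trigger the refresh; the service accepts it asynchronously,
# so a successful call returns HTTP 202 Accepted with no body.
refresh_response = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}/datasets/{DATASET_ID}/refreshes",
    headers={"Authorization": f"Bearer {access_token}"},
    json={},
)
print(refresh_response.status_code)  # 202 means the refresh was queued
```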
Our request is ready, and nothing prevents us from running the pipeline with the required parameters. The whole operation was surprisingly fast:
Just because the task executed successfully does not mean that the dataset has been refreshed. The request we sent works asynchronously: the service merely accepts the refresh request for processing (which is why the API responds with HTTP 202 Accepted), and we do not know the result yet. If our process only needs to initiate the refresh, our work ends here – but if we want to know whether the refresh succeeded or not, we need to prepare an additional step.
To check the status, an Until loop will be useful – it will poll the refresh status at a fixed interval until the refresh completes. To handle it, I created the __varRefreshStatus variable, which stores the most recently returned status. The expression that controls the loop looks like this:
@or(or(equals(variables('__varRefreshStatus'),'Completed'),equals(variables('__varRefreshStatus'),'Failed')),equals(variables('__varRefreshStatus'),'Disabled'))
We need to keep checking until the returned status is Completed, Failed, or Disabled – these are the statuses that a refresh can end with. In addition, I recommend setting a timeout after which the check ends anyway; in my case, I assume the refresh should not take longer than 20 minutes. Unfortunately, this property in ADF cannot be parameterized:
Inside the loop, we have three tasks:
The first one is a Wait activity that pauses for 30 seconds between loop iterations. The second calls the REST API to check the status of the last refresh – it looks very similar to what we had before, except that we use the GET method and, of course, a different URL:
@concat('https://api.powerbi.com/v1.0/myorg/groups/',pipeline().parameters.__paramWorkspaceId,'/datasets/',pipeline().parameters.__paramDataSetId,'/refreshes?$top=1')
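For reference, a trimmed response from this endpoint looks more or less like this (all values below are illustrative):

```json
{
  "value": [
    {
      "requestId": "00000000-0000-0000-0000-000000000000",
      "refreshType": "ViaApi",
      "startTime": "2023-01-01T08:00:00.000Z",
      "endTime": "2023-01-01T08:01:30.000Z",
      "status": "Completed"
    }
  ]
}
```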
This way, we receive the current refresh status. The last element is a task that assigns the status read in the previous step to the variable. The expression used looks as follows:
@activity('ACT_GEN_WEB_GetRefreshStatus').output.value[0].status
The expression above reaches into the output of the previous step, selects the first element of the returned array, and reads its status. Remember that this variable controls the loop, so as soon as a value indicating success or failure of the operation is assigned to it, the loop stops and the final status is available in the variable.
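The same polling logic can be expressed as a continuation of the earlier Python sketch – again just an illustration, where WORKSPACE_ID, DATASET_ID, and access_token are assumed to be defined as in the previous snippet:

```python
import time

import requests

# The terminal statuses - the same three the Until condition checks for.
TERMINAL_STATUSES = {"Completed", "Failed", "Disabled"}

status = "Unknown"
deadline = time.time() + 20 * 60  # mirrors the 20-minute Until timeout

while status not in TERMINAL_STATUSES and time.time() < deadline:
    time.sleep(30)  # the Wait activity between iterations
    history = requests.get(
        f"https://api.powerbi.com/v1.0/myorg/groups/{WORKSPACE_ID}"
        f"/datasets/{DATASET_ID}/refreshes?$top=1",
        headers={"Authorization": f"Bearer {access_token}"},
    )
    history.raise_for_status()
    # Same path as the ADF expression: output.value[0].status
    status = history.json()["value"][0]["status"]

print(status)
```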
I received the following result after running our pipeline:
We can see that the request was accepted without any problems and that the loop executed only once. Why? Simply because the refresh did not take long, so there was no need for the loop to continue. To make sure the operation succeeded, we can also check the refresh from the Power BI portal – the refresh history is available in the properties of the corresponding dataset.
As you can see above, my call ended successfully and didn’t take long. If we wanted to extend the presented mechanism, for example by sending emails in case of an error or by refreshing all datasets in a given workspace, nothing stands in the way – you just need to know which endpoint corresponds to which operation, and all of them can be found in the Power BI REST API documentation (link).
That’s it for now – thank you for your time and have a nice day!