If you use services like Azure Databricks and you need to synchronize the identities inside the tool with those that exist in Azure Active Directory, how can you do it? Creating them manually is not an option because it is time-consuming and the maintenance overhead is unacceptable. The best solution is to use SCIM – System for Cross-Domain Identity Management. What is it and how do you use it? Let’s find out!
First of all, let’s say a few words about SCIM. System for Cross-Domain Identity Management is an open standard protocol for automating the exchange of user identity information between identity domains and IT systems. This means it can be used in many different scenarios where the exchange of user identities is needed. Many of those scenarios are listed in the documentation, but I will show the one that applies to our case with Azure Databricks:
As you can see in the above diagram, Azure Active Directory acts as the identity provider and keeps all the groups and users that we want to replicate. The Azure AD provisioning service then reads this information using the Graph API connector and sends it to the target service using its API on a regular basis.
Above you can see the high-level architecture of how the described mechanism works:
- Users and groups are created in Azure Active Directory,
- The configured SCIM mechanism synchronizes the current state of AAD with the Databricks account (Unity Catalog),
- Databricks administrators have a synchronized, up-to-date list of users and groups and can assign them to workspaces.
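To make this flow more concrete, below is a minimal Python sketch of what the provisioning service conceptually does: it reads users from AAD through Microsoft Graph and pushes them as SCIM 2.0 User resources to the target SCIM endpoint. This is only an illustration – the real work is done by the Azure AD provisioning service itself, and the tokens and URLs are placeholders.

```python
import requests

# A minimal conceptual sketch of what the Azure AD provisioning service does.
# It is NOT the real implementation - tokens and URLs below are placeholders.
GRAPH_URL = "https://graph.microsoft.com/v1.0"

def sync_users(graph_token: str, scim_url: str, scim_token: str) -> None:
    # 1. Read users from Azure Active Directory through Microsoft Graph.
    users = requests.get(
        f"{GRAPH_URL}/users",
        headers={"Authorization": f"Bearer {graph_token}"},
    ).json()["value"]

    # 2. Push each user as a SCIM 2.0 User resource to the target endpoint
    #    (in our case the Databricks Account SCIM API).
    for user in users:
        payload = {
            "schemas": ["urn:ietf:params:scim:schemas:core:2.0:User"],
            "userName": user["userPrincipalName"],
            "displayName": user["displayName"],
            "active": True,
        }
        requests.post(
            f"{scim_url}/Users",
            headers={"Authorization": f"Bearer {scim_token}"},
            json=payload,
        )
```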
Does it sound simple? It is! Configuring it is even simpler! This scenario, where you replicate objects FROM AAD, is called Outbound synchronization, but there is also an opposite version called Inbound synchronization, where you synchronize users and groups from external HCM systems into Azure Active Directory.
How does it work for Databricks? A SCIM-based provisioning connector is provided for most applications in the Azure AD gallery, including Azure Databricks.
Let’s see it in practice – let’s go to the Databricks account console. We can do it by typing https://accounts.cloud.databricks.com/ in our browser or by selecting it from the menu available under our name:
This action will redirect us to Unity Catalog. Once there, we can choose Settings and then, under User provisioning, select Set up user provisioning:
A new window will appear with the SCIM Token and Account SCIM Url. This information will be needed to configure provisioning in later steps – let’s save it securely and click Done.
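If you want to sanity-check these values before configuring anything in Azure, a quick Python call against the SCIM endpoint should list the existing account users. This is only a sketch – replace the placeholders with the Account SCIM Url and SCIM Token saved above.

```python
import requests

# Placeholders - use the Account SCIM Url and SCIM Token saved in the previous step.
SCIM_URL = "<Account SCIM Url copied from the console>"
SCIM_TOKEN = "<SCIM Token copied from the console>"

# List a handful of account-level users to confirm that the URL and token work.
response = requests.get(
    f"{SCIM_URL}/Users",
    headers={
        "Authorization": f"Bearer {SCIM_TOKEN}",
        "Accept": "application/scim+json",
    },
    params={"startIndex": 1, "count": 5},  # standard SCIM 2.0 pagination
)
response.raise_for_status()
for user in response.json().get("Resources", []):
    print(user["userName"])
```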
Now we have to register an Enterprise Application in Azure, so we switch to Azure Active Directory and select Enterprise Applications:
Once there, we can register a new app by clicking New application:
In the Azure AD Gallery we look for Azure Databricks SCIM Provisioning Connector:
Select it and confirm the creation. After a few minutes the app will be ready, and then we must select the Provisioning tab to set up all the needed synchronization settings:
If we open this tab for the first time, we have to click Get started, as shown below:
We have a few settings to configure. First of all, Provisioning Mode should be set to Automatic because we want to synchronize all the objects between AAD and Databricks automatically, without manual intervention. Under Admin Credentials we have to provide the URL and token that we got from Databricks (the SCIM Token and Account SCIM Url mentioned earlier). If we provided the correct information, then after clicking Test connection we should see that the test succeeded. After that, we can click Save:
Then we can go back to the Provisioning section and there we have the option to Add scoping filter:
There is an option to set up the Scope – we can choose between two options:
- Sync only assigned users and groups
- Sync all users and groups
We can also select the Mappings section to adjust provisioning to our needs:
For example, I want to synchronize only those users that have “test” in their UPN, so I selected Provision Azure Active Directory Users and then Source Object Scope:
In the new window we can add a scoping filter:
Then we can select a specific user object attribute and use one of the available operators to build the filter. In my case, userPrincipalName includes “test”:
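Just to illustrate what this filter does (a toy Python example with made-up UPNs, not anything Azure AD executes), only users whose UPN contains “test” stay in scope:

```python
# Toy illustration only (not Azure AD code): the scoping filter keeps users
# whose userPrincipalName contains "test". The UPNs below are made up.
users = [
    {"userPrincipalName": "test.user01@contoso.com"},
    {"userPrincipalName": "john.doe@contoso.com"},
]
in_scope = [u for u in users if "test" in u["userPrincipalName"]]
print(in_scope)  # only test.user01@contoso.com would be provisioned
```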
When we save it, the app will perform an initial synchronization of all the objects. Subsequent refreshes take place at a fixed interval of 40 minutes. Before the demonstration, I created 1000 accounts in AAD with “test” in their UPN.
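If you want to reproduce such a test population yourself, the sketch below shows one possible way to bulk-create accounts through Microsoft Graph – the token, domain, and password are placeholders, and the call requires the User.ReadWrite.All permission.

```python
import requests

# A rough sketch of bulk-creating test accounts with Microsoft Graph.
# GRAPH_TOKEN needs the User.ReadWrite.All permission; the domain and
# password are illustrative placeholders.
GRAPH_TOKEN = "<access token>"
DOMAIN = "contoso.onmicrosoft.com"

for i in range(1000):
    body = {
        "accountEnabled": True,
        "displayName": f"Test User {i:04d}",
        "mailNickname": f"testuser{i:04d}",
        "userPrincipalName": f"testuser{i:04d}@{DOMAIN}",
        "passwordProfile": {
            "forceChangePasswordNextSignIn": True,
            "password": "ChangeMe!12345",
        },
    }
    requests.post(
        "https://graph.microsoft.com/v1.0/users",
        headers={"Authorization": f"Bearer {GRAPH_TOKEN}"},
        json=body,
    ).raise_for_status()
```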
When everything is ready, I can click the Start provisioning button available on the Provisioning tab. After some time I can check whether everything was replicated correctly:
To confirm that everything works as expected, I can go back to Unity Catalog and see the list of provisioned users:
I also removed one user and one group and then checked that those objects no longer exist in the target Databricks platform:
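This kind of spot check can also be scripted against the same SCIM endpoint – a sketch assuming the URL and token from earlier and a made-up UPN; SCIM 2.0 supports equality filters on userName.

```python
import requests

# Spot check against the same Databricks Account SCIM endpoint as before.
# SCIM 2.0 supports equality filters on userName; the UPN below is made up.
SCIM_URL = "<Account SCIM Url copied from the console>"
SCIM_TOKEN = "<SCIM Token copied from the console>"
removed_upn = "testuser0001@contoso.onmicrosoft.com"

response = requests.get(
    f"{SCIM_URL}/Users",
    headers={"Authorization": f"Bearer {SCIM_TOKEN}"},
    params={"filter": f'userName eq "{removed_upn}"'},
)
response.raise_for_status()
print("still present" if response.json().get("totalResults", 0) else "removed")
```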
SCIM is a very important mechanism that should be known not only to Azure administrators but also to data engineers and other specialists. As you noticed, the entire mechanism is pretty simple to set up, and it is well worth doing it this way instead of creating identities manually. Try it yourself!