Azure Policy – an underrated component of a scalable data platform (part 2)

In the first part of the article, we discussed the basics of the Azure Policy service and provided examples of definitions with the audit and deny effects, that is, policies that report or block the creation of non-compliant resources. In this part, we will focus on a slightly more advanced concept: remediation, which automatically brings resources into compliance by modifying them or by deploying the resources they are missing. We will use definitions with the modify and deployIfNotExists (DINE) effects for this purpose. (link: Remediate non-compliant resources with Azure Policy)
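
To make the modify effect more concrete, here is a minimal sketch of a custom definition that adds or replaces a single tag. It is an illustration only and not taken from the platform described in this article: the resource name, display name, and parameters are hypothetical. The role ID in roleDefinitionIds is the built-in Contributor role, which the remediating identity would need.

```hcl
# Hypothetical custom definition; all names here are illustrative only.
resource "azurerm_policy_definition" "example-modify-tag" {
  name         = "example-add-or-replace-tag"
  policy_type  = "Custom"
  mode         = "Indexed" # only resource types that support tags and location
  display_name = "Example: add or replace a tag on resources"

  parameters = jsonencode({
    tagName  = { type = "String" }
    tagValue = { type = "String" }
  })

  policy_rule = jsonencode({
    # Non-compliant when the tag is missing or has a different value.
    "if" = {
      field     = "[concat('tags[', parameters('tagName'), ']')]"
      notEquals = "[parameters('tagValue')]"
    }
    "then" = {
      effect = "modify"
      details = {
        # Built-in Contributor role, needed by the identity that remediates.
        roleDefinitionIds = [
          "/providers/Microsoft.Authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"
        ]
        operations = [{
          operation = "addOrReplace"
          field     = "[concat('tags[', parameters('tagName'), ']')]"
          value     = "[parameters('tagValue')]"
        }]
      }
    }
  })
}
```

A deployIfNotExists definition follows the same shape, except that details describes an existence condition and an ARM deployment instead of a list of operations.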

Let’s start by noting that any remediation action performed by Azure Policy runs under a Managed Identity (either system-assigned or user-assigned), which must be granted the appropriate permissions through RBAC role assignments.

The Managed Identity is configured when the definition is assigned to a specific scope, i.e., when the policy assignment is created. The following Terraform code shows an example assignment of an initiative that relies on such an identity to operate properly.

# umid used in tag-related remediation tasks
# ------------------------------------------
resource "azurerm_user_assigned_identity" "umid-azp-tags" {
    name                        = "id-${var.environment}-plz-tagging-we-001"

    resource_group_name         = azurerm_resource_group.resgrp-azp-plzumi.name
    location                    = var.resource_location
} 

resource "azurerm_role_assignment" "umid-azp-tags-rbac" {
    scope                       = data.azurerm_subscription.current.id
    principal_id                = azurerm_user_assigned_identity.umid-azp-tags.principal_id
    role_definition_name        = "Contributor"
}

# custom initiative assignment
# ------------------------------------------
resource "azurerm_subscription_policy_assignment" "azin-tags-inc-assignment" {
    name                    =   "data platform tagging"    
    display_name            =   "[INC-DP]: Tagging"
    description             =   "-"

    location                =   var.resource_location

    subscription_id         =   data.azurerm_subscription.current.id
    policy_definition_id    =   azurerm_policy_set_definition.azin-tags-inc.id

    # identity under which the modify / deployIfNotExists remediation runs
    identity {
        type         = "UserAssigned"
        identity_ids = [azurerm_user_assigned_identity.umid-azp-tags.id]
    }

    parameters = jsonencode({})
}
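
An assignment like the one above acts on resources as they are created or updated. Resources that already existed and are non-compliant are typically brought into line with a remediation task. The sketch below shows how such a task might be declared for the assignment above. The remediation name and the policy_definition_reference_id value are hypothetical; the reference ID has to match one defined in the initiative, which is not shown in this article.

```hcl
# Hypothetical remediation task for the initiative assignment above.
# data.azurerm_subscription.current is assumed to be declared elsewhere,
# as in the listing above.
resource "azurerm_subscription_policy_remediation" "tags-remediation-example" {
  name                 = "remediate-tags-example"
  subscription_id      = data.azurerm_subscription.current.id
  policy_assignment_id = azurerm_subscription_policy_assignment.azin-tags-inc-assignment.id

  # Required when the assignment points at an initiative: selects which
  # member policy to remediate. The value below is an assumed reference ID.
  policy_definition_reference_id = "inheritTagFromResourceGroup"

  # Re-evaluate compliance before remediating instead of relying on the
  # last known compliance state.
  resource_discovery_mode = "ReEvaluateCompliance"
}
```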

Let’s now move on to examples that will best illustrate the usefulness of these mechanisms.

Tagging

The first example is enforcing proper tagging of all elements of our solution. A suitable tagging strategy is a key element of efficient management in the Azure cloud, allowing for:

Cost analysis.
Identification of ownership.
Determining the sensitivity of a resource for Security & Compliance needs.

Building a solution that enforces proper resource tagging is possible thanks to a set of built-in policies, which include, for example:

Add or replace a tag on subscriptions (61a4d60b-7326-440e-8051-9f94394d4dd1)
Add or replace a tag on resource groups (d157c373-a6c4-483d-aaad-570756956268)
Add or replace a tag on resources (5ffd78d9-436d-4b41-a421-5baa819e3008)

Additionally, consistency between the tags of subscriptions, resource groups, and the resources themselves can be ensured using the following:

Inherit a tag from the subscription (b27a0cbd-a167-4dfa-ae64-4337be671140)
Inherit a tag from the resource group (cd3aa116-8754-49c9-a813-ad46512ece54)
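
As an illustration, the built-in definitions listed above could be grouped into a custom initiative along these lines. This is a sketch under assumptions, not the initiative used earlier in the article: the resource name, reference IDs, and the "Environment" tag are hypothetical, while tagName and tagValue are the parameters these built-in definitions are expected to take.

```hcl
# Hypothetical initiative that combines two of the built-in tag policies.
resource "azurerm_policy_set_definition" "azin-tags-example" {
  name         = "azin-tags-example"
  policy_type  = "Custom"
  display_name = "Example: data platform tagging"

  # Add or replace a tag on resource groups
  policy_definition_reference {
    reference_id         = "addTagToResourceGroups"
    policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/d157c373-a6c4-483d-aaad-570756956268"
    parameter_values = jsonencode({
      tagName  = { value = "Environment" }
      tagValue = { value = var.environment }
    })
  }

  # Inherit a tag from the resource group
  policy_definition_reference {
    reference_id         = "inheritTagFromResourceGroup"
    policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/cd3aa116-8754-49c9-a813-ad46512ece54"
    parameter_values = jsonencode({
      tagName = { value = "Environment" }
    })
  }
}
```

Hard-coding the parameter values inside the initiative keeps the assignment simple (no parameters to pass); exposing them as initiative parameters would be the alternative when the same initiative is assigned with different tag values.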

Diagnostic Settings

Another valuable example is automating the creation of Diagnostic Settings. Besides the benefits of automation itself, it matters that the people creating resources do not need permissions on the central destination to which metrics and logs are sent (Log Analytics Workspace, Azure Storage Account, Event Hub). That destination can therefore be centralized without granting elevated permissions to the owners of the individual systems whose logs we want to collect.

Here, we can also make use of ready-made definitions, including:

Configure diagnostic settings for Storage Accounts to Log Analytics workspace (59759c62-9a22-4cdf-ae64-074495983fef)
Deploy – Configure diagnostic settings for Azure Key Vault to Log Analytics workspace (951af2fa-529b-416e-ab6e-066fd85ac459)
Configure diagnostic settings for Azure Databricks Workspaces to Log Analytics workspace (23057b42-ca8d-4aa0-a3dc-96a98b5b5a3d)

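Importantly, these policies are evaluated not only when a resource is created but also when it is updated. If an automatically created Diagnostic Setting is deleted, the resource becomes non-compliant again, and the setting is redeployed the next time the resource is updated or when a remediation task is run.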

(link: Create diagnostic settings at scale)
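
A minimal sketch of assigning one of these definitions (the Key Vault one) is shown below. The variable, resource names, and role choices are assumptions made for illustration; in particular, the parameter name logAnalytics and the two roles should be verified against the definition before use.

```hcl
# Hypothetical input: resource ID of the central Log Analytics workspace.
variable "log_analytics_workspace_id" {
  type        = string
  description = "Resource ID of the central Log Analytics workspace."
}

# data.azurerm_subscription.current and var.resource_location are assumed
# to be declared elsewhere, as in the earlier listing.
resource "azurerm_subscription_policy_assignment" "kv-diag-example" {
  name                 = "kv-diag-to-law-example"
  display_name         = "Example: Key Vault diagnostic settings to Log Analytics"
  subscription_id      = data.azurerm_subscription.current.id
  policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/951af2fa-529b-416e-ab6e-066fd85ac459"
  location             = var.resource_location

  identity {
    type = "SystemAssigned"
  }

  # Assumed parameter name; check the built-in definition.
  parameters = jsonencode({
    logAnalytics = { value = var.log_analytics_workspace_id }
  })
}

# The policy identity, not the resource owner, gets access to the
# monitored resources and to the central workspace.
resource "azurerm_role_assignment" "kv-diag-example-monitoring" {
  scope                = data.azurerm_subscription.current.id
  principal_id         = azurerm_subscription_policy_assignment.kv-diag-example.identity[0].principal_id
  role_definition_name = "Monitoring Contributor"
}

resource "azurerm_role_assignment" "kv-diag-example-workspace" {
  scope                = var.log_analytics_workspace_id
  principal_id         = azurerm_subscription_policy_assignment.kv-diag-example.identity[0].principal_id
  role_definition_name = "Log Analytics Contributor"
}
```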

Private DNS Zones

The last example presented in this article is the configuration of Private DNS Zones in a centralized model within a Hub and Spoke architecture.

While it might seem that networking-related topics are not within the scope of responsibilities of data platform teams, in reality, such tasks often arise within the project scope. Even if we are not ultimately responsible for the entire solution configuration, knowledge of the theoretical basics is necessary for smooth collaboration with Networking/Security teams.

(Diagram source: https://github.com/paolosalvatori/private-endpoints-topologies)

The diagram above depicts a highly simplified solution that centralizes Private DNS Zone resources in the organization. In this model, the people building the platform typically have no permissions in the central Hub subscription, where the Private DNS Zones are located. The platform owner can create Private Endpoint objects and assign them an IP address in the local Spoke network, but cannot create the DNS record that resolves the resource's public FQDN to that private IP address.

Granting such permissions is usually not an option because the objects in the Hub subscription are central, shared resources, and adding entries manually, for example by requesting them via email, is cumbersome and error-prone.

Similar to Diagnostic Settings, many ready-made definitions allow us to automate this process. The definitions listed below are triggered each time a Private Endpoint resource is created, and a corresponding entry is then automatically registered in the appropriate Private DNS Zone. Because the registration is performed by the Managed Identity attached to the policy assignment, no permissions need to be delegated to the team building the platform. Below are some of the built-in definitions.

Configure a private DNS Zone ID for blob groupID (75973700-529f-4de2-b794-fb9b6781b6b0)
Configure a private DNS Zone ID for dfs groupID (83c6fe0f-2316-444a-99a1-1ecd8a7872ca)
Configure Azure Key Vaults to use private DNS zones (ac673a9a-f77d-4846-b2d8-a57f8e1c01d4)
Configure Azure Databricks workspace to use private DNS zone (0eddd7f3-3d9b-4927-a07a-806e8ac9486c)

(link: Private Link and DNS integration at scale)
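
The following sketch shows how the blob definition might be assigned in a Spoke subscription. The variable and resource names are hypothetical, and the parameter name privateDnsZoneId and the Network Contributor role are assumptions to verify against the built-in definition. The point it illustrates is that only the policy identity receives access to the central zone in the Hub.

```hcl
# Hypothetical input: resource ID of the central
# privatelink.blob.core.windows.net zone in the Hub subscription.
variable "blob_private_dns_zone_id" {
  type        = string
  description = "Resource ID of the central Private DNS Zone for blob."
}

# data.azurerm_subscription.current and var.resource_location are assumed
# to be declared elsewhere, as in the earlier listing.
resource "azurerm_subscription_policy_assignment" "pe-blob-dns-example" {
  name                 = "pe-blob-dns-example"
  display_name         = "Example: private DNS zone for blob Private Endpoints"
  subscription_id      = data.azurerm_subscription.current.id
  policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/75973700-529f-4de2-b794-fb9b6781b6b0"
  location             = var.resource_location

  identity {
    type = "SystemAssigned"
  }

  # Assumed parameter name; check the built-in definition.
  parameters = jsonencode({
    privateDnsZoneId = { value = var.blob_private_dns_zone_id }
  })
}

# Lets the policy identity attach a DNS zone group to Private Endpoints
# created in the Spoke subscription.
resource "azurerm_role_assignment" "pe-blob-dns-example-spoke" {
  scope                = data.azurerm_subscription.current.id
  principal_id         = azurerm_subscription_policy_assignment.pe-blob-dns-example.identity[0].principal_id
  role_definition_name = "Network Contributor"
}

# Lets the policy identity register records in the central zone,
# so platform teams need no permissions in the Hub.
resource "azurerm_role_assignment" "pe-blob-dns-example-hub" {
  scope                = var.blob_private_dns_zone_id
  principal_id         = azurerm_subscription_policy_assignment.pe-blob-dns-example.identity[0].principal_id
  role_definition_name = "Network Contributor"
}
```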

Summary

Throughout this two-part article, we covered the fundamentals of Azure Policy and its role in governing cloud infrastructure. The examples above illustrate cases where the service blocks incorrect actions, automatically repairs resources, or assists in their creation. I hope this read has convinced you of the value of adopting the Azure Policy service early in your project, especially in the context of your future Data Platform built in the Azure Cloud.