Azure Policy – an underrated component of a scalable data platform (part 1)

Recently, we have been seeing a growing number of projects in which a complete data platform has to be designed and built almost from scratch. In such projects, in addition to typical data tasks such as data modeling, designing and implementing pipelines, or building the final reporting layer, there is growing demand for the initial design and implementation of the necessary cloud infrastructure. These projects bring challenges that were not present a few years ago, when we typically worked in environments managed by the customer's team, where someone else was responsible for building the platform.

The team assigned to such a project now includes more than just a Data Architect and a group of Data Engineers and Data Scientists. Delivering the whole data platform requires a broader range of competencies, which creates room for Cloud/DevOps Engineers (or Data Engineers who possess these competencies). It also becomes necessary to reorganize the work and wisely distribute the involvement of individual team members over the course of the project, especially in its initial phases.

In this article, I would like to focus on a specific stage of such a project and a particular service that can significantly simplify and accelerate our work while improving the quality of the delivered solution. The stage I am referring to is when the first version of the architecture is approved, and we can begin provisioning resources in Azure. The service I’m highlighting is Azure Policy.

Initial Project Requirements

For the purposes of this article, let’s adopt a simplified architecture containing basic Azure services that appear in a large share of data platforms built in the Azure cloud. These include, among others:

  • Azure Data Factory
  • Azure Data Lake Storage
  • Azure Databricks

In addition to a simple diagram in the solution architecture, we typically encounter a long list of requirements and assumptions regarding both the entire platform and its specific components. For example, we may need to comply with the following:

  • Limited locations where resources can be provisioned.
  • Resource naming convention.
  • Resource tagging strategy.

An example of requirements for a specific resource would be the list of specifications for the Azure Storage Account resource:

  • Resources are not exposed to the public internet.
  • Infrastructure encryption for encryption at rest is enabled.
  • The list of available storage account SKUs is limited.
  • Storage access keys are disabled.

Simply reading and adhering to these requirements may seem straightforward. Unfortunately, as we have often observed in projects, the environments that get created frequently deviate from the assumptions. Two main factors typically contribute to these differences: shortcuts and bugs.

While issues such as missing tags or public internet access can be fixed quickly, errors in settings such as encryption or naming may require recreating the resource, which can prove costly and time-consuming later in the project.

How, then, can we protect ourselves from such situations? Is there a mechanism that will compel users to follow the rules we have established?

This is where Azure Policy comes to our aid – when properly configured, it enables us to enforce compliance with assumptions by applying rules defined by us.

Azure Policy

Before we delve into the implementation details, let’s look at basic definitions related to the Azure Policy service.

  • Azure Policy – a service that enables users to govern Azure resources by enforcing organizational standards and assessing compliance at scale.
  • Policy Definition – a JSON-defined object that describes a policy, including resource compliance requirements and the effect to take if they are violated.
  • Policy Assignment – a JSON-defined object that determines the resources to which a policy definition is applied.

To simplify: a definition describes the condition and the effect, while the assignment specifies the scope to which they apply. (Those interested can find comprehensive documentation at: What is Azure Policy?)

Let’s also explore the basic types of effects that will be discussed later in the article.

  • audit: generates a warning event in the activity log but doesn’t fail the request.
  • deny: generates an event in the activity log and fails the request based on requested resource configuration.
  • deployIfNotExists: deploys a related resource if it doesn’t already exist.
  • modify: adds, updates, or removes the defined set of fields in the request.

As you can see, the range of possible effects is quite wide: from purely informational (audit), through blocking requests (deny), to altering them (modify) and independently creating additional resources (deployIfNotExists). (The full list is available here: Understand Azure Policy effects)

Policy Definition

Let’s now examine an example definition. This is one of the built-in definitions that enforce the inheritance of tags set at the resource group level. Even without prior experience with Azure Policy, it’s quite easy to understand the scope of its operation.

{
  "properties": {
    "displayName": "Inherit a tag from the resource group",
    "policyType": "BuiltIn",
    "mode": "Indexed",
    "description": "Adds or replaces the specified tag and value from the parent resource group when any resource is created or updated. (...).",
    "metadata": {
      "version": "1.0.0",
      "category": "Tags"
    },
    "parameters": {
      "tagName": {
        "type": "String",
        "metadata": {
          "displayName": "Tag Name",
          "description": "Name of the tag, such as 'environment'"
        }
      }
    },
    "policyRule": {
      "if": {
        "allOf": [
          {
            "field": "[concat('tags[', parameters('tagName'), ']')]",
            "notEquals": "[resourceGroup().tags[parameters('tagName')]]"
          },
          {
            "value": "[resourceGroup().tags[parameters('tagName')]]",
            "notEquals": ""
          }
        ]
      },
      "then": {
        "effect": "modify",
        "details": {
          "roleDefinitionIds": [
            "/providers/microsoft.authorization/roleDefinitions/b24988ac-6180-42a0-ab88-20f7382dd24c"
          ],
          "operations": [
            {
              "operation": "addOrReplace",
              "field": "[concat('tags[', parameters('tagName'), ']')]",
              "value": "[resourceGroup().tags[parameters('tagName')]]"
            }
          ]
        }
      }
    }
  },
  "id": "/providers/Microsoft.Authorization/policyDefinitions/cd3aa116-8754-49c9-a813-ad46512ece54",
  "type": "Microsoft.Authorization/policyDefinitions",
  "name": "cd3aa116-8754-49c9-a813-ad46512ece54"
}

The condition that triggers the policy is defined in the “if” block (lines 21-32 of the listing), where both requirements must be met (allOf):

  • Our specified tag has a non-empty value at the resource group level.
  • The value of our specified tag is not equal to the value at the resource group level.

The effect is defined in the “then” block (lines 33-47 of the listing): a modify operation that adds or replaces the tag on the resource with the value taken from the resource group.
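
To make the definition above more tangible, here is a hedged sketch of how it could be assigned with Terraform. The assignment name, the tag name "environment", the location, and the data source label are assumptions for illustration. Because the effect is modify, the assignment needs a managed identity, and that identity needs the role listed in roleDefinitionIds (b24988ac-6180-42a0-ab88-20f7382dd24c, the built-in Contributor role).

```hcl
# Hypothetical data source label; resolves the subscription currently in use.
data "azurerm_subscription" "primary" {}

resource "azurerm_subscription_policy_assignment" "inherit_env_tag" {
  name                 = "inherit-environment-tag" # hypothetical name
  display_name         = "Inherit the 'environment' tag from the resource group"
  subscription_id      = data.azurerm_subscription.primary.id
  policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/cd3aa116-8754-49c9-a813-ad46512ece54"

  # A modify policy changes resources through a managed identity,
  # and an assignment with an identity must specify a location.
  location = "westeurope"

  identity {
    type = "SystemAssigned"
  }

  parameters = jsonencode({
    tagName = { value = "environment" } # assumed tag name
  })
}

# Grant the assignment's identity the role required by the definition.
resource "azurerm_role_assignment" "inherit_env_tag" {
  scope                = data.azurerm_subscription.primary.id
  role_definition_name = "Contributor"
  principal_id         = azurerm_subscription_policy_assignment.inherit_env_tag.identity[0].principal_id
}
```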

An important question may arise here: does using Azure Policy require me to understand the syntax of definitions and write them myself? Fortunately, the answer is “no,” at least in the vast majority of cases. The Azure Policy service offers a long list of predefined, ready-to-use policies that do not require us to modify any JSON code. (See: Azure Policy built-in policy definitions)

Here, we can find definitions that cover most of the Azure Storage Account requirements I mentioned at the beginning of the article, along with related transport security settings. These include:

  • Secure transfer to storage accounts should be enabled (404c3081-a854-4457-ae30-26a93ef643f9)
  • Storage accounts should be limited by allowed SKUs (7433c107-6db4-4ad1-b57a-a76dce0154a1)
  • Storage accounts should disable public network access (b2982f36-99f2-4db5-8eff-283140c09693)
  • Storage accounts should have infrastructure encryption (4733ea7b-a883-42fe-8cac-97454c2a9e4a)
  • Storage accounts should have the specified minimum TLS version (fe83a0eb-a853-422d-aac2-1bffd182c5d0)

And some requirements not related to a specific resource:

  • Allowed locations for resource groups (e765b5de-1225-4ba3-bd56-1ac6695af988)

But what if we can’t find the definition we’re interested in among the built-in ones? First, we can check the website https://www.azadvertizer.net/, which is a vast database of definitions written by the community. And if that doesn’t help… well, usually, it comes down to dealing with JSON 😊.
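
For a sense of what “dealing with JSON” can look like in practice, below is a minimal sketch of a custom definition that addresses the naming convention requirement for storage accounts. Everything specific here is an assumption made for illustration: the definition name, the namePrefix parameter, and the rule that names must simply start with a prefix. A real naming convention is likely to need a richer condition.

```hcl
resource "azurerm_policy_definition" "storage_naming" {
  name         = "dp-storage-account-naming" # hypothetical name
  policy_type  = "Custom"
  mode         = "Indexed"
  display_name = "[Data Platform] Storage account names should start with a given prefix"

  parameters = jsonencode({
    namePrefix = {
      type = "String"
      metadata = {
        displayName = "Name prefix"
        description = "Required prefix of the storage account name, such as 'stdp'"
      }
    }
  })

  policy_rule = jsonencode({
    # Condition: the resource is a storage account AND its name
    # does not start with the configured prefix.
    "if" = {
      allOf = [
        {
          field  = "type"
          equals = "Microsoft.Storage/storageAccounts"
        },
        {
          field   = "name"
          notLike = "[concat(parameters('namePrefix'), '*')]"
        }
      ]
    }
    # Effect: reject the request.
    "then" = {
      effect = "deny"
    }
  })
}
```

The structure mirrors the built-in definition shown earlier: parameters, a condition, and an effect, here deny instead of modify.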

Implementation

How, then, do we implement the definitions mentioned above?

There are several methods available: the Azure portal, Azure CLI, PowerShell, or Infrastructure as Code (IaC) tools. The examples below use Terraform. For a single policy, the process is relatively straightforward. We simply assign it at the appropriate scope (Management Group, Subscription, Resource Group, Resource), providing the required parameters. For instance, when restricting allowed locations, we specify the list of permitted locations.

# Subscription that the azurerm provider is currently working against.
data "azurerm_subscription" "current" {}

# Assigns the built-in "Allowed locations for resource groups" definition
# at the subscription scope.
resource "azurerm_subscription_policy_assignment" "locations" {
    name                    =   "Allowed locations"
    
    policy_definition_id    =   "/providers/Microsoft.Authorization/policyDefinitions/e765b5de-1225-4ba3-bd56-1ac6695af988"
    display_name            =   "[POL]: Allowed locations"
    description             =   "Allowed locations"

    subscription_id         =   data.azurerm_subscription.current.id

    parameters = jsonencode(
        {
            listOfAllowedLocations = { value = ["westeurope", "polandcentral"] }
        }
    )
}
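
The assignment above uses the definition for resource group locations. As a complementary sketch, the built-in “Allowed locations” definition (e56962a6-4747-49cd-b67b-bf8b01975c4c) restricts the locations of the resources themselves, and the example below assigns it at a narrower scope, a single resource group. The resource group name and the assignment name are hypothetical.

```hcl
# Hypothetical, already existing resource group.
data "azurerm_resource_group" "platform" {
  name = "rg-dataplatform-dev"
}

resource "azurerm_resource_group_policy_assignment" "resource_locations" {
  name                 = "allowed-resource-locations" # hypothetical name
  display_name         = "[POL]: Allowed locations (resources)"
  resource_group_id    = data.azurerm_resource_group.platform.id
  policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/e56962a6-4747-49cd-b67b-bf8b01975c4c"

  parameters = jsonencode({
    listOfAllowedLocations = { value = ["westeurope", "polandcentral"] }
  })
}
```

The azurerm provider exposes a separate assignment resource per scope type, so moving an assignment between scopes means switching the resource type, not just an ID.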

However, this approach may prove inconvenient when dealing with a longer list of definitions. In such cases, Initiatives come to our rescue.

An Initiative, also known as a policy set, is a type of policy definition consisting of a collection of policy definition IDs. It is used to group multiple policy definitions with a common goal, so that they can share parameters and identities and be managed through a single assignment. As with policy definitions, the Azure Policy service provides a set of predefined Initiatives. When these are not sufficient, we can define our own by grouping a list of definitions.

The following code defines a custom Initiative composed of five built-in definitions related to Azure Storage Account resources. Please note that the initiative defined below is in audit mode, so it will not block the creation of resources but will only mark them as non-compliant.

resource "azurerm_policy_set_definition" "azin-storage-inc" {
    name         = "[Data Platform] Storage Accounts"
    policy_type  = "Custom"
    display_name = "[Data Platform] Storage Accounts"

    parameters = jsonencode(
        {
            "listOfAllowedSKUs": {
                "type": "Array",
                "metadata": {
                    "description": "The list of SKUs that can be specified for storage accounts.",
                    "displayName": "Allowed SKUs",
                    "strongType": "StorageSKUs"
                }
            },
            "minimumTlsVersion": {
                "type": "String",
                "metadata": {
                    "displayName": "Minimum TLS Version",
                    "description": "Minimum version of TLS required to access data in this storage account"
                },
                "allowedValues": [
                    "TLS1_0",
                    "TLS1_1",
                    "TLS1_2"
                ],
                "defaultValue": "TLS1_2"
            }
        }
    )

    policy_definition_reference {
        # Secure transfer to storage accounts should be enabled
        policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/404c3081-a854-4457-ae30-26a93ef643f9"
        parameter_values     = jsonencode({
            effect = { value = "Audit" }
        })
    }

    policy_definition_reference {
        # Storage accounts should be limited by allowed SKUs
        policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/7433c107-6db4-4ad1-b57a-a76dce0154a1"
        parameter_values     = jsonencode({
            listOfAllowedSKUs = { value = "[parameters('listOfAllowedSKUs')]" }
            effect = { value = "Audit" }
        })
    }

    policy_definition_reference {
        # Storage accounts should disable public network access
        policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/b2982f36-99f2-4db5-8eff-283140c09693"
        parameter_values     = jsonencode({
            effect = { value = "Audit" }
        })
    }

    policy_definition_reference {
        # Storage accounts should have infrastructure encryption
        policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/4733ea7b-a883-42fe-8cac-97454c2a9e4a"
        parameter_values     = jsonencode({
            effect = { value = "Audit" }
        })
    }

    policy_definition_reference {
        # Storage accounts should have the specified minimum TLS version
        policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/fe83a0eb-a853-422d-aac2-1bffd182c5d0"
        parameter_values     = jsonencode({
            minimumTlsVersion = { value = "[parameters('minimumTlsVersion')]" }
            effect = { value = "Audit" }
        })
    }
}
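
Since every reference above hard-codes the Audit effect, moving to enforcement later means editing each block. One possible variation, shown here only as a sketch, drives the effect from a Terraform variable and generates the references with a dynamic block. The variable name, the local map, and the initiative name are assumptions, and only the three definitions that take no parameter other than effect are included to keep the example short.

```hcl
# Hypothetical variable controlling the effect of all referenced policies.
variable "storage_policy_effect" {
  type    = string
  default = "Audit"

  validation {
    condition     = contains(["Audit", "Deny", "Disabled"], var.storage_policy_effect)
    error_message = "The effect must be one of: Audit, Deny, Disabled."
  }
}

locals {
  # Built-in definitions from the list above that only need the effect parameter.
  storage_effect_only_policies = {
    secure_transfer           = "404c3081-a854-4457-ae30-26a93ef643f9"
    public_network_access     = "b2982f36-99f2-4db5-8eff-283140c09693"
    infrastructure_encryption = "4733ea7b-a883-42fe-8cac-97454c2a9e4a"
  }
}

resource "azurerm_policy_set_definition" "storage_effect_sketch" {
  name         = "dp-storage-accounts-effect-sketch" # hypothetical name
  policy_type  = "Custom"
  display_name = "[Data Platform] Storage Accounts (configurable effect)"

  # One policy_definition_reference block per entry in the map.
  dynamic "policy_definition_reference" {
    for_each = local.storage_effect_only_policies

    content {
      policy_definition_id = "/providers/Microsoft.Authorization/policyDefinitions/${policy_definition_reference.value}"
      reference_id         = policy_definition_reference.key
      parameter_values = jsonencode({
        effect = { value = var.storage_policy_effect }
      })
    }
  }
}
```

With this layout, switching from reporting to blocking is a matter of setting storage_policy_effect to "Deny" once compliance reports look clean.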

The initiative is assigned just like a single definition.

resource "azurerm_subscription_policy_assignment" "azin-storage-inc-assignment" {
    name                    =   "basic principles for storage accounts in the data platform"
    
    policy_definition_id    =   azurerm_policy_set_definition.azin-storage-inc.id
    display_name            =   "[INC-DP]: Azure Storage Account related requirements"
    description             =   "-"

    subscription_id         =   data.azurerm_subscription.current.id

    parameters = jsonencode(
        {
            minimumTlsVersion = { value = "TLS1_2" }
            # SKU names written in their canonical casing.
            listOfAllowedSKUs = { value = [
                "Premium_LRS",
                "Standard_LRS",
                "Premium_ZRS",
                "Standard_ZRS",
            ] }
        }
    )
}

Shortly after the deployment, the result is available in the Azure Portal.

The compliance report clearly indicates the non-compliance of resources created within the subscription. Fortunately, this is only my private sandbox environment ;).

Next Steps

In the next part, we will take a closer look at policies with the modify and deployIfNotExists effects, which, in addition to ensuring compliance on the platform, will relieve us by automating some repetitive tasks.
