
Connecting to Azure Storage from Synapse Analytics using Private Endpoint

Security is an important aspect of every cloud-based project. The Synapse Analytics ecosystem lets us configure many options, including networking and connectivity. In this article, I will demonstrate how to set up a private endpoint from Synapse Analytics to an external Data Lake storage account.

First of all, I would like to explain what a Private Endpoint is. It is a network interface that uses a private IP address from your virtual network. This network interface connects you privately and securely to a service that is powered by Azure Private Link. When you enable a private endpoint, you bring that specific service into your virtual network. It is important to understand that Private Link is the service that makes it possible to establish private endpoints to PaaS services. In Synapse Analytics we talk about Managed Private Endpoints: they are created and managed by Synapse itself, so you don't need to take care of most of the underlying network configuration.

The first place where you can find networking options for Synapse Analytics is the workspace creation screen:

On the Networking tab, you can enable Managed Virtual Network, which associates your workspace with an Azure Virtual Network that is managed by Microsoft. This simplifies management: for example, you don't have to create Network Security Groups or subnets for your Spark clusters. Note that this option can only be chosen when the workspace is created. As you can see on the screen above, you also have the option to create a Managed Private Endpoint for your primary storage account, and setting up such an endpoint is exactly what I want to do today.

First of all, let's open Synapse Studio:

Then we can go to the Manage hub, where we will find Managed private endpoints, and click New to add one:

We can create Private Endpoints to many different services, but for our testing purposes we will connect to Azure Data Lake Storage Gen2:

Connection setup is pretty easy. You just have to provide the name of your endpoint, a description, and the instance of the service that you want to connect to:
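Behind the form, the endpoint definition comes down to two values: the Azure resource ID of the target storage account and the sub-resource to connect to (dfs for Data Lake Storage Gen2). As a minimal sketch, assuming a hypothetical subscription ID and resource group name (only the sqmystorage account name comes from this article), the same definition could be assembled in Python, for example to keep it in source control:

```python
import json


def storage_account_resource_id(subscription_id: str, resource_group: str, account_name: str) -> str:
    """Build the Azure Resource Manager ID of a storage account."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
    )


def managed_private_endpoint_definition(resource_id: str, group_id: str = "dfs") -> dict:
    """Return the two properties that describe the target of a managed private endpoint.

    group_id is the storage sub-resource: "dfs" for Data Lake Storage Gen2,
    "blob" for the Blob endpoint.
    """
    return {"privateLinkResourceId": resource_id, "groupId": group_id}


if __name__ == "__main__":
    # Hypothetical placeholder values; replace them with your own.
    resource_id = storage_account_resource_id(
        "00000000-0000-0000-0000-000000000000",  # subscription ID (assumption)
        "my-resource-group",                      # resource group (assumption)
        "sqmystorage",                            # storage account used later in this article
    )
    print(json.dumps(managed_private_endpoint_definition(resource_id), indent=2))
```

A definition of this shape is what command-line tooling for Synapse managed private endpoints works with. Check the current Azure CLI or REST documentation for the exact parameters before relying on it.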

After creation, your connection will be in the Provisioning state, and you have to approve it on the storage account side:

To approve it, go to the storage account -> Networking -> Private endpoint connections, select the pending connection, and click Approve:

Additionally, you can provide a description for your approval, for example to indicate why you are approving it:

Our connection is ready, and now we can go back to the Firewalls and virtual networks tab of the storage account's Networking settings. There we can set Public network access to Disabled, so connectivity via the public endpoint will be closed and the only way to reach your storage account will be via a private IP and the private endpoint.

Let's test whether everything is working as expected. First of all, we have to grant our Synapse workspace permission to access the storage account.

Go to the storage account, select Access Control (IAM), and click Add:

Select the role that is applicable in your case. For now, we will select Storage Blob Data Owner, although a narrower role such as Storage Blob Data Reader is enough for read-only queries:

On the next screen we can select Managed Identity, because this is how we want to authenticate from Synapse:

After clicking Select members, we can find the managed identity associated with our Synapse Analytics workspace:

Now that we have the required permissions, we can go to Synapse Studio and create a Linked service to the storage account:

If everything is working correctly, we should be able to connect to it:

We can also try to query files that exist in this storage:

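-- Create a database to hold the credential and the external data source
USE master
GO

CREATE DATABASE mydlake
GO

USE mydlake
GO

-- A master key must exist before a database scoped credential can be created
CREATE MASTER KEY
GO

-- Credential that authenticates as the workspace managed identity
CREATE DATABASE SCOPED CREDENTIAL [AdlsMI]
WITH IDENTITY = 'MANAGED IDENTITY'
GO

-- Data source pointing at the "data" container of the storage account
CREATE EXTERNAL DATA SOURCE ExternalDataSourceDataLakeMI
WITH (
    LOCATION   = 'https://sqmystorage.dfs.core.windows.net/data',
    CREDENTIAL = AdlsMI
)
GO

-- Read the CSV file; the path is relative to the data source location
SELECT
    *
FROM
    OPENROWSET(
        BULK 'Customers.txt',
        DATA_SOURCE = 'ExternalDataSourceDataLakeMI',
        FORMAT = 'CSV',
        PARSER_VERSION = '2.0',
        FIELDTERMINATOR = ',',
        HEADER_ROW = TRUE
    ) AS r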

As you can see, everything works well. We are now interacting with storage securely through the private endpoint, and no other connections are allowed. To prove it, try to connect to the storage account using the portal or Azure Storage Explorer. If everything is set up correctly, you should receive an error.
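Another rough check is to look at which IP address the storage hostname resolves to from different locations. The sketch below is only an illustration built on assumptions: it uses the Python standard library, the helper names are made up for this example, the hostname reuses the sqmystorage account from the query above, and it presumes you can run code somewhere inside the managed virtual network (for instance in a Spark notebook) as well as on your own machine.

```python
import ipaddress
import socket


def is_private_address(ip: str) -> bool:
    """Return True if the IP address belongs to a private (non-public) range."""
    return ipaddress.ip_address(ip).is_private


def resolve_ipv4(hostname: str) -> list[str]:
    """Resolve a hostname to its distinct IPv4 addresses (needs DNS access)."""
    infos = socket.getaddrinfo(hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def describe(hostname: str) -> list[str]:
    """Label every resolved address of the hostname as private or public."""
    return [
        f"{hostname} -> {ip} ({'private' if is_private_address(ip) else 'public'})"
        for ip in resolve_ipv4(hostname)
    ]


if __name__ == "__main__":
    # The classification itself needs no network access:
    print(is_private_address("10.0.0.4"))
    print(is_private_address("8.8.8.8"))
```

Calling describe("sqmystorage.dfs.core.windows.net") from inside the managed network would be expected to show a private address, while the same call from a machine outside it would typically show a public one. A public address on its own does not prove that access is blocked, so treat this as a complement to the connection test described above rather than a replacement for it.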

Managed Private Endpoints are pretty easy to set up. You don't have to worry about DNS, IP addresses, or anything else, because these details are mostly managed by Synapse behind the scenes. With this functionality, you can meet very strict networking requirements without a huge effort. In my next articles I will show how to restrict access to the Synapse workspace itself, so stay tuned.
