What are Integration Runtimes?
An Integration Runtime (IR) is the compute infrastructure that Azure Data Factory uses to provide data integration capabilities such as Data Flows and Data Movement. An IR can reach resources either in public networks only, or in hybrid scenarios that span public and private networks.
Integration Runtimes are specified in each Linked Service, under Connections.
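As a rough illustration of how a Linked Service points at an Integration Runtime, the sketch below builds a linked service definition as a Python dictionary and prints it as JSON. The `connectVia` block is the part that names the IR. The function name, linked service name, IR name and connection string are placeholders invented for this example.

```python
import json


def linked_service_with_ir(name, ir_name, connection_string):
    """Build an Azure SQL Database linked service that runs on a named IR."""
    return {
        "name": name,
        "properties": {
            "type": "AzureSqlDatabase",
            "typeProperties": {"connectionString": connection_string},
            # connectVia selects the Integration Runtime. If it is omitted,
            # Data Factory falls back to the default Azure IR.
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
        },
    }


# Hypothetical names, for illustration only.
definition = linked_service_with_ir(
    "SalesDbLinkedService", "MySelfHostedIR", "<connection string>"
)
print(json.dumps(definition, indent=2))
```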
There are 3 types to choose from.
Azure Integration Runtime is managed by Microsoft. All the patching, scaling and maintenance of the underlying infrastructure is taken care of. The IR can only access data stores and services in public networks.
Self-hosted Integration Runtimes use infrastructure and hardware managed by you. You’ll need to address all the patching, scaling and maintenance. The IR can access resources in both public and private networks.
Azure-SSIS Integration Runtimes are Microsoft-managed VMs running the SSIS engine, which lets you execute SSIS packages natively. As a result, all the patching, scaling and maintenance are taken care of. The IR can access resources in both public and private networks.
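The three descriptions above boil down to a simple rule of thumb, which can be sketched as a small helper. This is only a summary of the distinctions drawn in this post, not an official decision procedure, and the function and parameter names are made up for the example.

```python
def choose_integration_runtime(runs_ssis_packages, needs_private_network):
    """Pick an IR type using only the rules of thumb described in this post."""
    if runs_ssis_packages:
        # Native SSIS package execution needs the SSIS engine.
        return "Azure-SSIS"
    if needs_private_network:
        # Private or on-premises data stores need an IR inside that network.
        return "Self-hosted"
    # Public data stores and services can use the Microsoft-managed IR.
    return "Azure"


print(choose_integration_runtime(runs_ssis_packages=False, needs_private_network=True))
```

In practice other factors, such as cost and who is responsible for patching, would also feed into the choice.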
Integration Runtime Scenarios
- Azure automatically provisions an Azure Integration Runtime, which can connect to Azure resources (Azure SQL, Azure Synapse Analytics, Storage Accounts) over public endpoints.
- You can perform data integration securely in a private network environment that is shielded from the public cloud. To do so, install a Self-hosted IR inside your virtual private network. The Self-hosted IR makes only outbound HTTP-based connections to the internet.
- You can also perform data integration securely in an on-premises environment. To do so, install a Self-hosted IR behind your corporate firewall.
- You can natively execute SSIS packages by creating an Azure-SSIS Integration Runtime. This creates an Integration Services Catalog (SSISDB) in Azure SQL Database, where the packages are stored. An ADF pipeline run sends commands to the Azure-SSIS IR, which executes the SSIS packages.
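To make the last scenario more concrete, here is a trimmed-down sketch of what a pipeline activity that runs a package on an Azure-SSIS IR might look like, again built as a Python dictionary. It assumes the Execute SSIS Package activity with the package stored in SSISDB. The activity name, IR name and package path are hypothetical, and optional settings such as logging level are left out.

```python
import json


def execute_ssis_package_activity(activity_name, ir_name, package_path):
    """Build a minimal Execute SSIS Package activity definition (sketch)."""
    return {
        "name": activity_name,
        "type": "ExecuteSSISPackage",
        "typeProperties": {
            # The Azure-SSIS IR that will run the package.
            "connectVia": {
                "referenceName": ir_name,
                "type": "IntegrationRuntimeReference",
            },
            # Package stored in the Integration Services Catalog (SSISDB).
            "packageLocation": {"type": "SSISDB", "packagePath": package_path},
        },
    }


# Hypothetical names, for illustration only.
activity = execute_ssis_package_activity(
    "RunNightlyLoad", "MyAzureSsisIR", "FolderName/ProjectName/PackageName.dtsx"
)
print(json.dumps(activity, indent=2))
```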
Are Integration Runtimes Secure?
Data Store Credentials
On-premises data store credentials can either be stored within Data Factory or be referenced by Data Factory via Azure Key Vault at runtime. When credentials are stored within Data Factory, they are always stored encrypted on the Self-hosted IR machine.
Credentials can be stored locally with or without flowing through the Azure backend service on their way to the Self-hosted IR machine. Both options encrypt the credentials securely.
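For the Key Vault option, the linked service holds a reference to a secret rather than the secret itself. The sketch below shows the general shape, assuming a SQL Server linked service and a separate Key Vault linked service. All names here are hypothetical.

```python
import json


def key_vault_secret_reference(key_vault_linked_service, secret_name):
    """Build a reference to a secret held in Azure Key Vault."""
    return {
        "type": "AzureKeyVaultSecret",
        "store": {
            "referenceName": key_vault_linked_service,
            "type": "LinkedServiceReference",
        },
        "secretName": secret_name,
    }


# Hypothetical names, for illustration only.
on_prem_sql = {
    "name": "OnPremSqlServerLinkedService",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": "<connection string without the password>",
            # The password is resolved from Key Vault at runtime.
            "password": key_vault_secret_reference(
                "MyKeyVaultLinkedService", "on-prem-sql-password"
            ),
        },
        "connectVia": {
            "referenceName": "MySelfHostedIR",
            "type": "IntegrationRuntimeReference",
        },
    },
}
print(json.dumps(on_prem_sql, indent=2))
```

The point of this design is that the definition contains no secret material, so it can be kept in source control.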
Encryption in Transit
All data transfers use a secure channel, HTTPS and TLS over TCP, to prevent man-in-the-middle attacks during communication with Azure services.
You can also use IPSec VPN or Azure ExpressRoute to further secure the communication channel between your on-premises network and Azure.
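To show what "HTTPS and TLS" means from a client's point of view, the following generic Python sketch builds a TLS context that verifies the server certificate and refuses anything older than TLS 1.2. It is not how the Integration Runtime is implemented internally, and the minimum version chosen here is an assumption for the example.

```python
import ssl


def make_strict_tls_context():
    """Create a client-side TLS context with certificate verification on."""
    # create_default_context() enables hostname checking and requires a
    # valid server certificate, which is what defeats man-in-the-middle attacks.
    context = ssl.create_default_context()
    # Assumption for this sketch: refuse protocol versions older than TLS 1.2.
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


context = make_strict_tls_context()
print(context.check_hostname, context.verify_mode, context.minimum_version)
```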
Virtual Network Service Endpoint
Using Virtual Network Service Endpoints to restrict Azure SQL Database access to only the specified Virtual Network (VNet) adds an extra layer of security. Service Endpoints enable private IP addresses in the VNet to reach the endpoint of an Azure service without needing a public IP address on the VNet.
Once you enable service endpoints in your VNet, you can add a VNet rule to secure the Azure service resources to your VNet. The rule provides improved security by fully removing public internet access to resources and allowing traffic only from your VNet.
For the Azure-SSIS IR to access the SQL Database, it needs to be joined to the same VNet and Subnet, as illustrated by the above diagram (scenario 4). In this way, only this Subnet can access the SQL Database.
With that in place, the next step is to turn off “Allow Azure Services to Access Server”. Both the IR and the Azure SQL DB now operate within the context of a VNet and can communicate using private IP addresses, which is more secure.
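The VNet rule described above can be sketched as a resource definition that ties the SQL server to one specific Subnet. The snippet below only assembles the Subnet's resource ID and the rule body. It is a fragment, not a complete deployment template, and the subscription, resource group, VNet, Subnet and rule names are placeholders.

```python
import json


def subnet_resource_id(subscription_id, resource_group, vnet_name, subnet_name):
    """Build the Azure resource ID of a Subnet."""
    return (
        f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/virtualNetworks/{vnet_name}"
        f"/subnets/{subnet_name}"
    )


def sql_vnet_rule(rule_name, subnet_id):
    """Build the body of a VNet rule for an Azure SQL logical server (sketch)."""
    return {
        "type": "Microsoft.Sql/servers/virtualNetworkRules",
        "name": rule_name,
        # Only traffic from this Subnet is allowed through the rule.
        "properties": {"virtualNetworkSubnetId": subnet_id},
    }


# Hypothetical identifiers, for illustration only.
subnet_id = subnet_resource_id("<subscription-id>", "my-rg", "my-vnet", "ssis-subnet")
rule = sql_vnet_rule("allow-ssis-subnet", subnet_id)
print(json.dumps(rule, indent=2))
```

The Subnet named in the rule would be the same one the Azure-SSIS IR is joined to.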
In this blog post we’ve looked at the 3 types of Integration Runtime and examined how they can be made secure. Thank you for your attention.