Often, when building data analytics solutions, we are asked to integrate with third-party tools that are not part of the Microsoft ecosystem. In this 3 part blog series, I’ll demonstrate how to send data to Salesforce and NetSuite using Azure Data Factory (ADF).

Salesforce is a Customer Relationship Management (CRM) solution that can integrate different departments, like marketing, sales or service, in a single shared view. Salesforce Marketing Cloud (SFMC) is a digital marketing platform that marketers can use to, for example, manage customer journey, data analytics and content creation and management.

Scenario

  • A SFMC administrator created a set of custom data extensions (objects equivalent to tables in SQL). You have to build a process in ADF that will select data from Azure SQL DB and ingest it to SFMC.

Considerations & Limitations

  • The copy activity in ADF contains a considerable list of connectors to Microsoft and third-party tools (list available here), however, at the time of writing this article, it is only possible to copy data from SFMC to any supported sink data store. The connector doesn’t support retrieving custom objects or custom data extensions (further details here). The alternative is to use the SFMC REST API calls
  • The body of the REST API is limited to 1MB, therefore, the data has to be split in batches. The number of rows in each batch depends on the number of columns of the object in Azure SQL DB and the length of the values
  • The APIs do not return which values were successfully posted and which ones failed. It is only possible to obtain the status of the job. Eg. Succeeded or Failed

Solution

The proposed pattern uses 3 ADF pipelines:

  • Marketing Cloud Parent
  • Marketing Cloud Group
  • Marketing Cloud Child

Marketing Cloud Parent

*Lookup Group Number Procedures – Obtains the list of stored procedures that calculates how many groups each table should have

*ForEach Group Number Procedure – Contains a Stored Procedure activity that executes the objects obtained in the previous step. The data is stored in GroupNumber tables. (Eg, OrderGroupNumber, CustomerGroupNumber)

*Lookup Salesforce Procedures – Obtains a list with two types of stored procedures. One receives the group number as a parameter, split the data into small chunks and converts it to JSON format (SPA) and the other updates the records that have been consumed (SPB)

*ForEach Salesforce Procedure – Contains an Execute Pipeline activity that invokes the Marketing Cloud Group pipeline. The batch count can be changed to increase/decrease the number of concurrent executions

Marketing Cloud Group

*Lookup Group Number – Obtains the number of groups stored in the GroupNumber table

*ForEach Group Number – Contains an Execute Pipeline activity that invokes the Marketing Cloud Child pipeline. The batch count can be changed to increase/decrease the number of concurrent executions

Marketing Cloud Child

*Obtain Access Token To use the API functionality the source application integration must obtain an access token from the SFMC authentication API. This access token is then used in an authentication header in subsequent calls. Find here the Access Token Request details

*Lookup Salesforce ProcedureExecutes the stored procedures of type SPA

*LOAD data async Because of the data volumes, calls should use the asynchronous API to upload data to SFMC. The data is stored in data tables called data extensions. A general description of the asynchronous APIs for loading data into data extensions is available here.

*Until Request Status Complete –  Because we are running an asynchronous process, we need a mechanism to check when the job completes before moving to the next activity. To do so, we have an Until activity with two activities. The execution proceeds once the result of the Get Job Status web activity is Complete

**Wait 5s – Wait 5 seconds until it proceeds to the next activity

**Get Job Status – Returns the status of an asynchronous job using the request Id from the call in the LOAD data async web activity. Find the API details here

*Check hasErrors – Uses the same API as above but this time checks the property hasErrors

*If hasErrorsThis activity has two outcomes. If hasErrors is true, checks the results of the job and records the errors. If hasErrors is false, executes a Stored Procedure activity that marks the records as consumed (SPB)

**GET Request ResultsReturns the results of an asynchronous job using the request Id from the call in the LOAD data async web activity. Find the API details here

**Filter non-errors – Only selects the records where the status is Error

**Insert Errors – Inserts the errors in an error table in Azure SQL DB