Ingesting data from the Adobe Analytics API

I have been ingesting data on a schedule from the Adobe Analytics API and wanted to share my experience – maybe I’ll save you some of the headaches I had.

It can be a time-consuming process. This blog covers what you need to know to avoid the main pitfalls.

This two-part blog series is geared towards technical specialists (data architects, data engineers and similar roles) who need to ingest data from the Adobe Cloud via the Adobe APIs. By the end, it should be clear what you need to do to ingest data, and you should feel comfortable diving into the job.

Part one details the requirements you need to action to ingest data, whereas part two will detail the specific technical implementation of the concepts.

Prerequisites and necessary components

This section lists the prerequisites and explains what needs to be done and by whom. This guide assumes you have chosen to authenticate using an OAuth token generated by passing a JSON Web Token to the API:

  • Adobe Administrator needs to give you developer access.
  • Adobe Administrator needs to give you access over the report suite / objects you will be querying.
  • You need to create an API client through the Adobe Developer Console.
  • You need to get the following from the API client; I recommend storing them in a credential store of some sort:
    1. Client Id
    2. Client Secret
    3. Technical Account Id
    4. Organisation Id
    5. A private/public key pair (generate one, or bring your own and import it into the service).

Specific documentation for steps 1-3 can be found on the Adobe website – https://developer.adobe.com/analytics-apis/docs/2.0/guides/#permissions.

Detailed documentation for the JWT generation can be found here – https://developer.adobe.com/developer-console/docs/guides/authentication/JWT/.

Other authentication options, which are not covered in this blog, are described here – https://developer.adobe.com/developer-console/docs/guides/authentication/.

  • You need some sort of compute that can run the libraries used to encode the JSON Web Token (JWT). A list of libraries and programming languages supported can be found here – https://developer.adobe.com/developer-console/docs/guides/authentication/JWT/#using-jwt-libraries-and-creation-tools.
  • You need some sort of compute (can be the same one as above) that can call the Adobe API and submit the JWT to get back an OAuth token used for authentication.
  • You need some sort of compute (can be the same one as above) that can call the Adobe API, authenticate with the OAuth token and store the resultant output somewhere. Sidenote – I recommend storing the raw output of your queries in the RAW area of your data lake and only cleaning or otherwise transforming it in subsequent layers. That way you always have an accurate representation of the data as it was in the source at a particular point in time, and you can evolve your queries or transformation logic safely because you can always recover or reprocess from the RAW data.
  • You need some sort of storage where you will store the output of your queries to the Adobe API.
  • Not mandatory but will save you a lot of time – ask the Adobe power user / administrator / person from the business asking you for the integration to:
    1. create a project with sample reports with the data points you need
    2. give you access to that project.

You can use these to get the report suite ids and metric/dimension names needed later in the process. Your counterparts can provide text comments in that project that you can use for guidance.
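
As a rough illustration of the sidenote about keeping untouched output in the RAW area, here is a minimal Python sketch. The folder layout, the function names and the example rsid are my own assumptions rather than anything Adobe prescribes; the only point is that the response text is written exactly as received, under a path that records when it was fetched.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def raw_output_path(root: Path, rsid: str, query_name: str, fetched_at: datetime) -> Path:
    """Build <root>/adobe_analytics/<rsid>/<query>/<yyyy>/<mm>/<dd>/<timestamp>.json.

    fetched_at is expected to be in UTC (hypothetical layout, adjust to your lake).
    """
    return (
        root / "adobe_analytics" / rsid / query_name
        / fetched_at.strftime("%Y") / fetched_at.strftime("%m") / fetched_at.strftime("%d")
        / (fetched_at.strftime("%Y%m%dT%H%M%SZ") + ".json")
    )


def store_raw(root: Path, rsid: str, query_name: str, response_text: str,
              fetched_at: Optional[datetime] = None) -> Path:
    """Write the API response unchanged; cleaning belongs to later layers."""
    fetched_at = fetched_at or datetime.now(timezone.utc)
    path = raw_output_path(root, rsid, query_name, fetched_at)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(response_text, encoding="utf-8")
    return path
```

Because every run lands in its own timestamped file, re-running a query never overwrites an earlier snapshot, which is what makes reprocessing from RAW possible.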

Exploring the Adobe Analytics API and preparing API calls

This section explains how to explore the Adobe API to find the system names used in queries and to see what is available in it. For this, I used a desktop API platform that let me query the API and store queries along with authorisation information, headers, and other necessary configuration. This way I could quickly explore the API before committing anything to code.

  • Exchange the JWT for an OAuth token at https://ims-na1.adobelogin.com/ims/exchange/jwt – this is a POST request. You need the client id, client secret and encoded JWT in the header section to do this. You pass them as headers with the names client_id, client_secret and jwt_token.

You will get back an OAuth token with its validity in Unix epoch time.

You can either encode your JWT using code or, if you do not want to commit to running any compute for this yet, use the web JWT generator inside the Adobe platform: you pass the private key from your key pair and it returns an encoded JWT (it automatically picks up the rest of the information from your developer account). Another option is a web tool such as https://jwt.io/. Once again, I advise you to read this piece of documentation – https://developer.adobe.com/developer-console/docs/guides/authentication/JWT/.

In any case, to encode the JWT you will need the following information:

  1. Expiry time in Unix epoch time – generate this programmatically in your final solution
  2. Organisation Id
  3. Technical Account Id
  4. Client Id
  5. Scope – essentially which API you will be querying. Pick from a list at https://developer.adobe.com/developer-console/docs/guides/authentication/JWT/Scopes/
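
To make the list above concrete, here is a minimal sketch of how those five values could be assembled into JWT claims in Python. The claim names (exp, iss, sub, aud and a metascope URL set to true) reflect my reading of Adobe's JWT guide, and the function name, the default lifetime and the example metascope are illustrative assumptions, so check them against the documentation linked above before relying on them.

```python
import time

# IMS host taken from the token exchange URL used in this blog.
IMS_HOST = "https://ims-na1.adobelogin.com"


def build_jwt_claims(org_id, technical_account_id, client_id, metascope,
                     lifetime_seconds=600, now=None):
    """Assemble the claims to be signed; lifetime_seconds is an arbitrary example value."""
    issued = int(time.time()) if now is None else int(now)
    return {
        "exp": issued + lifetime_seconds,      # expiry in Unix epoch seconds
        "iss": org_id,                         # Organisation Id
        "sub": technical_account_id,           # Technical Account Id
        "aud": f"{IMS_HOST}/c/{client_id}",    # audience built from the Client Id
        f"{IMS_HOST}/s/{metascope}": True,     # scope, e.g. "ent_analytics_bulk_ingest_sdk"
    }
```

The claims still have to be signed with your private key using RS256. With the PyJWT library that is typically a call along the lines of `jwt.encode(claims, private_key, algorithm="RS256")`; I left it out of the sketch so the block runs with the standard library only.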

  • Obtain your company id at https://analytics.adobe.io/discovery/me – this is a GET request. You pass the organisation id as x-proxy-global-company-id and the client id as x-api-key. Authorization is Bearer, using the OAuth token received in the previous step.

You will get back a collection of companies that you can access with your account. What you need from that output is the globalCompanyId of the company whose data you are querying.

The globalCompanyId is not the same as your organisation id, so be careful not to confuse the two. All API calls below require the globalCompanyId as part of the endpoint you are querying and the organisation id in a header called x-proxy-global-company-id. Intuitive.
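
Since every call from here on uses the same three headers and the same URL pattern, a small helper keeps the two ids apart. This is a minimal sketch with names of my own choosing; the value for x-proxy-global-company-id is passed in as a plain parameter, following the description above, so it is easy to change if your setup expects something different.

```python
from urllib.parse import urlencode

API_ROOT = "https://analytics.adobe.io"


def build_headers(access_token, client_id, proxy_company_id):
    """Headers shared by the calls in this blog (proxy_company_id as described above)."""
    return {
        "Authorization": f"Bearer {access_token}",
        "x-api-key": client_id,
        "x-proxy-global-company-id": proxy_company_id,
        "Accept": "application/json",
    }


def api_url(global_company_id, path, **params):
    """Build https://analytics.adobe.io/api/{globalCompanyId}/{path}, plus optional query string."""
    url = f"{API_ROOT}/api/{global_company_id}/{path.strip('/')}"
    return f"{url}?{urlencode(params)}" if params else url
```

For example, `api_url("exampleco", "metrics", rsid="examplersid", locale="en_US")` builds the metrics URL used further down, where "exampleco" and "examplersid" are placeholders.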

  • Explore report suites at https://analytics.adobe.io/api/{globalCompanyId}/collections/suites – substitute {globalCompanyId} with the value obtained in the previous step. You pass the organisation id as x-proxy-global-company-id and the client id as x-api-key. Authorization is Bearer, using the OAuth token received in the first step. This is a GET request.

This returns the collection of report suites you can access, with their system names in the rsid field and their display names in the name field. Record the rsid of each suite you are interested in.

You can find out which report suites you need either by asking your Adobe counterpart (more likely to be the person from the business / Adobe data analyst than an Adobe administrator), or by looking at the sample reports if you have them – the report suite name is in the top right corner of each report in the project. Translate the report suite names into rsids by looking at the API call output.
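
Translating display names into rsids is easy to script once you have the output saved. A minimal sketch follows; it assumes the suites are listed under a top-level "content" key in the response, which is how I remember the output looking, so adjust the key if yours differs. The function name and the sample values are made up.

```python
def rsids_by_name(suites_response):
    """Map each report suite display name to its rsid.

    Assumes the suites sit in a list under the "content" key (verify against your output).
    """
    return {suite["name"]: suite["rsid"] for suite in suites_response.get("content", [])}


# Hypothetical output, trimmed to the two fields mentioned above.
sample = {"content": [{"rsid": "examplersid", "name": "Example Web Shop"}]}
lookup = rsids_by_name(sample)
```

With the sample above, `lookup["Example Web Shop"]` gives back "examplersid".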

  • Explore metrics at https://analytics.adobe.io/api/{globalCompanyId}/metrics?rsid={rsid}&locale=en_US – substitute {globalCompanyId} with the value obtained earlier and {rsid} with the value obtained in the previous step. You can change the locale to one more appropriate for your use case. You will need one call per rsid if you have multiple rsids. You pass the organisation id as x-proxy-global-company-id and the client id as x-api-key. Authorization is Bearer, using the OAuth token received in the first step. This is a GET request.

This returns the collection of metrics in that report suite that you can access. You can locate the ones you need by looking at the title – the display name of the metric as seen in the reports in the Adobe Workspace project, or as otherwise elicited from your counterparts. What you will need for the final API call is the id of each metric.
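
Looking up ids by title is another step worth scripting. The sketch below assumes the endpoint output is a JSON list of objects that each carry a title and an id, as described above; the function name and the sample entries are illustrative. The same function works unchanged for the dimensions output.

```python
def find_ids_by_title(items, wanted_titles):
    """Return {title: id} for the wanted display names, matching case-insensitively."""
    wanted = {title.casefold() for title in wanted_titles}
    return {
        item["title"]: item["id"]
        for item in items
        if item.get("title", "").casefold() in wanted
    }


# Hypothetical slice of a metrics response.
metrics = [
    {"id": "metrics/pageviews", "title": "Page Views"},
    {"id": "metrics/visits", "title": "Visits"},
]
found = find_ids_by_title(metrics, ["page views"])
```

Any title you asked for that is missing from the result is worth raising with your Adobe counterpart, since it may be a calculated metric or live in a different report suite.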

What I personally did was call this once for every rsid I needed and store the output in text files on my local machine. Then I could explore those at will without calling the API all the time.

  • Explore dimensions at https://analytics.adobe.io/api/{globalCompanyId}/dimensions?rsid={rsid}&locale=en_US – substitute {globalCompanyId} with the value obtained earlier and {rsid} with the rsid recorded earlier. You can change the locale to one more appropriate for your use case. You will need one call per rsid if you have multiple rsids. You pass the organisation id as x-proxy-global-company-id and the client id as x-api-key. Authorization is Bearer, using the OAuth token received in the first step. This is a GET request.

This returns the collection of dimensions in that report suite that you can access. You can locate the ones you need by looking at the title – the display name of the dimension as seen in the reports in the Adobe Workspace project, or as otherwise elicited from your counterparts. What you will need for the final API call is the id of each dimension.

What I personally did was call this once for every rsid I needed and store the output in text files on my local machine. Then I could explore those at will without calling the API all the time.
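
The "call once, then explore the saved files" habit can be sketched as a small cache. This is only one way to do it: the file naming, the function name and the fetch callback (which stands in for whatever code actually calls the metrics or dimensions endpoint) are all assumptions of mine.

```python
import json
from pathlib import Path


def load_or_fetch(cache_dir, kind, rsid, fetch):
    """Return saved output for (kind, rsid) if present, otherwise call fetch and save it.

    kind is "metrics" or "dimensions"; fetch(kind, rsid) is a caller-supplied function
    that performs the real API call and returns the parsed JSON.
    """
    path = Path(cache_dir) / f"{kind}_{rsid}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch(kind, rsid)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return data
```

Deleting a file from the cache folder is enough to force a fresh call for that report suite.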

  • Generate a report request body in JSON – you need the rsid(s) and the metric and dimension ids for this step. The request body is a JSON document with the structure outlined in the sample below:

{
  "rsid": "{rsid}",
  "globalFilters": [
    {
      "type": "dateRange",
      "dateRange": "2021-12-31T00:00:00.000/2022-01-01T23:59:59.999"
    }
  ],
  "metricContainer": {
    "metrics": [
      {
        "columnId": "0",
        "id": "metrics/pageviews",
        "filters": [
          "0"
        ]
      }
    ],
    "metricFilters": [
      {
        "id": "0",
        "type": "dateRange",
        "dateRange": "2021-12-31T00:00:00.000/2022-01-01T23:59:59.999"
      }
    ]
  },
  "dimension": "variables/daterangeday",
  "settings": {
    "dimensionSort": "asc"
  }
}

You can submit multiple metrics in a query by adding them to the metrics collection. I would advise you to parameterise the dateRange filters, generate them programmatically, and use an incremental load pattern for your data ingestion.
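
Here is a minimal sketch of what parameterising the request body could look like in Python. It mirrors the structure of the sample body; the function names are my own, and the daily window logic is just one simple incremental pattern (load every complete day since the last one you loaded), not the only option.

```python
from datetime import date, datetime, time, timedelta


def format_date_range(start, end):
    """Format an inclusive range of days in the start/end style used in the sample body."""
    first = datetime.combine(start, time.min)
    last = datetime.combine(end, time(23, 59, 59, 999000))
    return f"{first.isoformat(timespec='milliseconds')}/{last.isoformat(timespec='milliseconds')}"


def build_report_body(rsid, metric_ids, dimension_id, start, end):
    """Build a request body shaped like the sample, with one column per metric."""
    window = format_date_range(start, end)
    return {
        "rsid": rsid,
        "globalFilters": [{"type": "dateRange", "dateRange": window}],
        "metricContainer": {
            "metrics": [
                {"columnId": str(index), "id": metric_id, "filters": ["0"]}
                for index, metric_id in enumerate(metric_ids)
            ],
            "metricFilters": [{"id": "0", "type": "dateRange", "dateRange": window}],
        },
        "dimension": dimension_id,
        "settings": {"dimensionSort": "asc"},
    }


def daily_windows(last_loaded, today):
    """Yield (start, end) for each full day after last_loaded, up to and including yesterday."""
    day = last_loaded + timedelta(days=1)
    while day < today:
        yield day, day
        day += timedelta(days=1)
```

In a scheduled job you would persist the last loaded day somewhere, loop over `daily_windows(last_loaded, date.today())`, and build one body per window with `json.dumps(build_report_body(...))` as the payload.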

  • At last, you are ready to query https://analytics.adobe.io/api/{globalCompanyId}/reports/ – substitute {globalCompanyId} with the value obtained earlier. This is a POST request: submit the JSON from the previous step as the request body. You pass the organisation id as x-proxy-global-company-id and the client id as x-api-key. Authorization is Bearer, using the OAuth token received in the first step.

It will return something similar to:

{
  "totalPages": 1,
  "firstPage": true,
  "lastPage": true,
  "numberOfElements": 2,
  "number": 0,
  "totalElements": 2,
  "columns": {
    "dimension": {
      "id": "variables/daterangeday",
      "type": "time"
    },
    "columnIds": [
      "0"
    ]
  },
  "rows": [
    {
      "itemId": "1111111",
      "value": "Dec 31, 2021",
      "data": [
        420.0
      ]
    },
    {
      "itemId": "2222222",
      "value": "Jan 1, 2022",
      "data": [
        9001.0
      ]
    }
  ],
  "summaryData": {
    "filteredTotals": [
      69.0
    ],
    "totals": [
      1337.0
    ]
  }
}

You can submit as many of these requests as it takes to get all the data you need.

You can find detailed documentation on all of the above queries, including a Swagger UI at the following Adobe links:

https://adobedocs.github.io/analytics-2.0-apis/#/ – Swagger.

https://github.com/AdobeDocs/analytics-2.0-apis/blob/master/reporting-guide.md – Reporting Guide (that git repo has other useful documentation; I advise you to take a look at anything of interest).

https://developer.adobe.com/analytics-apis/docs/2.0/guides/endpoints/ – short guides on how to use each endpoint.

  • Store the output of your query in your storage of choice – entirely up to you.
  • Transform the RAW output of your query into something meaningful and easily usable by your analyst teams.
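
As a first idea of what that transformation could look like, here is a minimal sketch that flattens a report response of the shape shown earlier into one record per row. The function name and the output field names are my own choices; column_names is a mapping you supply from each columnId in your request to the metric id it stands for.

```python
def flatten_report(report, column_names):
    """Turn the rows of a report response into flat dictionaries, one per dimension item."""
    dimension_id = report["columns"]["dimension"]["id"]
    column_ids = report["columns"]["columnIds"]
    records = []
    for row in report.get("rows", []):
        record = {"dimension": dimension_id, "item_id": row["itemId"], "value": row["value"]}
        # Each entry in "data" lines up with the columnId at the same position.
        for column_id, number in zip(column_ids, row["data"]):
            record[column_names[column_id]] = number
        records.append(record)
    return records


# The sample response from earlier, trimmed to the fields the function reads.
sample_report = {
    "columns": {"dimension": {"id": "variables/daterangeday", "type": "time"}, "columnIds": ["0"]},
    "rows": [
        {"itemId": "1111111", "value": "Dec 31, 2021", "data": [420.0]},
        {"itemId": "2222222", "value": "Jan 1, 2022", "data": [9001.0]},
    ],
}
records = flatten_report(sample_report, {"0": "metrics/pageviews"})
```

Records in this shape load straight into a table or data frame, while the RAW files stay untouched for reprocessing.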