Azure Data Factory - Suggested naming conventions and best practices.

Years ago I stumbled on a great blog about SSIS best practices and naming conventions. In the years to come I could refer back to this at the start of every SSIS project. Written by Jamie Thomson, this has become the standard, and although there were variants, Jamie’s still remains very popular (Jamie Thompson, Link).

I want to propose a list of best practices and naming conventions for the various elements of ADF based around my experience with ADF. This list may change over time as I discover what works and what does not. At present the standard names generated by ADF are generic and not very helpful (LinkedServiceAzureSQLDataWareouse1 etc). In light of this I propose the following best practices and standards for naming ADF elements.

Best practices

Use a standard naming convention. For a large ADF project you can end up with 300-400 json objects. It is tricky to navigate that many objects and deploying from visual studio makes this even harder. Having a standard helps to quickly locate which json object needs to be changed.
When you have a solution with a lot of JSON objects (a few hundred or more) publishing from visual studio becomes an issue. A bottleneck begins to appear, which is brought on by each element needing to validate against it peers. When you have hundreds of elements deploying new/updated pipelines adds a significant overhead. In my recent example I had 222 JSON objects and 74 pipelines, when initially deploying, each pipeline (after solution validation) took ~10 seconds each. Once published and updated this increased to ~40 seconds per pipeline.

74×10 is approximately 12:20 minutes (13 minutes in total)

74×40 is approximately 49:20 minutes (gave up after 40 minutes)

This can be resolved by adding a step in to your automation which removes your ADF instance before publishing a large amount of changes (see point 3 – Automate with Po$h)

Don’t neglect PowerShell. PowerShell is a great way to automate the development of json objects – especially when you need to create a few hundred! I have a blog coming in the next few weeks detailing how to do this.

* More best practices will be added.

Naming conventions

There are a few standard naming conventions which apply to all elements in Azure Data factory.

Object names must start with a letter or a number, and can contain only letters, numbers, and the dash (-) character.
Maximum number of characters in a table name: 260.
Object names must start with a letter number, or an underscore (_).
Following characters are not allowed: “.”, “+”, “?”, “/”, “<”, ”>”,”*”,”%”,”&”,”:”,””

(Microsoft, 2017 [https://docs.microsoft.com/en-us/azure/data-factory/data-factory-naming-rules])

Linked services & Datasets.

A linked service connects data from a source to a destination (sink), it stands to reason that there would therefore be the same for a data set. Rather than having two separate lists, you will see the below table has a column for Linked services and datasets.

Type	Linked Service	Name	Linked Service	Dataset	Full
Azure	Azure Blob storage	ABLB_	LS_ABLB_	DS_ABLB_	LS_ABLB_Example
	Azure Data Lake Store	ADLS_	LS_ADLS_	DS_ADLS_	LS_ADLS_Example
	Azure SQL Database	ASQL_	LS_ASQL_	DS_ASQL_	LS_ASQL_Example
	Azure SQL Data Warehouse	ASDW_	LS_ASDW_	DS_ASDW_	LS_ASDW_Example
	Azure Table storage	ATBL_	LS_ATBL_	DS_ATBL_	LS_ATBL_Example
	Azure DocumentDB	ADOC_	LS_ADOC_	DS_ADOC_	LS_ADOC_Example
	Azure Search Index	ASER_	LS_ASER_	DS_ASER_	LS_ASER_Example
Databases	SQL Server*	MSQL_	LS_SQL_	DS_SQL_	LS_SQL_Example
	Oracle*	ORAC_	LS_ORAC_	DS_ORAC_	LS_ORAC_Example
	MySQL*	MYSQ_	LS_MYSQ_	DS_MYSQ_	LS_MYSQ_Example
	DB2*	DB2_	LS_DB2_	DS_DB2_	LS_DB2_Example
	Teradata*	TDAT_	LS_TDAT_	DS_TDAT_	LS_TDAT_Example
	PostgreSQL*	POST_	LS_POST_	DS_POST_	LS_POST_Example
	Sybase*	SYBA_	LS_SYBA_	DS_SYBA_	LS_SYBA_Example
	Cassandra*	CASS_	LS_CASS_	DS_CASS_	LS_CASS_Example
	MongoDB*	MONG_	LS_MONG_	DS_MONG_	LS_MONG_Example
	Amazon Redshift	ARED_	LS_ARED_	DS_ARED_	LS_ARED_Example
File	File System*	FILE_	LS_FILE_	DS_FILE_	LS_FILE_Example
	HDFS*	HDFS_	LS_HDFS_	DS_HDFS_	LS_HDFS_Example
	Amazon S3	AMS3_	LS_AMS3_	DS_AMS3_	LS_AMS3_Example
	FTP	FTP_	LS_FTP_	DS_FTP_	LS_FTP_Example
Others	Salesforce	SAFC_	LS_SAFC_	DS_SAFC_	LS_SAFC_Example
	Generic ODBC*	ODBC_	LS_ODBC_	DS_ODBC_	LS_ODBC_Example
	Generic OData	ODAT_	LS_ODAT_	DS_ODAT_	LS_ODAT_Example
	Web Table (table from HTML)	WEBT_	LS_WEBT_	DS_WEBT_	LS_WEBT_Example
	GE Historian*	GEHI_	LS_GEHI_	DS_GEHI_	LS_GEHI_Example

Pipelines

Pipelines are slightly different in that there are two different types. The copy data activity and the data transformation activity.

Type	Name	Action	Example
Data movement Activity	PL_DATA_	NA	PL_DATA_DS_SQL_Person_To_DS_ABLB_Person
Data transformation pipeline	PL_TRAN_	SPRC – Stored Procedure	PL_TRAN_SPRC_CleanDimAccount
	PL_TRAN_	DNET – Script	PL_TRAN_DNET_AggregateSales
	PL_TRAN_	ADLK – Azure Data Lake	PL_TRAN_ADLK_AggregateSales
	PL_TRAN_	HIVE – Hive	PL_TRAN_HIVE_AggregateSales
	PL_TRAN_	PIG – Pig	PL_TRAN_PIG_AggregateSales
	PL_TRAN_	MAPR – MapReduce	PL_TRAN_MAPR_AggregateSales
	PL_TRAN_	HADP – Hadoop Stream	PL_TRAN_HADP_StreamData
	PL_TRAN_	AML – Azure Machine Learning	PL_TRAN_AML_CalculateMonthlyChurn