Years ago I stumbled on a great blog about SSIS best practices and naming conventions. In the years to come I could refer back to this at the start of every SSIS project. Written by Jamie Thomson, this has become the standard, and although there were variants, Jamie’s still remains very popular (Jamie Thompson, Link).
I want to propose a list of best practices and naming conventions for the various elements of ADF based around my experience with ADF. This list may change over time as I discover what works and what does not. At present the standard names generated by ADF are generic and not very helpful (LinkedServiceAzureSQLDataWareouse1 etc). In light of this I propose the following best practices and standards for naming ADF elements.
Best practices
- Use a standard naming convention. For a large ADF project you can end up with 300-400 json objects. It is tricky to navigate that many objects and deploying from visual studio makes this even harder. Having a standard helps to quickly locate which json object needs to be changed.
- When you have a solution with a lot of JSON objects (a few hundred or more) publishing from visual studio becomes an issue. A bottleneck begins to appear, which is brought on by each element needing to validate against it peers. When you have hundreds of elements deploying new/updated pipelines adds a significant overhead. In my recent example I had 222 JSON objects and 74 pipelines, when initially deploying, each pipeline (after solution validation) took ~10 seconds each. Once published and updated this increased to ~40 seconds per pipeline.
74×10 is approximately 12:20 minutes (13 minutes in total)
74×40 is approximately 49:20 minutes (gave up after 40 minutes)
This can be resolved by adding a step in to your automation which removes your ADF instance before publishing a large amount of changes (see point 3 – Automate with Po$h)
- Don’t neglect PowerShell. PowerShell is a great way to automate the development of json objects – especially when you need to create a few hundred! I have a blog coming in the next few weeks detailing how to do this.
* More best practices will be added.
Naming conventions
There are a few standard naming conventions which apply to all elements in Azure Data factory.
- Object names must start with a letter or a number, and can contain only letters, numbers, and the dash (-) character.
- Maximum number of characters in a table name: 260.
- Object names must start with a letter number, or an underscore (_).
- Following characters are not allowed: “.”, “+”, “?”, “/”, “<”, ”>”,”*”,”%”,”&”,”:”,””
(Microsoft, 2017 [https://docs.microsoft.com/en-us/azure/data-factory/data-factory-naming-rules])
Linked services & Datasets.
A linked service connects data from a source to a destination (sink), it stands to reason that there would therefore be the same for a data set. Rather than having two separate lists, you will see the below table has a column for Linked services and datasets.
Type |
Linked Service |
Name |
Linked Service |
Dataset |
Full |
Azure |
ABLB_ |
LS_ABLB_ |
DS_ABLB_ |
LS_ABLB_Example |
|
|
ADLS_ |
LS_ADLS_ |
DS_ADLS_ |
LS_ADLS_Example |
|
|
ASQL_ |
LS_ASQL_ |
DS_ASQL_ |
LS_ASQL_Example |
|
|
ASDW_ |
LS_ASDW_ |
DS_ASDW_ |
LS_ASDW_Example |
|
|
ATBL_ |
LS_ATBL_ |
DS_ATBL_ |
LS_ATBL_Example |
|
|
ADOC_ |
LS_ADOC_ |
DS_ADOC_ |
LS_ADOC_Example |
|
|
ASER_ |
LS_ASER_ |
DS_ASER_ |
LS_ASER_Example |
|
Databases |
MSQL_ |
LS_SQL_ |
DS_SQL_ |
LS_SQL_Example |
|
|
ORAC_ |
LS_ORAC_ |
DS_ORAC_ |
LS_ORAC_Example |
|
|
MYSQ_ |
LS_MYSQ_ |
DS_MYSQ_ |
LS_MYSQ_Example |
|
|
DB2_ |
LS_DB2_ |
DS_DB2_ |
LS_DB2_Example |
|
|
TDAT_ |
LS_TDAT_ |
DS_TDAT_ |
LS_TDAT_Example |
|
|
POST_ |
LS_POST_ |
DS_POST_ |
LS_POST_Example |
|
|
SYBA_ |
LS_SYBA_ |
DS_SYBA_ |
LS_SYBA_Example |
|
|
CASS_ |
LS_CASS_ |
DS_CASS_ |
LS_CASS_Example |
|
|
MONG_ |
LS_MONG_ |
DS_MONG_ |
LS_MONG_Example |
|
|
ARED_ |
LS_ARED_ |
DS_ARED_ |
LS_ARED_Example |
|
File |
FILE_ |
LS_FILE_ |
DS_FILE_ |
LS_FILE_Example |
|
|
HDFS_ |
LS_HDFS_ |
DS_HDFS_ |
LS_HDFS_Example |
|
|
AMS3_ |
LS_AMS3_ |
DS_AMS3_ |
LS_AMS3_Example |
|
|
FTP_ |
LS_FTP_ |
DS_FTP_ |
LS_FTP_Example |
|
Others |
SAFC_ |
LS_SAFC_ |
DS_SAFC_ |
LS_SAFC_Example |
|
|
ODBC_ |
LS_ODBC_ |
DS_ODBC_ |
LS_ODBC_Example |
|
|
ODAT_ |
LS_ODAT_ |
DS_ODAT_ |
LS_ODAT_Example |
|
|
WEBT_ |
LS_WEBT_ |
DS_WEBT_ |
LS_WEBT_Example |
|
|
GEHI_ |
LS_GEHI_ |
DS_GEHI_ |
LS_GEHI_Example |
Pipelines
Pipelines are slightly different in that there are two different types. The copy data activity and the data transformation activity.
Type |
Name |
Action |
Example |
Data movement Activity |
PL_DATA_ |
NA |
PL_DATA_DS_SQL_Person_To_DS_ABLB_Person |
Data transformation pipeline |
PL_TRAN_ |
SPRC – Stored Procedure |
PL_TRAN_SPRC_CleanDimAccount |
|
PL_TRAN_ |
DNET – Script |
PL_TRAN_DNET_AggregateSales |
|
PL_TRAN_ |
ADLK – Azure Data Lake |
PL_TRAN_ADLK_AggregateSales |
|
PL_TRAN_ |
HIVE – Hive |
PL_TRAN_HIVE_AggregateSales |
|
PL_TRAN_ |
PIG – Pig |
PL_TRAN_PIG_AggregateSales |
|
PL_TRAN_ |
MAPR – MapReduce |
PL_TRAN_MAPR_AggregateSales |
|
PL_TRAN_ |
HADP – Hadoop Stream |
PL_TRAN_HADP_StreamData |
|
PL_TRAN_ |
AML – Azure Machine Learning |
PL_TRAN_AML_CalculateMonthlyChurn |