On the 8th of July 2020, Microsoft announced a range of new features coming to its product suite. They form part of Microsoft’s 2020 release wave 2, one of the six-monthly updates through which Microsoft continually enhances its products.
This article summarises the key updates to the Export to Data Lake functionality.
What is Export to Data Lake?
The Export to Data Lake functionality for the Common Data Service is a great way to enable the continuous replication of entity data to Azure Data Lake Storage. This data can then be used for anything from running analytics and feeding Power BI reports to powering machine learning algorithms. With a few clicks, your Common Data Service data is kept in sync with a data lake in your Azure subscription, with no further action required.
With the 2020 release wave 2, Microsoft has added a range of new features that users have been requesting. The update clearly marks the coming of age of the Export to Data Lake functionality.
Point in time data
Keeping track of how the entities across the Common Data Service change over time is critical to understanding how your organisation evolves and adapts to a changing landscape. In the past, this required custom solutions to keep track of backups and to integrate each point-in-time snapshot with the data already in place. Release wave 2 will remove that need by introducing solutions to:
- Export time series data
- Support exporting audit data
These changes will allow data analysts and data scientists to capture the full history of an entity, enabling richer, more informative reports and AI scenarios that require a lifetime view of the data.
The Common Data Service also offers an auditing capability that records changes to entity and attribute data within an organisation over time, for later analysis and reporting. Support for exporting this entity audit data to Azure Data Lake Storage Gen2 will also be introduced.
Streamlined development experience
Release wave 2 will streamline the development experience for users of the Export to Data Lake functionality. The new features are not only aimed at making life easier for data scientists, data analysts and data engineers; they also bring a series of improvements that simplify working with the large datasets exported from the Common Data Service.
Integration with Azure Synapse
Azure Synapse is Microsoft’s limitless analytics service, designed to scale as far as your data demands. Release wave 2 will bring together Azure Synapse SQL and the Export to Data Lake functionality, making it easier to explore and query the data residing in your data lake and to serve your data warehouse needs.
Export to Parquet format
Until now, the only format supported by Export to Data Lake was CSV. Despite its convenience and ubiquity, CSV falls short as an efficient, high-performance storage format. Parquet, a columnar binary format that compresses well and lets queries read only the columns they need, is rapidly becoming one of the most popular formats for exchanging data, and with the 2020 release wave 2 it will become one of the formats available for Export to Data Lake.
Smart partitioning
When data is exported to Azure Data Lake, it is partitioned to make it more efficient to consume. However, the current partitioning strategy can produce files of very different sizes, which makes it harder to achieve homogeneous workloads when processing them. Release wave 2 introduces a smarter partitioning strategy that adds more granular partitions, such as partitioning by month, to improve the performance of backend applications.
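To illustrate the idea behind monthly partitioning, here is a minimal Python sketch that groups exported records into one partition per calendar month. The record shape, the `modifiedon` field and the `YYYY-MM` partition key are assumptions made for illustration only; they are not the actual schema or folder layout produced by Export to Data Lake.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical exported records: the "modifiedon" field name and the record
# shape are illustrative assumptions, not the real Export to Data Lake schema.
records = [
    {"id": 1, "modifiedon": "2020-06-28T09:15:00"},
    {"id": 2, "modifiedon": "2020-07-02T14:30:00"},
    {"id": 3, "modifiedon": "2020-07-19T08:05:00"},
]


def monthly_partition(timestamp: str) -> str:
    """Map an ISO 8601 timestamp to a hypothetical 'YYYY-MM' partition key."""
    return datetime.fromisoformat(timestamp).strftime("%Y-%m")


def partition_by_month(rows):
    """Group rows into one bucket per calendar month, keyed by 'YYYY-MM'."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[monthly_partition(row["modifiedon"])].append(row)
    return dict(partitions)


if __name__ == "__main__":
    # Print each monthly partition with the ids of the records it holds.
    for key, rows in sorted(partition_by_month(records).items()):
        print(key, [r["id"] for r in rows])
```

With month-sized partitions like these, a downstream job can read only the months it needs instead of scanning every record for an entity.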
Configurable snapshot intervals
Microsoft’s 2020 release wave 2 will also introduce configurable snapshot intervals. Businesses will no longer need to build their data pipelines around the current fixed one-hour interval; instead, the snapshot frequency can follow your organisation’s requirements, letting you decide how to trade off compute and storage costs against time granularity.
Summary
This article has outlined just a few of the many new features coming to the Export to Data Lake functionality. Other features include:
- Cross-tenant support
- A new dashboard view to monitor record counts and visual trends
- Support for entities with attachments
- Update to the latest version of the Common Data Model
For the complete list of updates, see the full release notes.
The range of new features coming to the Export to Data Lake functionality will make it easier to build valuable analytics on top of the data held in the Common Data Service, bringing many organisations much closer to realising the full potential of their data. If you would like to learn more about how Telefónica Tech can help you harness the power of your data, please get in touch today.