The European Microsoft Fabric Conference was held in Stockholm between the 24th and 27th of September and, for some, it was the first opportunity to hear about the product and interact directly with the Fabric product team. Telefónica Tech was a Silver sponsor, and it was a great opportunity to meet like-minded professionals, partners, and current and potential future customers. Throughout the four days, we spoke with many Fabric enthusiasts and heard directly from the senior product team what lies ahead. In this blog I’ll highlight my favourite announcements (and the odd disappointment); you can check out the full list here and here.
Service Principal Support for Fabric APIs
Up until now, the majority of the REST APIs only supported user authentication. This was a clear limitation: if we wanted to call the REST APIs from Azure DevOps, MFA would have to be disabled, which would go against the security principles of many organisations. From this month, service principal support has been extended to additional APIs. This is a substantial step towards an enterprise-grade platform, where repeatable processes can be automated and efficiency and accuracy are improved by removing manual effort and intervention.
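For illustration, here is a minimal sketch of calling a Fabric REST API as a service principal from Python, using the client credentials flow. The endpoint shown lists workspaces; the environment variable names are assumptions for the example:

```python
# Minimal sketch: authenticate as a service principal and call a Fabric REST API.
# Assumes the app registration has been granted access to Fabric, with the
# tenant/client IDs and secret supplied via environment variables.
import os
import requests
from azure.identity import ClientSecretCredential

credential = ClientSecretCredential(
    tenant_id=os.environ["AZURE_TENANT_ID"],
    client_id=os.environ["AZURE_CLIENT_ID"],
    client_secret=os.environ["AZURE_CLIENT_SECRET"],
)

# Acquire a token scoped to the Fabric REST API.
token = credential.get_token("https://api.fabric.microsoft.com/.default").token

# Example call: list the workspaces the service principal can access.
response = requests.get(
    "https://api.fabric.microsoft.com/v1/workspaces",
    headers={"Authorization": f"Bearer {token}"},
)
response.raise_for_status()
for workspace in response.json().get("value", []):
    print(workspace["id"], workspace.get("displayName"))
```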
Terraform Provider for Microsoft Fabric
Any platform aiming to be the destination for small and enterprise customers building a modern data platform should support Infrastructure as Code (IaC). The new Terraform provider gives customers the ability to automate, scale and simplify deployment and management processes. Up until now, the deployment of Fabric items was only possible via the REST APIs and required manual intervention or some creativity with Azure DevOps pipelines. With the Terraform provider, customers can start to move away from ClickOps, adopt industry standards and leverage Azure DevOps pipelines and Git integration for the deployment of Fabric artefacts.
New Design for Fabric Deployment Pipelines
Fabric Deployment Pipelines are Microsoft’s recommended mechanism for releasing artefacts to upper environments; however, in my opinion, they remain slightly disappointing. Once again, Microsoft applied UI changes to the pipelines but still did not address some of the core limitations, such as the inability to update pipeline parameters and connections, to easily select the Git branch from which the deployment should be triggered, or to add and remove deployment stages without having to create a new pipeline. One can only hope Microsoft comes up with a more capable approach, so that the release process becomes efficient, scalable and free of manual configuration, in parity with Azure DevOps Pipelines.
Azure Databricks Unity Catalog Mirroring
This feature created a lot of excitement in the community; many people have been asking how best to combine Azure Databricks with Microsoft Fabric. Previously, Microsoft recommended using OneLake as the storage layer for Databricks, but that had some limitations (e.g. OneLake couldn’t be used as metastore-level managed storage for Unity Catalog). To reduce data movement, Microsoft will now support the replication of Unity Catalog managed tables. Once that is set up, whenever a new table is created it automatically becomes available in the Lakehouse. This, however, still has some limitations (e.g. Azure Databricks workspaces behind a private endpoint are not supported, and neither are tables with row-level security, streaming tables or views). Before embarking on this journey, it is also important to understand that the security model applied to Unity Catalog is not replicated to OneLake; as a result, there will be a need to maintain two separate permission models.
OneLake Shortcuts to Iceberg
Following the announced partnership with Snowflake, Microsoft will soon allow Snowflake users to write Iceberg data to OneLake via Shortcuts. In the background, Fabric will virtualise the Iceberg tables as Delta Lake tables, allowing the Fabric engines to consume the data without movement or duplication. This is another great step towards letting Fabric consume data stored by other analytical engines at no additional cost, since no data is copied.
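As a rough sketch of how a shortcut can be created programmatically, the snippet below targets the Fabric “Create Shortcut” REST endpoint. The payload shape, IDs, paths and the ADLS Gen2 target are illustrative assumptions rather than a definitive recipe:

```python
# Hypothetical sketch: create a OneLake shortcut over externally stored data via
# the Fabric REST API. All IDs, paths and the connection below are placeholders.
import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"
workspace_id = "<workspace-guid>"
lakehouse_id = "<lakehouse-item-guid>"
token = "<bearer-token>"  # e.g. acquired as in the service principal example above

payload = {
    "path": "Tables",          # where the shortcut appears in the Lakehouse
    "name": "iceberg_orders",  # shortcut name (placeholder)
    "target": {
        "adlsGen2": {          # assumed target type for Iceberg data in ADLS Gen2
            "location": "https://<account>.dfs.core.windows.net",
            "subpath": "/<container>/<path-to-iceberg-table>",
            "connectionId": "<connection-guid>",
        }
    },
}

response = requests.post(
    f"{FABRIC_API}/workspaces/{workspace_id}/items/{lakehouse_id}/shortcuts",
    headers={"Authorization": f"Bearer {token}"},
    json=payload,
)
response.raise_for_status()
print(response.json())
```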
OneSecurity
One of the promises of Fabric OneSecurity is to unify security across all the different engines (Spark, Warehouse, KQL, Power BI), rather than having to manage an individual security model per analytical engine. This is a much-awaited feature; however, it proved more challenging than Microsoft originally expected and, as a result, the release date has been pushed towards the end of the year.
Eventstream’s Integration with Managed Private Endpoint
When Managed Private Endpoints are enabled in a workspace, a series of features are not supported (e.g. OneLake Shortcuts, Mirroring) and other limitations apply (e.g. only the Spark engine can be used to copy data to OneLake, either via Notebooks or Dataflows Gen2). It’s good to see that Microsoft has been listening to the feedback and is slowly removing the limitations that apply when this security feature is enabled. The newest addition is the ability to use Fabric Eventstreams to ingest real-time streaming data from Azure services, such as Event Hubs, that are protected by private endpoints.
Database Migration Experience
This is a great opportunity for customers using SQL Server or Synapse Dedicated SQL Pools who are looking to migrate to Fabric. According to Microsoft, this tool will let users seamlessly migrate both code and data to Fabric Data Warehouse without having to perform manual operations.
Nested Common Table Expression
Sometimes it’s the small things that make a difference. There are occasions where nested CTEs are the difference between writing 10 lines of code and 50 to achieve the same result. It might not be something you use very often, but you will welcome this addition when the time comes; a sketch of the pattern follows.
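As an illustration, this sketch runs a nested CTE (a CTE defined inside another CTE’s definition) against a Fabric Warehouse SQL endpoint via pyodbc. The connection string, table and column names are placeholders:

```python
# Illustrative sketch: a nested CTE executed against a Fabric Warehouse SQL
# endpoint. Connection details and object names below are placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<warehouse-sql-endpoint>;Database=<warehouse>;"
    "Authentication=ActiveDirectoryInteractive;"
)

# The inner CTE is declared inside the outer CTE's definition; previously this
# pattern had to be flattened into sequential CTEs or rewritten as subqueries.
query = """
WITH monthly_totals AS (
    WITH daily_totals AS (
        SELECT CAST(order_date AS date) AS order_day, SUM(amount) AS day_total
        FROM dbo.orders
        GROUP BY CAST(order_date AS date)
    )
    SELECT DATETRUNC(month, order_day) AS order_month, SUM(day_total) AS month_total
    FROM daily_totals
    GROUP BY DATETRUNC(month, order_day)
)
SELECT order_month, month_total
FROM monthly_totals
ORDER BY order_month;
"""

for row in conn.execute(query):
    print(row.order_month, row.month_total)
```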
Large Data Types in Fabric Warehouse
Although the data is stored in Delta format, string and binary fields were still limited to 8 KB per cell. From October, these limits will be removed, enabling the storage of large string and binary values in Fabric Data Warehouse: users will be able to declare columns with the VARCHAR(MAX) and VARBINARY(MAX) types to hold more than 8 KB of data. Although Microsoft has introduced several performance improvements, you should still follow best practice and optimise data types, keeping them as small as the largest value in the column allows.
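To make the change concrete, here is a small hypothetical sketch of a table declaration using the new large types (placeholder names; connection as in the nested CTE example):

```python
# Sketch: declaring large value types in a Fabric Warehouse table once the 8 KB
# per-cell limit is lifted. Connection string and object names are placeholders.
import pyodbc

conn = pyodbc.connect(
    "Driver={ODBC Driver 18 for SQL Server};"
    "Server=<warehouse-sql-endpoint>;Database=<warehouse>;"
    "Authentication=ActiveDirectoryInteractive;"
)

conn.execute("""
CREATE TABLE dbo.documents (
    document_id INT            NOT NULL,
    title       VARCHAR(200)   NOT NULL,   -- keep types tight where values are small
    body        VARCHAR(MAX)   NULL,       -- may now exceed the old 8 KB per-cell limit
    raw_payload VARBINARY(MAX) NULL
);
""")
conn.commit()
```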
High Concurrency mode for Notebooks in Pipelines
Another important feature added to Fabric. Prior to this, every Spark notebook triggered by a pipeline would create a new Livy session, causing delays since sessions were added to a queue and only a certain number could execute in parallel. This was particularly painful if managed private endpoints were enabled in the workspace, since the starter pools cannot be used and each session would take 3-5 minutes to start. With high concurrency, the same session can be reused across different notebooks (up to five notebooks can share a session), reducing waiting time and improving efficiency. Users can also direct specific notebooks to specific high-concurrency sessions by using session tags, as sketched below.
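In a pipeline, session sharing is configured by assigning the same session tag to each Notebook activity’s settings. Inside a notebook, a related way to fan out work within one shared Spark session is runMultiple; a sketch with placeholder notebook names, assuming a Fabric notebook where mssparkutils is available by default:

```python
# Sketch for a Fabric notebook, where mssparkutils is provided by the runtime.
# runMultiple executes the listed notebooks concurrently inside the current
# Spark session, avoiding a separate Livy session (and cold start) per notebook.
# Notebook names below are placeholders.
results = mssparkutils.notebook.runMultiple(
    ["ingest_customers", "ingest_orders", "ingest_products"]
)
print(results)
```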
Data Factory Data Pipeline Triggers
In this day and age, monolithic data platforms are no longer acceptable. Customers require agile platforms, where data is made available as soon as possible and dependency management is easily achievable. At this stage, data pipelines can be triggered in two ways: 1) on a schedule, or 2) via events produced by Azure Blob Storage or by workspace items in Fabric. Unfortunately, these are still fairly limited. When scheduling a pipeline, it is not possible to define parameter values; as a result, if you need to ingest data from multiple data sources, you may have to create multiple pipelines with hardcoded values rather than multiple triggers, as you would in Azure Data Factory. Building a workflow based on an event-driven process is either not possible or very limited. Microsoft did not provide any additional details in this area, but hopefully we will soon see this feature mature and support extended to other services, such as Azure Event Grid.
Native Execution Engine on Runtime 1.3
The native execution engine was completely rewritten in C++, operates in columnar mode and utilises vectorised processing. Now, with support for the latest GA runtime version, it will continue to provide significant performance improvements across data processing, ETL, data science and interactive queries. To leverage the new engine, you just need to activate it through the environment settings, or selectively for an individual notebook or job, as sketched below.
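For an individual notebook, the engine can be switched on at session start with a %%configure cell. The property name below follows the documented Spark setting, but treat it as an assumption to verify against your runtime; environment-level enablement sets the same property in the environment’s Spark settings:

```python
%%configure
{
    "conf": {
        "spark.native.enabled": "true"
    }
}
```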
Power BI Dark Mode
Dark mode is now available in Power BI Desktop. Many great announcements were made at Arun’s keynote, but I think this was the one that created the most excitement in the room.
Live edit of semantic models in Direct Lake mode with Power BI Desktop
Previously, users only had read-only access when connecting to semantic models in Direct Lake mode. Now it is possible to create new measures, add calculation groups and create relationships between tables, all in a single tool, without having to jump between interfaces to develop semantic models and create Power BI reports.
Version control for semantic models using Direct Lake
When working with semantic models in Direct Lake mode, all changes are permanent and automatically saved. Once this feature becomes available, multiple users will be able to collaborate more efficiently and transparently: developers will have access to the version history and the possibility to revert to previous versions, similar to what is currently possible with Word or Excel. Furthermore, this feature enhances accountability by providing a clear history of the changes each developer applied to the model.
Tags
Tags will allow product owners to categorise and organise the data and allow users to easily search, view and filter content by the applied tag across various experiences.
Integration with Microsoft Purview
Microsoft Purview is being deeply integrated with Fabric. Among others, two exciting announcements are worth calling out: 1) the ability to use Microsoft Purview Information Protection sensitivity labels to control access to Fabric items; 2) extended support for Microsoft Purview Data Loss Prevention to detect the upload of sensitive data and trigger policies, such as raising alerts to the security admin or showing a custom policy tip to data owners so they can take action.
Copilot
It wouldn’t be a Microsoft conference if Copilot was not in the mix. There have been a number of announcements around Copilot, from the release of a Copilot for Dataflows Gen2 and a new Copilot for the Data Warehouse, to improvements in the Power BI Copilot, which can now interact with visuals on the page and provide text-based answers and summaries across all pages of a report.
If you would like to learn more about Microsoft Fabric and how Telefónica Tech can help you, please get in touch.