Databricks Apps is an emerging feature within the platform (in public preview as of May 2025) that aims to provide a space for visualisations, self-service analytics and dashboards, created with popular frameworks such as Dash, Streamlit or Flask. With automatic deployment, built-in governance through Unity Catalog and easy access to existing data, Apps adds another great tool to the data analyst's stack.
Use case and planning
This blog post investigates whether an app can be used as a self-service data export tool. Can we create something useful for data analysts with limited SQL experience who need to export data out of Databricks into a common file format, such as CSV or JSON, for further processing?
The main points addressed in the experimental app developed here were:
- Functionality – the user should be able to select multiple tables from Unity Catalog and choose a suitable name and file format for the exported files
- User experience – the app should have a clean, easy-to-navigate interface, as well as feedback indicators that help the user verify they have selected the correct options
- Performance – the app's user interface should run smoothly, and the size of the exported table(s) should be taken into consideration.
Prerequisites
Before developing an app, it is necessary to make sure the correct environment is being used. Check the following list to confirm that the Databricks workspace you are working in has the right settings:
- Region – Databricks Apps is only available in certain regions (listed in the documentation here). I used a workspace located in West Europe.
- Pricing tier – Databricks Apps depends on features that are only available in Premium tier workspaces, such as Databricks SQL and Unity Catalog.
- Compute resources – Databricks Apps uses a SQL warehouse for compute. You can either create one before developing the app or let the setup guide create one for you and then edit its parameters (name, size, etc.).
Framework selection
Currently Databricks supports several frameworks for building apps. They have a lot in common as well as some unique features, so let’s quickly review them one by one:
- Streamlit – quick and easy to set up, suitable for exploration, dashboards and prototyping; however, it reruns the entire script on every input change
- Dash – provides more control over layout and callbacks, suitable for production-grade apps; however, the code is more verbose and the learning curve steeper
- Gradio – fast to build, well suited for showcasing ML models; however, limited layout control
- Shiny – familiar to R users, with a reactive programming model; however, it is less popular with Python developers and has a weaker ecosystem
- Flask – flexible, with full control over the app; however, it takes longer to set up and lacks UI shortcuts
Given that our use case is relatively simple, we will use Streamlit as the main framework for its quick, easy development and clean, modern visuals. For comparison, we will replicate the same functionality in Dash, as it is well suited to large-scale production apps and allows fine-grained control over layout and callbacks.
Developing the app
During the app setup, we can choose a template for our app (Chat bot, Data app or Hello world) to provide the initial code and serve as an interactive example.
For our case, Data app works best. We then follow the steps – select the app name, description and compute resource. Once the app is created, we can access the source files through the provided link (or through the standard Workspace menu).
Next, we add the desired functionality and user interface in the app.py file. In Streamlit, UI elements are created one by one as we go. Order is important, as the page is rendered from top to bottom on every input change. Despite the limited UI customisation options (compared to other frameworks), Streamlit supports Markdown – standard text formatting, colours and Google icons can go a long way.
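To illustrate this top-to-bottom model, here is a minimal sketch (the labels and icon are illustrative, not taken from the actual app):

```python
import streamlit as st

# Streamlit renders the page in code order, top to bottom,
# and reruns this entire script on every input change.
st.title("Data export tool")

# Markdown with colours and Material icons offsets the limited styling options.
st.markdown(":blue[Pick the tables to export] :material/download:")

# Widgets created later in the script appear lower on the page.
if st.checkbox("Enable data export"):
    st.write("Export options are rendered only when the box is ticked.")
```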
Click the Deploy button on the right to update the app and test new features.
Following the design list from the planning section, the main part of the app is the Data export panel. A checkbox at the top determines whether the subsequent fields are rendered at all, saving time and compute resources. Next come multiselect fields for catalog, schema and table (each dynamically populated once the previous step is done), where the user can select the desired items.
Once the tables are confirmed, the user receives a colour-coded summary of the selected items, as well as fields to choose the export file type and the name of the .zip archive to be downloaded. A sketch of this panel is shown below.
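Here is a hedged Streamlit sketch of the panel; `run_query` is the helper described in the Unity Catalog section further down (assumed here to return a pandas DataFrame), and the `utils` module is hypothetical:

```python
import io
import zipfile
import streamlit as st
from utils import run_query  # hypothetical module holding the SQL helper

if st.checkbox("Data export"):
    # Each multiselect is populated only after the previous one has a value,
    # so nothing is queried while the panel is switched off.
    catalogs = st.multiselect("Catalog", run_query("SHOW CATALOGS").iloc[:, 0].tolist())

    schemas = [f"{c}.{s}" for c in catalogs
               for s in run_query(f"SHOW SCHEMAS IN {c}").iloc[:, 0]]
    selected_schemas = st.multiselect("Schema", schemas)

    tables = [f"{s}.{t}" for s in selected_schemas
              for t in run_query(f"SHOW TABLES IN {s}")["tableName"]]
    selected_tables = st.multiselect("Table", tables)

    file_type = st.selectbox("File format", ["csv", "json"])
    zip_name = st.text_input("Archive name", value="export")

    if selected_tables:
        st.success(f"Selected: {', '.join(selected_tables)}")  # colour-coded summary

        # Build the .zip in memory so st.download_button can serve it.
        # Note: this pulls each full table, so table size matters here.
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w") as zf:
            for table in selected_tables:
                df = run_query(f"SELECT * FROM {table}")
                data = df.to_csv(index=False) if file_type == "csv" \
                    else df.to_json(orient="records")
                zf.writestr(f"{table}.{file_type}", data)
        st.download_button("Download .zip", buffer.getvalue(), file_name=f"{zip_name}.zip")
```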
As additional functionality, there is a Table review panel where the user can select a single table and see basic information – the number of rows and columns plus a few lines of sample data. Again, this panel can be turned off to prevent unnecessary reruns. A sketch follows below.
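A similar hedged sketch of the review panel, reusing the same assumed `run_query` helper:

```python
import streamlit as st
from utils import run_query  # same hypothetical helper as above

if st.checkbox("Table review"):
    # In the real app the options come from catalog/schema/table queries,
    # as in the export panel; a fixed list keeps this sketch short.
    table = st.selectbox("Table", ["main.default.example"])
    if table:
        n_rows = run_query(f"SELECT COUNT(*) AS n FROM {table}")["n"][0]
        sample = run_query(f"SELECT * FROM {table} LIMIT 5")
        st.write(f"{n_rows} rows, {len(sample.columns)} columns")
        st.dataframe(sample)  # a few lines of sample data
```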
For the sake of comparison, the app was reimplemented in the Dash framework. The user interface is less polished (although it could probably be improved) and the nested layout definition makes it harder to write, but the app runs faster. Below is a sketch of the structure.
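A minimal hedged sketch of the Dash approach (component ids and labels are illustrative): the layout is declared as a nested tree of components, and interactivity is wired through callbacks instead of full-script reruns.

```python
from dash import Dash, html, dcc, Input, Output

app = Dash(__name__)

# The whole page is declared up front as a nested tree of components.
app.layout = html.Div([
    html.H3("Data export tool"),
    dcc.Checklist(id="enable", options=[{"label": "Data export", "value": "on"}]),
    html.Div(id="export-panel"),
])

# Only this callback runs when the checkbox changes; no full-script rerun.
@app.callback(Output("export-panel", "children"), Input("enable", "value"))
def render_panel(value):
    if value and "on" in value:
        return dcc.Dropdown(id="catalog", multi=True, placeholder="Select catalogs")
    return ""

if __name__ == "__main__":
    app.run(debug=True)
```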
The final choice of framework comes down to the developer's preference and the specifics of the problem being solved.
Accessing data in Unity Catalog
One of the main selling points of Databricks Apps is that it lives within the platform, making governance and access straightforward. As the app uses SQL warehouse compute, it can only run SQL queries (Spark won't work here), so the connection to Unity Catalog tables is made through the sql module of the databricks package (the databricks-sql-connector). Once the helper function is prepared, we can write standard SQL queries as if we were in the SQL editor or a Databricks notebook.
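A sketch of such a helper, following the connection pattern used in the Databricks Apps templates; the warehouse HTTP path environment variable name is an assumption and would come from the app's configured SQL warehouse resource:

```python
import os
import pandas as pd
from databricks import sql  # databricks-sql-connector
from databricks.sdk.core import Config

# Config() picks up the app's credentials from its environment automatically.
cfg = Config()

def run_query(query: str) -> pd.DataFrame:
    """Run a SQL statement on the app's warehouse, return a pandas DataFrame."""
    with sql.connect(
        server_hostname=cfg.host,
        http_path=os.environ["DATABRICKS_WAREHOUSE_HTTP_PATH"],  # assumed name
        credentials_provider=lambda: cfg.authenticate,
    ) as conn:
        with conn.cursor() as cursor:
            cursor.execute(query)
            rows = cursor.fetchall()
            return pd.DataFrame(rows, columns=[c[0] for c in cursor.description])
```

With this in place, `run_query("SHOW CATALOGS")` behaves just like running the same statement in the SQL editor.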
Another point worth mentioning is the default service principal created with each new app. This SP can be used to perform actions on behalf of the app (once it is granted the necessary permissions). Its name can be found in the Authorization tab of the app, while the ID and secret are exposed through environment variables. More info in the documentation here.
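For instance, the credentials can be read like this (the variable names below match what Databricks Apps documents for the injected service principal credentials, but verify them in your own app's environment):

```python
import os

# OAuth credentials of the app's service principal, injected at runtime.
client_id = os.getenv("DATABRICKS_CLIENT_ID")
client_secret = os.getenv("DATABRICKS_CLIENT_SECRET")
```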
Sharing the app with users
Databricks apps live in a Databricks workspace, but access to the two is managed separately – a user can access the app even if they don't have permissions for the workspace itself. The app supports all the standard options – adding a user or group and granting them use or manage access – as well as an option to let everyone in the company access it.
Cost
To get an idea of how much an app would cost, we are using this Azure calculator and the following parameters:
- Serverless SQL computing resource
- 2X-Small cluster – 4 DBU/h (smallest possible cluster)
- A single instance
- Used 4 hours / day, 5 days / week – total of 80 hours / month
- Region – West Europe
- Total cost for the month = $224 (80 hours × 4 DBU/hour = 320 DBU, so roughly $0.70 per DBU at this region's serverless SQL rate)
The calculation above only covers the Serverless SQL warehouse used to query the data. Additionally, there is a separate compute resource that renders the app interface, which the app itself describes as “Up to 2 vCPUs, 6 GB memory, 0.5 DBU/hour”. Even though that is not much of an increase, it keeps running in the background, so the app should be stopped if it will not be used for a long time.
Conclusion
Databricks Apps is an interesting new feature within the platform that will surely be a good fit for certain use cases. The variety of available frameworks, plus the ease of deployment and operation, make it a great tool for lightweight tasks and visualisations. However, further analysis is needed to evaluate whether it is worth the cost in each separate case.