Since all roads lead to Excel, this blog is inevitable. At the time of writing, there are mainly three ways to let business Excel users consume the data in Databricks.
1. Connecting through ODBC Driver
If you google “connect Excel to Databricks”, this is the documentation you will find. It requires your end users to install ODBC driver on their local computer and then uses Excel’s ODBC connector to consume the data in Databricks. In case your end users are up for setting up ODBC drivers, here is a tutorial that goes through it.
- The official solution to connect to Excel at the time of writing.
- Access directly from Excel.
- Requires individual setup for ODBC driver.
- Unsuitable for non-technical end users.
2. Connecting through Delta Share Excel Plug-in
Roughly a year ago, a Delta Share plug-in for Excel was released. This is not native in Excel, you will need to install the plug-in from here. The plug-in also has a freemium model. Basic features are free (importing up to 1000 rows), but premium features require individual or enterprise license. Since most users will likely need more than 1000 rows of data. Licensing cost needs to be taken into consideration. The general flow of this connectivity follows below steps.
- Create a share
- Add assets to the share
- Add a recipient
- Grant recipient access to share
- Provide recipient with an activation URL
For end users:
- Download the credential file using the activation URL
- Download plug-in
- Connect to the share using the credential file in Excel
- Import data in Excel
- Access directly from Excel
- 3rd party
- (Most likely) licensing cost
- Requires individual setup for Excel plug-in
3. Connecting through Power BI Dataset
The third solution is the connecting Excel to Databricks via Power BI dataset. This solution leverages the tight integration between Excel and Power BI. Power BI can connect to Databricks using either Databricks connector or Delta Share connector, and Excel users can consume the dataset published to Power BI using either “Analyze in Excel” feature in Power BI or the “Get Data from Power BI dataset” feature in Excel. Simon from Advancing Analytics did an awesome video on this connection type. Please note that in this video Simon is using the Delta Share connector, but using Databricks connector to access Databricks SQL warehouse is the more common (and recommended) way to integrate Power BI and Databricks.
- No setup needed for end users
- Additional measures/calculations can be added to Power BI dataset
- Datasets can be certified in Power BI
- Requires Power BI
- Additional setup step to publish dataset to Power BI
+ Connecting through other Commercial Solutions
Above three solutions are not the only ways to connect Excel to Databricks. Some commercial solutions like AtScale and Alteryx also offer Excel-Databricks connectivity, but those typically means additional layer in between Excel and Power BI. Unless you are already using these commercial solutions, consider the three solutions described in this blog first.