Access Control and Networking Security with Power BI and Databricks
Introduction
In today’s data-driven world, organizations are increasingly relying on business intelligence tools like Power BI to make informed decisions. However, as data becomes more accessible, the risk of unauthorized access and data breaches also increases. Therefore, it’s essential to ensure that data is protected and secure.
This blog post outlines how security features in Power BI and Databricks can be used to work together to ensure secure data access to Power BI on top of Databricks. We’ll discuss the importance of authentication, access control patterns, and networking security features to ensure that your data remains secure.
Authentication Methods
Before discussing access control with Power BI with Databricks, we need to understand authentication. Databricks needs to know who’s accessing the data and then determine what data they have access to. There are three different methods for Power BI to authenticate to Databricks: username and password, Personal Access Token and OAuth.
It’s worth noting that these authentication methods are named differently in Power BI Desktop versus Power BI Service.
Authentication Methods | Power BI Desktop | Power BI Service |
Username/password | Username/password | Basic |
Personal Access Token | Personal Access Token | Key |
OAuth | Azure Active Directory (Azure Databricks Connector)OIDC (Databricks Connector) | OAuth2 |
Note: If you are using Azure Databricks connector, the OAuth method in Power BI Desktop is called Azure Active Directory (AAD). This is because AAD is the default of OAuth mechanism that Azure Databricks uses. If you’re using the Databricks connector, the OAuth method will be called OIDC in Power BI Desktop.
When considering the choice of authentication method, it is worth noting that “username and password” does not support MFA or 2FA. If your organization requires MFA/2FA for authentication, you cannot use “username and password” to authenticate.
When it comes to Personal Access Token, it can be used for all clouds and it is also the only authentication method that supports Azure service principals (Azure service principals can only be used for Azure, and are not applicable for Databricks on AWS or GCP). In the case of Personal Access Token, the authentication is performed by Databricks and not the identity provider. As per Databricks security best practice, you should set an expiration date for your Personal Access Token, as it is not safe for you to have a key that does not expire. If the key ends up in the wrong hands, your security access will be compromised. When you have an expiration date for your Personal Access Token, you will need to do key rotation. It’s best practice to bake that key rotation as part of your CICD process for Power BI.
The last and probably most frequently used authentication method is OAuth. This is worth noting that if you’re using Databricks on AWS and you’re using Azure Active Directory as OAuth method, Databricks recently rolled out AAD passthrough for Databricks on AWS, which means you can utilize Unity Catalog’s access control in Power BI, please see “access control patterns” later in this blog. In order to use AAD passthrough, you will need to set up account level SSO, please see self-enrolment for private preview.
Authentication in the Authoring Process
The authentication setup is typically different in the following common scenarios:
- Authoring a semantic model in Power BI Desktop
- Refreshing semantic models in Power BI Service
- Querying data in Power BI semantic models using Direct Query mode.
Authentication in Power BI Desktop
If you’re a Power BI developer and you’re accessing Databricks using Power BI Desktop, you will authenticate using your personal accounts using one of the three authentication methods described in the previous section. The authentication request will go to the SQL warehouse in Databricks that Power BI is connected to, and the SQL Warehouse will verify data access privileges via Unity Catalog, and then return the query results accordingly.
Authentication in Power BI Service
After finishing your semantic model, you will publish the semantic model to Power BI Service. Once the semantic model is published to Power BI Service, you then edit the data source credentials using one of the three authentication methods. This credential typically is not your personal account, instead it’s a service account that you want Power BI to use to do the data refreshes or you can use AAD passthrough so the end user’s identity is passed down to Unity Catalog, which will be discussed later in this blog. A service account is a user principal that an organization sets up that does not belong to an actual user, eg. powerbi_refresh@yourcompany.com. The advantage of using a service account is that organizations can avoid semantic models stopping working due to certain users leaving the organization. A service principal is different from a service account. A service principal is set up through app registration and belongs to a service instead of a user, and a service principal can only authenticate to Databricks using a Personal Access Token because other authentication methods require interactive login which is not supported for service principals.
Access Control Patterns from Power BI Service
In this section, we will discuss access control patterns once the semantic model is published to Power BI Service.
Import/Composite on Service Account
If any table in a semantic model is set to Import or Dual mode, you will need Power BI to do the refreshes. These refreshes should use a service account to authenticate to Databricks. When refresh happens, SQL Warehouse will verify service account’s data access via Unity Catalog and return the query results to Power BI. In this scenario, if you want to do access control for your end users in Power BI, row-level security needs to be set up in Power BI. This is because Databricks cannot see which end user is accessing the semantic model in Power BI, it will verify what access the service account has.
Direct Query on Service Account
If you’re using Direct Query mode, you can still use service accounts for Power BI to connect to a SQL warehouse. The SQL warehouse checks Unity Catalog for the service account’s access to tables, rows, and columns and returns query results accordingly. In this scenario, the row level security also needs to be set up in Power BI in order to do access control for special end users/groups.
Direct Query on AAD Passthrough
If you do not want to set up a dual security model, and instead use Unity Catalog to govern everything without having to set up access control in Power BI, you can do AAD passthrough. This is the preferred architecture if you want to simply your access control setup. You will need to set up a direct query semantic model. There is a checkbox in “edit data source credential” that says “report viewers can only access data source with their own Power BI identities using DirectQuery”. If you tick that checkbox, Power BI will pass down the AAD identity to Databricks SQL Warehouse, SQL Warehouse will check Unity Catalog whether that end user’s AAD identity has access to certain tables, certain views, and certain columns.
In reality, you will use a combination of these scenarios. You are unlikely to have 100% of your semantic models in Import or Direct Query, consequently, you are likely going to set up security or access control in both Power BI and Unity Catalog.
If you are using Unity Catalog to govern Power BI’s end users data access in Databricks, you can govern the access to tables, to rows (row filter in Unity Catalog is equivalent to traditional row level security), to mask columns (for sensitive data), and to dynamic views. This governance is only available if you’re using direct query with AAD passthrough.
Networking Security
For secure networking access to a Databricks workspace, you can set up Front-end Private Link or IP access list. This means access to a Databricks workspace, ODBC/JDBC connections, REST API calls and Power BI connections all need to originate from a private network or certain IP ranges. Therefore you will need a Power BI Gateway when you have either Front-end Private Link or IP access list enabled. If you have a back-end only Private Link, you don’t need a gateway.
You have two choices for the gateway. One is a VNet gateway, which is a feature that recently went to GA. This is a Microsoft managed solution. You don’t have to worry about maintenance. However, it’s a premium only feature.
Another option is On-premises data gateway (despite the name, your serving layer does not need to be on-premise), which is a traditional way of setting up a data gateway hosted on a VM. You do need to maintain that VM yourself. The advantage is that it’s fully controlled by you, so you can set up scaling and load balancing if you need to, please see here for the considerations and documentation for On-premises gateway.
Please note that both Vnet Gateway and On-premises gateway support SSO on Databricks.
Summary
In conclusion, securely connecting Power BI to Databricks can be achieved using a combination of Power BI and Databricks features. By implementing proper authentication, access control patterns, and networking security features, you can ensure that your data remains secure and protected.
Discover more from Data Leaps
Subscribe to get the latest posts sent to your email.