Databricks projects can use direct connect to reverse engineer entire schemas in seconds. Please refer to the instructions below to configure and perform a direct connection to your Databricks account.
Direct Connect
Using Unity Catalog API
To establish a direct connection to the Unity Catalog in Databricks using the API, you will require the following details:
Server (ex. xxx-myAccount-xxxx.cloud.databricks.com)
Personal access token (ex. asdf1234fd12s3412d3412dfa341234dsa)
Server
The Server information can be sourced from the cluster and is also retrievable from SQL Warehouses, as stated below.
Generate a Personal Access Token
Instead of a user and pass, you will need to generate a personal access token (PAT) to authenticate when connecting from SqlDBM. As the name suggests, a PAT should be unique to each user and must be kept safe and guarded with the rigor of login credentials.
To generate a PAT, follow these instructions:
1. Select "Settings" from the top-right user menu
2. Under the User section in Settings, click "Developer". Then click the "Manage" button next to "Access tokens".
3. Press the button to "Generate new token"
4. Assign the following token properties and click "Generate"
Comment - something meaningful
Lifetime - days to keep the token alive before expiry.
Scope - "unity-catalog" is the minimum level of access that SqlDBM requires.
Write down your generated token in a secure location. You will not be able to see it again.
Direct Connect from SqlDBM Using Unity Catalog API
Select Unity Catalog API
Introduce the parameters
Select a Catalog and Schema(s)
Using Engineering Cluster
To direct connect using the engineering cluster for databricks projects, you will need the following details:
Server (ex. xxx-myAccount-xxxx.cloud.databricks.com)
HTTP path (ex. sql/protocolv1/o/12345/123-xyz)
Personal access token (ex. asdf1234fd12s3412d3412dfa341234dsa)
To obtain this information, log in to your databricks account and follow the steps below.
Server and HTTP path
To obtain the complete schema DDL, SqlDBM must loop through all schema objects to run the "SHOW CREATE TABLE" command. For this, we require an Engineering Cluster, running in "Single user" mode. Please ensure that the following parameters are set for the cluster you plan to use:
To obtain the Server and HTTP path for your databricks cluster, follow the steps below once logged in:
1. Select the "Data Science & Engineering" workspace
2. Select "Compute" to see existing clusters
3. Start the cluster you will connect to
4. Click on the started cluster to open its properties
In the cluster options, navigate to the JDBC/ODBC details in the advanced properties like so:
5. In the "Configuration" tab, scroll down and expand the "Advanced options"
6. Select the "JDBC/ODBC" tab
note the Server Hostname and HTTP Path details
Direct Connect from SqlDBM Using Engineering Cluster
Select the Engineering Cluster connection method.
With the three pieces of information in hand (Server, HTTP path, and Token), go to the Reverse Engineering screen in SqlDBM and, proceed to log in, select the database, and schemas you wish to retrieve. For complete details of the RE process, use the links to the associated articles at the end of this page.
Note that the direct connection and its parameters will only persist in the browser session. These credentials are not stored or recorded by SqlDBM.
Google Cloud Platform (GCP) Databricks Accounts
If Unity Catalog is not enabled, Databricks requires a "Google Service Account" to be configured at the cluster level for accessing entities using "SHOW CREATE TABLE" hosted on Google Cloud.
The Google Service Account requires appropriate permissions within the Databricks project. Please refer to the following screenshot for an example.
Manual DDL generation
Reading the DDL from Databricks Files
Databricks supports using external metastores instead of the default Hive metastore.
You can export all table metadata from Hive to the external metastore.
Use the Apache Spark Catalog API to list the tables in the databases contained in the metastore.
Use the SHOW CREATE TABLE statement to generate the DDLs and store them in a file.
Use the file to import the table DDLs into the external metastore.
The following code accomplishes the first two steps.
%python
dbs = spark.catalog.listDatabases()
for db in dbs:
f = open("your_file_name_{}.ddl".format(db.name), "w")
tables = spark.catalog.listTables(db.name)
for t in tables:
DDL = spark.sql("SHOW CREATE TABLE {}.{}".format(db.name, t.name))
f.write(DDL.first()[0])
f.write("\n")
f.close()
Reading the databricks file:
1. List all the temporary files.
dbutils.fs.ls ("/tmp/")2. Copy the file path and paste it to
df = spark.read.text("")
df.display()For more information about the databricks File System (DBFS), please visit https://docs.databricks.com/dbfs/
Reading the DDL from Databricks User Interface
The process would include the following:
Navigate to the Compute to run the cluster - Number 1 on the screenshot.
Add a Notebook to run the below Python code - Number 2 on the screenshot.
# Set your catalog name
catalog = "catalog_name"
# There should be a comma-separated list of schemas or a single schema name
schemas = "schema1, schema2".split(",")
spark.catalog.setCurrentCatalog(catalog)
# Create an empty list to store the table metadata
table_metadata = []
# Iterate through schemas and tables
for schema in schemas:
allTables = spark.catalog.listTables(schema)
for t in allTables:
# Skip temporary tables
if not t.isTemporary:
try:
ddl = spark.sql("SHOW CREATE TABLE {}.{};".format(schema, t.name))
create_table_statement = ddl.first()[0]
# Append the table metadata to the list
table_metadata.append([schema, t.name, t.tableType, create_table_statement])
except Exception as error:
# Handle errors if necessary
pass
# Convert the table metadata list to a DataFrame
metadata_df = spark.createDataFrame(table_metadata, ["Schema", "Table Name", "Table Type", "Create Table Statement"])
# Display the DataFrame in the Databricks UI
display(metadata_df)3. From Databricks UI, download the results in csv format and copy the “Create Table Statement” column.
4. Manually paste the DDL into SqlDBM screen.
Upload and import the script as per SqlDBM reverse engineer instructions (see related article at the end of this page).
Account Settings and Privileges
This section describes the required privileges and settings that you will require for connecting and generating DDL from Databricks.
Required Entitlement and Privileges
Grant access to Databricks SQL to users or service principal users who can access the SQL environment.
Databricks Sql Access Description:
Entitlement name (UI) |
Entitlement name (API) |
Default |
Description |
Databricks SQL access |
databricks-sql-access |
Granted by default. |
When granted to a user or service principal, they can access Databricks SQL. |
The users group is granted the Workspace access and Databricks SQL access entitlements by default. All workspace users and service principals are members of the users group. To assign these entitlements on a user-by-user basis, a workspace admin must remove the entitlement from the users group and assign it individually to users, service principals, and groups.
Source: https://docs.databricks.com/en/administration-guide/users-groups/users.html
Securable objects to be reverse-engineered in SQLDBM within the Hive Metastore and Unity Catalog
The securable objects are:
-
CATALOG: controls access to the entire data catalog.
-
SCHEMA: controls access to a schema.
TABLE: controls access to a managed or external table.
VIEW: controls access to SQL views.
FUNCTION: controls access to a named function.
-
Required Privileges on Hive metastore
The following object privileges will be required for obtaining Databricks DDL.
SELECT: gives read access to an object.
USAGE: does not give any abilities, but is an additional requirement to perform any action on a schema object.
To grant a privilege to all users in your workspace, grant the privilege to the users group. For example:
GRANT SELECT ON TABLE . TO usersSource: https://docs.databricks.com/en/data-governance/table-acls/object-privileges.html
Required Privileges on Unity Catalog
The following object privileges will be required for obtaining Databricks DDL.
SELECT
Query a table or view, invoke a user-defined or anonymous function, or select ANY FILE. The user needs SELECT on the table, view, or function, as well as USE CATALOG on the object’s catalog and USE SCHEMA on the object’s schema.USE CATALOG
Required, but not sufficient to reference any objects in a catalog. The principal also needs to have privileges on the individual securable objects.USE SCHEMA
Required, but not sufficient to reference any objects in a schema. The principal also needs to have privileges on the individual securable objects.
For example, SELECT privilege to a user in a workspace using the SQL command:
GRANT SELECT ON TABLE t TO `user`;Source: https://docs.databricks.com/en/sql/language-manual/sql-ref-privileges.html
IP allowlisting and private networking
Corporate networking and security policies may interfere with connectivity between SqlDBM and Databricks. If you are unable to connect despite following the guidelines in this article, please make sure that IP blocking or private networking is not interfering.
An easy way to diagnose this type of issue is by attempting to connect to your Databricks account from a home computer, without using a VNP or any corporate software.
For instructions on allowing SqlDBM through the firewall or private network, please see the related article at the bottom of this page.
SSO for Direct Connection
Single sign-on (SSO) lets your team connect SqlDBM to your data platform using your existing identity, without sharing personal access tokens.
SqlDBM provides two independent SSO logins:
- Authentication to the SqlDBM account for project access — see the Enabling Single Sign-On (SSO) for SqlDBM guide.
- Authentication to the database / cloud data platform for direct connect — covered below for Databricks.
How Databricks SSO is different
Unlike Snowflake or BigQuery, Databricks is its own OAuth authorization server (User-to-Machine / U2M OAuth). There is no external identity provider (Azure AD, Okta, etc.) to wire up — instead, you register a custom OAuth application directly in Databricks, and SqlDBM connects to it. SqlDBM uses a public client with PKCE, so no client secret is required.
Step 1 — Register a custom OAuth app in Databricks
In your Databricks account console, register a custom OAuth application (or enable partner OAuth), then:
-
Set the application's redirect URL to SqlDBM's callback (this must match exactly):
https://app.sqldbm.com/login/sso/oauth
- Note the application's Client ID — you'll enter it in SqlDBM in the next step. No client secret is needed.
Databricks references:
- OAuth U2M overview — https://docs.databricks.com/aws/en/dev-tools/auth/oauth-u2m
- Enable custom / partner OAuth apps — https://docs.databricks.com/aws/en/integrations/enable-disable-oauth
Step 2 — Configure SSO in SqlDBM
SSO configuration in SqlDBM must be performed by an account admin or alternate admin. Once configured, all users will have the SSO login method available in Direct Connect.
As an admin, open the account menu, click SSO settings, select Databricks, and provide:
| Field | Value |
|---|---|
| Client id | The Client ID of the custom OAuth app from Step 1. |
| Server URL | Your Databricks workspace host, e.g. dbc-xxxx.cloud.databricks.com. |
Click Save. That's all that's required — SqlDBM derives the Databricks OAuth endpoints and scope automatically from the Server URL, so there is nothing else to enter.
Server URL by cloud: AWS
dbc-xxxx.cloud.databricks.com· Azureadb-xxxx.azuredatabricks.net· GCPxxxx.gcp.databricks.com.
Step 3 — Using SSO in Direct Connect
Once SSO details have been saved, users can choose Login with SSO (alongside token and other methods) in the Direct Connect options.
- In the Tool, start a Databricks direct connection (Reverse Engineer).
- Choose Login with SSO, provide your Server URL, and connect.
- Authorize in the Databricks popup that appears.
- Proceed with catalog / schema selection and the rest of the reverse engineering process.
Notes & troubleshooting
-
Redirect URL must match exactly. If the redirect URL registered in your Databricks OAuth app does not exactly match
https://app.sqldbm.com/login/sso/oauth, the authorization popup will fail. This is the most common setup error. - Token lifetime. Databricks access tokens last about an hour; you will be re-prompted to sign in when one expires.
- PKCE is automatic. SqlDBM handles the Authorization Code + PKCE (S256) flow for you; no client secret is used.