
Databricks Connector

This article explains to admins how to create a connection from Gainsight to Databricks, create a data job, merge datasets, and transform and configure jobs or Job chains in Gainsight.

Overview

Databricks is a unified data analytics platform offering scalable, collaborative, and integrated solutions for data engineering, data science, machine learning, and analytics.

Integrating Gainsight with Databricks allows you to ingest data into Gainsight standard objects (for instance, Company and Person) and custom objects. Use the resulting insights from Databricks to understand how customers use your product, predict churn risks, and personalize outreach for better customer success.

Key Benefits:

  • Improved Data Ingestion: You can now easily bring data from Databricks tables into Gainsight. This lets you connect with multiple warehouses and schemas, and perform transformations and merges between tables in connectors for meaningful data ingestion in Gainsight.

  • Expanded Data Connectivity: With the new Databricks connector, you can utilize Data Designer to blend Databricks tables with other data sources in Gainsight. This integration opens up possibilities for richer insights, which can be leveraged in Reporting, Rules, and Journey Orchestrator (JO).

Limitations:

  • Only the following data types are supported:
    • Int
    • String
    • Boolean
    • Double
    • Float
    • Timestamp
    • Date
    • Bigint
    • Smallint
    • Decimal
    • Timestamp_NTZ
    • Tinyint
  • Gainsight recommends extracting no more than 100 fields per table from Databricks in a single job. For additional fields, create another job to ensure a successful job run.

Note:

  • For datetime fields in Databricks that are in NTZ (No Time Zone) format, Gainsight interprets the DateTime values as UTC (see the sketch after this list).
  • By default, you can create five Databricks connections in your tenant. To create more connections, reach out to Gainsight Support.
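
The following is a minimal sketch of that NTZ-to-UTC interpretation, using Python's standard datetime module; the sample value is hypothetical:

    import datetime as dt

    # A TIMESTAMP_NTZ value arrives as a naive datetime with no time zone attached.
    ntz_value = dt.datetime(2024, 5, 1, 9, 30)  # hypothetical sample

    # The wall-clock time is treated as UTC rather than shifted to another zone.
    utc_value = ntz_value.replace(tzinfo=dt.timezone.utc)

    print(utc_value.isoformat())  # 2024-05-01T09:30:00+00:00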

Fetch Credentials from Databricks

Gainsight provides OAuth authentication and connectivity to integrate with Databricks:

  • OAuth uses an OAuth Client ID and OAuth Client Secret for authentication, and a Warehouse or Compute for connectivity.

The following table lists the credential requirements for OAuth authorization:

OAuth

Warehouse                      Compute
Database Host                  Database Host
Warehouse ID                   Organization ID
N/A                            Cluster ID
Catalog Name                   Catalog Name
Schema Name                    Schema Name
OAuth Client ID                OAuth Client ID
OAuth Client Secret            OAuth Client Secret
(Optional) SSL Certificate     (Optional) SSL Certificate
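
As a rough sketch, the two credential sets map to connection parameters as follows; every value below is a hypothetical placeholder:

    # Credentials for a Warehouse connection (hypothetical placeholders)
    warehouse_connection = {
        "database_host": "yourdomain.cloud.databricks.com",  # no https:// prefix
        "warehouse_id": "1234567890abcdef",
        "catalog_name": "hive_metastore",
        "schema_name": "automationdb",
        "oauth_client_id": "<service-principal-application-id>",
        "oauth_client_secret": "<generated-secret>",
    }

    # Credentials for a Compute (cluster) connection (hypothetical placeholders)
    compute_connection = {
        "database_host": "yourdomain.cloud.databricks.com",
        "organization_id": "1234567890123456",
        "cluster_id": "0123-456789-abcdefgh",
        "catalog_name": "hive_metastore",
        "schema_name": "automationdb",
        "oauth_client_id": "<service-principal-application-id>",
        "oauth_client_secret": "<generated-secret>",
    }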

The required credentials can be obtained from Databricks as follows:

Database Host 

Database host is the URL of your Databricks account. It has the following format:

Database Host: <yourDomain>.cloud.databricks.com

Note: For the database host, do not include https:// 

To obtain the Database Host:

  1. In Databricks, navigate to SQL Warehouses.
  2. Click any of the existing warehouses.
  3. Click Connection details.
  4. Copy the Server hostname.

Connection Details page in Databricks showing the server hostname, port, protocol, HTTP path, JDBC URL, and OAuth URL settings for a SQL Warehouse.
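
If you copy the full URL instead of just the hostname, a small sketch like the following strips the scheme; the helper name is hypothetical:

    from urllib.parse import urlparse

    def to_database_host(value: str) -> str:
        # Accept either "https://yourdomain.cloud.databricks.com/..." or a bare hostname.
        parsed = urlparse(value if "://" in value else f"//{value}")
        return parsed.hostname

    print(to_database_host("https://yourdomain.cloud.databricks.com/sql"))
    # yourdomain.cloud.databricks.com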

Warehouse ID

Warehouse ID is a unique identifier assigned to the SQL warehouse within the Databricks environment. When you create a SQL Warehouse in Databricks, the platform automatically assigns it a Warehouse ID.

To obtain the Warehouse ID:

  1. In Databricks, navigate to SQL Warehouses.
  2. Click any of the existing warehouses.
  3. From the Name field, copy the Warehouse ID.

Overview of SQL Warehouses in Databricks interface, displaying the 'Starter Warehouse' status, type, cluster size, and auto stop settings.
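
The Warehouse ID is also what Databricks embeds in a SQL warehouse's HTTP path (visible on the same Connection details tab). A minimal sketch of that convention, assuming a hypothetical ID:

    warehouse_id = "1234567890abcdef"  # hypothetical

    # Databricks exposes SQL warehouses over this HTTP path convention.
    http_path = f"/sql/1.0/warehouses/{warehouse_id}"
    print(http_path)  # /sql/1.0/warehouses/1234567890abcdef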

Catalog and Schema Name

Catalog name refers to the name of the catalog you are working within. In Databricks, it acts as a top-level container for metadata, including databases, tables, views, and functions.

Schema name refers to a logical grouping of database objects such as tables, views, and functions.

To obtain the catalog and schema names:

  1. In Databricks, navigate to SQL Editor.
  2. Copy the Catalog and Schema names as follows:

Databricks interface showing the SQL Editor with the catalog list, highlighting databases like hive_metastore, automationdb, and default.

In the above example, the Catalog name is hive_metastore and the Schema name is automationdb or default.

Note: The data tables that are visible in the Connector jobs in Gainsight CS are the tables that reside within the schemas inside the catalogs.
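
To confirm these names programmatically, the following is a sketch that lists schemas with the databricks-sql-connector package; the host, HTTP path, and token are hypothetical placeholders, and a personal access token is assumed here for brevity (the Gainsight connection itself uses OAuth):

    from databricks import sql  # pip install databricks-sql-connector

    with sql.connect(
        server_hostname="yourdomain.cloud.databricks.com",  # hypothetical
        http_path="/sql/1.0/warehouses/1234567890abcdef",   # hypothetical
        access_token="<personal-access-token>",             # hypothetical
    ) as connection:
        with connection.cursor() as cursor:
            cursor.execute("SHOW SCHEMAS IN hive_metastore")
            for row in cursor.fetchall():
                print(row)  # e.g. automationdb, default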

OAuth Client ID and OAuth Client Secret

For OAuth machine-to-machine (M2M) authentication, an OAuth Client ID and an OAuth Client Secret are required. To obtain them from Databricks, you need to create a service principal. A service principal is an identity that you create in Databricks to use with automated tools, jobs, and applications.

For more information on how to create a service principal, refer to the OAuth Machine-to-Machine (M2M) authentication article from Databricks.

Once the service principal is created, navigate to Workspace Settings > Identity and Access in Databricks.

Databricks settings screen displaying 'Identity and Access' options with manage buttons for users, groups, and service principals.

To obtain the OAuth Client ID and OAuth Client Secret:

  1. Click Manage next to Service principals.
  2. Click any existing principal.
  3. From the Configurations tab, copy the OAuth Client ID (also known as the Application ID).

Databricks settings screen showing service principal details for 'test1' with active status and entitlements like cluster creation and SQL access.

  4. Click the Secrets tab.
  5. Click Generate Secret. The Generate secret window appears.
  6. Copy the OAuth Client Secret.

Pop-up window in Databricks showing successful generation of an OAuth secret for API authentication with instructions to copy it immediately.
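
To verify that the Client ID and Client Secret work, you can request a token through Databricks' documented OAuth M2M client-credentials flow. A minimal sketch with hypothetical credentials:

    import requests  # pip install requests

    host = "yourdomain.cloud.databricks.com"           # hypothetical
    client_id = "<service-principal-application-id>"   # hypothetical
    client_secret = "<generated-secret>"               # hypothetical

    # Databricks workspaces expose an OIDC token endpoint for M2M OAuth.
    response = requests.post(
        f"https://{host}/oidc/v1/token",
        auth=(client_id, client_secret),
        data={"grant_type": "client_credentials", "scope": "all-apis"},
    )
    response.raise_for_status()
    access_token = response.json()["access_token"]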

Organization ID

To find your Organization ID in Databricks, log in to your Databricks workspace and check the URL in your browser's address bar. The Organization ID is typically the series of numbers at the end of the URL, after the = sign. Copy the Organization ID as follows:

Databricks workspace interface displaying a list of user notebooks and queries with creation dates and ownership details.

Cluster ID

In Databricks, each cluster is assigned a unique Cluster ID.

To obtain the Cluster ID:

  1. Navigate to the Compute section.
  2. Click to open any cluster. The Cluster page appears.
  3. The Cluster ID appears in the browser's URL as the alphanumeric series that follows /clusters/.

Databricks compute cluster configuration page showing details like policy, access mode, runtime version, node type, and termination settings.
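
Together, the Organization ID and Cluster ID make up the cluster's HTTP path under Databricks' JDBC/ODBC convention, which is one way to sanity-check the two values. A small sketch with hypothetical IDs:

    organization_id = "1234567890123456"  # hypothetical
    cluster_id = "0123-456789-abcdefgh"   # hypothetical

    # Convention Databricks uses for cluster (Compute) HTTP paths.
    http_path = f"sql/protocolv1/o/{organization_id}/{cluster_id}"
    print(http_path)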

SSL Certificate

To secure your connection in Gainsight, download the SSL Certificate for your Databricks account.

To download the SSL Certificate:

  1. Click View Site Information on the address bar.

Databricks compute cluster settings page showing configuration options, runtime version, node type, and auto-termination settings.

  2. Click Connection is secure.
  3. Click Certificate is valid. The Certificate viewer window appears.
  4. Click the Details tab.
  5. Click Export to download the SSL certificate.

Databricks compute cluster interface showing SSL certificate details with options to export the certificate for secure connection setups.

Alternatively, you can upload your own SSL certificate that has been authenticated on Databricks.

The SSL certificate must be in the .CER format.
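
As a scripted alternative to the browser export, Python's standard ssl module can download the PEM-encoded certificate the server presents (a form a .cer file may contain); the host is a hypothetical placeholder:

    import ssl

    host = "yourdomain.cloud.databricks.com"  # hypothetical

    # Fetch the PEM-encoded certificate presented by the server on port 443.
    pem_cert = ssl.get_server_certificate((host, 443))

    with open("databricks.cer", "w") as cert_file:
        cert_file.write(pem_cert)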

Create a Connection

After obtaining the Databricks credentials, you can create a connection between Gainsight and Databricks.

Note: Gainsight recommends that, before attempting to connect, you verify that the Databricks Warehouse or Cluster you are connecting to is active and running.

To create a connection:

  1. Navigate to Administration > Connectors 2.0.
  2. Click Create Connection. The Create Connection dialog appears.
  3. From the Connector dropdown, select Databricks.
  4. In the Name of the connection field, enter a name to identify the connection.
  5. In the Authorization Type field, select the type of authentication as OAuth.
    Note: Gainsight recommends against using Basic Auth (username and password) authentication because Databricks has stopped supporting this mechanism.
  6. In the Connect To field, select the type of connectivity:
    • Warehouse
    • Compute
  7. Enter the credentials obtained from Databricks, based on the selected Authorization and Connect To options.
  8. Click Save.
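
Before saving, a quick way to confirm that the warehouse is reachable is to run a one-row query with the OAuth token obtained earlier. A sketch using databricks-sql-connector with hypothetical values, assuming the connector accepts the OAuth access token in place of a personal access token:

    from databricks import sql  # pip install databricks-sql-connector

    connection = sql.connect(
        server_hostname="yourdomain.cloud.databricks.com",  # hypothetical
        http_path="/sql/1.0/warehouses/1234567890abcdef",   # hypothetical
        access_token=access_token,  # OAuth token from the earlier sketch
    )
    cursor = connection.cursor()
    cursor.execute("SELECT 1")
    print(cursor.fetchone())  # (1,) means the warehouse is up and the credentials work
    cursor.close()
    connection.close()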

Context Menu Options

Once you authorize the connection, configure it as required using the following options in the vertical three-dot menu:

  • Edit Connection: Update or modify the Databricks connection details.
  • Delete Connection: Delete the Databricks connection.

Note: A Databricks connection can be deleted only after its associated Job Chains and data jobs are deleted.

Context menu options for a Databricks connection showing 'Edit Connection' and 'Delete Connection' actions.

Create a Job

Admins can create jobs from the Jobs page to sync data from required source objects with Gainsight. After selecting the Data Source, you can view all the Databricks objects in the left pane. Drag and drop a source object to the Preparation screen to start creating a job.

Note: To sync newly created views from Databricks to Gainsight, ensure that you re-authorize the Databricks connection from the Connections page.

For more information, refer to the Preparation of Connector Jobs article.

Data preparation workflow in Databricks, depicting data sources and transformation steps leading to the final destination

If the data jobs in a connection are dependent on each other, create a Job chain and configure the schedule to the Job Chain. For more information, refer to the Configure Job or Job Chain Schedule article.

Merge Datasets

You can merge two or more datasets and create an output dataset. For example, you can merge a Bitcoin cryptocurrency transactions dataset with the Company dataset to see the list of transactions for each company and create an output dataset such as Company ARR.

For more information on Merge, refer to the Preparation of Connector Jobs article.

Transform Data

In the Preparation step of a connector job, you can transform data and add Case Fields to get more meaningful insights from the customer data.

Business use case: The Transform function provides the capability to create new case fields or modify existing ones. These case fields can be used to reshape an external field to fit the consumption requirements of Gainsight’s data model, and can be defined to populate different values for different case conditions. For example, external picklist values such as New, Open, and Closed can be mapped to Active and Inactive to match Gainsight’s picklist values (see the sketch below).
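
Conceptually, that picklist remapping behaves like the following sketch; this is illustrative only, not Gainsight's implementation, and the default value is hypothetical:

    # Illustrative case-field logic: map external picklist values to Gainsight values.
    def to_gainsight_status(external_status: str) -> str:
        if external_status in ("New", "Open"):
            return "Active"
        if external_status == "Closed":
            return "Inactive"
        return "Inactive"  # hypothetical default for unrecognized values

    for value in ("New", "Open", "Closed"):
        print(value, "->", to_gainsight_status(value))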

For more information on how to use the Transform function, refer to the Preparation of Connector Jobs article.

Add Destination

Once the final output dataset is prepared, you can add a destination to the output dataset to sync data from the source to the target Gainsight object.

For more information on how to add destinations, refer to the Preparation of Connector Jobs article.

Direct Mappings

To sync data from an external system to Gainsight using Direct Mappings, you must map fields from the external system's source objects to Gainsight's target objects. The data sync happens based on the configured field mappings.

For more information on Direct Mapping, refer to the Preparation of Connector Jobs article.

Derived Mapping

(Optional) You can use Derived Mappings to populate values in an object's field (of the GSID data type) with values from the same or another standard Gainsight object. Lookup is used to accomplish this, and you can match up to six columns in the lookup.

Note: To use Derived Mappings, your Target Object must have at least one field of data type GSID.

For more information on Derived Mapping, refer to the Preparation of Connector Jobs article.

Job Chain

The Job Chain feature helps simplify the process of scheduling a group of related jobs that you need to run in a particular sequence. This sequence is based on the scheduled time of the Job Chain.

For example, if Job A and Job B must run in sequence, you can add them to a Job Chain so that Job B starts automatically as soon as Job A is completed, with no need to schedule each job separately.

Note: When a job is added to a Job Chain, it follows the Job Chain's schedule, not its own, and executes in the sequence specified within the chain.

For more information on how to use Job Chains, refer to the Job Chain Page article.

Configure Job or Job Chain

After a job or Job Chain is created, you can configure the schedule for the job run.

If there are multiple Jobs in a Connection which are dependent on each other, Gainsight offers an option to create a Job Chain to sync data in sequence.

For more information, refer to the Configure Job or Job Chain Schedule article.

Job Activities

You can view the Execution and Update Activities of all the data jobs on the Activity page. You can also download the error logs of the jobs from this page to help troubleshoot the configuration issues.

For more information, refer to the Activity Page article.