
Databricks Connector

This article explains how admins can create a connection from Gainsight to Databricks, create a data job, merge datasets, and transform and configure jobs or Job Chains in Gainsight.

Overview

Databricks is a unified data analytics platform offering scalable, collaborative, and integrated solutions for data engineering, data science, machine learning, and analytics.

Integrating Gainsight with Databricks allows you to ingest data into Gainsight standard objects (such as Company and Person) and custom objects. You can use insights from Databricks to understand how customers use your product, predict churn risks, and personalize outreach for better customer success.

Key Benefits:

  • Improved Data Ingestion: You can now easily bring data from Databricks tables into Gainsight. This lets you connect with multiple warehouses and schemas, and perform transformations and merges between tables in connectors for meaningful data ingestion in Gainsight.

  • Expanded Data Connectivity: With the new Databricks connector, you can utilize Data Designer to blend Databricks tables with other data sources in Gainsight. This integration opens up possibilities for richer insights, which can be leveraged in Reporting, Rules, and Journey Orchestrator (JO).

Limitations:

  • Data ingestion is possible only uni-directionally from Databricks into Gainsight.
  • The following data types are not supported:
    • Maps
    • Struct
    • Array
    • Void

Note:

  • For datetime fields in Databricks that are in NTZ (No Time Zone) format, Gainsight treats the DateTime as UTC.
  • By default, you can create five Databricks connections in your tenant. To create more connections, reach out to Gainsight support.
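The NTZ behavior above can be pictured with a small Python sketch (illustrative only, not Gainsight's internal code; the sample timestamp is hypothetical):

```python
from datetime import datetime, timezone

# An NTZ (no-time-zone) value from Databricks arrives as a naive datetime.
ntz_value = datetime(2024, 5, 1, 9, 30, 0)  # hypothetical sample value

# Gainsight treats such values as UTC; the equivalent interpretation:
as_utc = ntz_value.replace(tzinfo=timezone.utc)

print(as_utc.isoformat())  # 2024-05-01T09:30:00+00:00
```

In other words, no clock-shift is applied: the wall-clock value stored in Databricks is read as if it were already in UTC.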

Fetch Credentials from Databricks

To create a connection in Gainsight, the following credentials are required from Databricks:

Database Host 

Database host is the URL of your Databricks account. It is in the following format:

Database Host: <yourDomain>.cloud.databricks.com

Note: Do not include https:// in the Database Host.

[Image: DB Host URL.jpg]
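If you script your setup, a small helper can normalize a pasted URL into the bare hostname the Database Host field expects (a minimal sketch; "mycompany" is a placeholder domain):

```python
from urllib.parse import urlparse

def to_database_host(value: str) -> str:
    """Return just the hostname, dropping any scheme or trailing path."""
    value = value.strip()
    if "://" in value:
        return urlparse(value).netloc
    return value.split("/")[0]

# "mycompany" is a placeholder domain.
print(to_database_host("https://mycompany.cloud.databricks.com/"))
```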

Warehouse ID

Warehouse ID is a unique identifier assigned to the SQL warehouse within the Databricks environment. When you create a SQL Warehouse in Databricks, the platform automatically assigns it a Warehouse ID.

To obtain the Warehouse ID:

  1. In Databricks, navigate to SQL Warehouses.
  2. Click any of the existing warehouses.
  3. From the Name field, copy the Warehouse ID.

[Image: Warehouse ID.jpg]
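If you want to sanity-check a Warehouse ID outside Gainsight, Databricks SQL clients (for example, the databricks-sql-connector package) use it to form the warehouse's HTTP path. A small sketch (the ID shown is a placeholder):

```python
def warehouse_http_path(warehouse_id: str) -> str:
    """HTTP path of a Databricks SQL warehouse, as used by SQL clients."""
    return f"/sql/1.0/warehouses/{warehouse_id}"

print(warehouse_http_path("abc123def456"))  # "abc123def456" is a placeholder ID
```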

Catalog and Schema Name

Catalog name refers to the name of the catalog you are working within. In Databricks, it acts as a top-level container for metadata, including databases, tables, views, and functions.

Schema name refers to a logical grouping of database objects such as tables, views, and functions.

To obtain the catalog and schema names:

  1. In Databricks, navigate to SQL Editor.
  2. Copy the Catalog and Schema name as follows:

[Image: Catalog and Schema.jpg]

In the above example, the Catalog name is Hive_metastore and the Schema name is automationdb or default.

Note: The data tables that are visible in the Connector jobs in Gainsight CS are the tables that reside inside the schemas.
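Together, the catalog, schema, and table form the three-level namespace Databricks uses to address a table. A tiny sketch using the example names above ("accounts" is a hypothetical table name):

```python
def qualified_name(catalog: str, schema: str, table: str) -> str:
    """Fully qualified Databricks table name (catalog.schema.table)."""
    return f"{catalog}.{schema}.{table}"

# Using the example names above; "accounts" is a hypothetical table.
print(qualified_name("hive_metastore", "automationdb", "accounts"))
```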

Create a Connection

After obtaining the Databricks credentials, you can create a connection between Gainsight and Databricks.

To create a connection:

  1. Navigate to Administration > Connectors 2.0.
  2. Click Create Connection. The Create Connection dialog appears.
  3. From the Connector dropdown, select Databricks.
  4. Enter the following details:
    • In the Name of the connection field, enter a name to identify the connection.
    • In the Database Host URL field, enter the URL of the Databricks account.
    • In the Warehouse ID field, enter the Warehouse ID assigned to the SQL warehouse in Databricks.
    • In the Catalog Name field, enter the name of the catalog you are working within in Databricks.
    • In the Schema Name field, enter the schema name that contains the database objects.
    • In the Username field, enter the username used for logging into Databricks.
    • In the Password field, enter the password used for logging into Databricks.
  5. Click Save.

[Image: Databricks.jpg]

Context Menu Options

Once you authorize the connection, configure it as required using the following options from the three-vertical-dots menu:

  • Edit Connection: Update or modify the Databricks connection details.
  • Delete Connection: Delete the Databricks connection.

Note: A Databricks connection can be deleted only after the associated Job Chains and data jobs are deleted.

[Image: Context Menu.jpg]

Create a Job

Admins can create jobs from the Jobs page to sync data from required source objects with Gainsight. After selecting the Data Source, you can view all the Databricks objects in the left pane. Drag and drop a source object to the Preparation screen to start creating a job.

Note: To sync newly created views from Databricks to Gainsight, ensure that you re-authorize the Databricks connection from the Connections page.

For more information, refer to the Preparation of Connector Jobs article.

[Image: DB Preparation.jpg]

If the data jobs in a connection are dependent on each other, create a Job Chain and configure the schedule for the Job Chain. For more information, refer to the Configure Job or Job Chain Schedule article.

Merge Datasets

You can merge two or more datasets to create an output dataset. For example, you can merge a Bitcoin cryptocurrency transactions dataset with a Company dataset to see the transactions for each company and create an output dataset such as Company ARR.
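Conceptually, a merge behaves like a join on a shared key. A plain-Python sketch with hypothetical rows and field names (not Gainsight's internal logic):

```python
# Hypothetical rows from two source datasets sharing a company_id key.
companies = [
    {"company_id": "C1", "name": "Acme"},
    {"company_id": "C2", "name": "Globex"},
]
transactions = [
    {"company_id": "C1", "amount": 120.0},
    {"company_id": "C1", "amount": 80.0},
    {"company_id": "C2", "amount": 50.0},
]

# Join-style merge: total transaction amount per company.
totals = {c["company_id"]: 0.0 for c in companies}
for t in transactions:
    totals[t["company_id"]] += t["amount"]

merged = [{**c, "total_amount": totals[c["company_id"]]} for c in companies]
print(merged)
```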

For more information on Merge, refer to the Preparation of Connector Jobs article.

Transform Data

In the Preparation step of a connector job, you can transform data and add Case Fields to get more meaningful insights from the customer data.

Business use case: The Transform function provides the capability to create or modify case fields. Case fields can be used to modify an external field as per the consumption requirement in Gainsight’s data model, and can be defined to populate different values for different case conditions. For example, external picklist values such as New, Open, and Closed can be mapped to Active and Inactive to match Gainsight’s picklist values.
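The picklist example above can be sketched as plain case logic (grouping New and Open under Active is an assumption made for illustration; your mapping may differ):

```python
# Case-field logic for the picklist example, expressed as a plain function.
# Assumption for illustration: New and Open map to Active; Closed to Inactive.
def case_status(external_status: str) -> str:
    if external_status in ("New", "Open"):
        return "Active"
    if external_status == "Closed":
        return "Inactive"
    return external_status  # pass through any unexpected value

print([case_status(s) for s in ["New", "Open", "Closed"]])
```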

For more information on how to use the Transform function, refer to the Preparation of Connector Jobs article.

Add Destination

Once the final output dataset is prepared, you can add a destination to the output dataset to sync data from the source to the target Gainsight object.

For more information on how to add destinations, refer to the Preparation of Connector Jobs article.

Direct Mappings

To sync data from an external system to Gainsight using Direct Mappings, you must map fields from the external system's source objects to Gainsight's target objects. The data sync happens based on the configured field mappings.

For more information on Direct Mapping, refer to the Preparation of Connector Jobs article.

Derived Mapping

(Optional) You can use Derived Mappings to populate values in an object's field (of the GSID data type) with values from the same or another standard Gainsight object. Lookup is used to accomplish this, and you can match up to six columns in the lookup.

Note: To use Derived Mappings, your Target Object must have at least one field of data type GSID.
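The lookup described above can be pictured as a keyed match against a standard object. A minimal sketch with hypothetical field names and GSIDs:

```python
# Illustrative derived-mapping lookup: resolve a GSID on the target object by
# matching columns against a standard object. All names and IDs are hypothetical.
company_index = {
    ("Acme", "US"): "GSID-0001",
    ("Globex", "DE"): "GSID-0002",
}

def resolve_gsid(name, region):
    """Match on two lookup columns (the UI supports matching up to six)."""
    return company_index.get((name, region))

print(resolve_gsid("Acme", "US"))
```

Rows that match no entry resolve to no GSID, which is why the target object must have at least one GSID field to receive the result.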

For more information on Derived Mapping, refer to the Preparation of Connector Jobs article.

Job Chain

The Job Chain feature helps simplify the process of scheduling a group of related jobs that you need to run in a particular sequence. This sequence is based on the scheduled time of the Job Chain.

For example, if Jobs A and B must run in sequence, Job B starts automatically as soon as Job A is completed, and there is no need to schedule each job separately.

Note: When a job is added to a Job Chain, it follows the Job Chain's schedule, not its own, and executes in the sequence specified within the chain.

For more information on how to use Job Chains, refer to the Job Chain Page article.

Configure Job or Job Chain

After the job or Job Chain is created, you can configure the schedule for the job run.

If there are multiple Jobs in a Connection which are dependent on each other, Gainsight offers an option to create a Job Chain to sync data in sequence.

For more information, refer to the Configure Job or Job Chain Schedule article.

Job Activities

You can view the Execution and Update Activities of all the data jobs on the Activity page. You can also download the error logs of the jobs from this page to help troubleshoot the configuration issues.

For more information, refer to the Activity Page article.