Skip to main content
Gainsight Inc.

Use Data from Databricks in Data Designer

IMPORTANT: This feature will be available for customers from April 27, 2024 onwards.

This article helps admins create datasets by ingesting data from Databricks using Data Designer.

Overview

This article provides a comprehensive guide for administrators on how to create and manage datasets in Data Designer by using Databricks as a data source.

Business Use Case: If your organization uses Databricks to manage business data, you can integrate this data with Gainsight. This integration allows you to combine BigQuery data with other customer information in Gainsight's Data Designer. You can then use these datasets for reporting, creating rules, and Journey Orchestrator (JO) campaigns.
 

Prerequisites

The following pre-requisites are required for you to create datasets in Data Designer:

  • Establish a connection: Create a Databricks connection on the Connectors 2.0 page.
    For more information on creating a Databricks connection, refer to the Databricks Connector article.
  • Data Transfer Permissions: Ensure you have granted the create stage permission on the schema to allow Gainsight to import data sets larger than 5GB using the Databricks connector.

Create Databricks Dataset

To create Databricks Dataset:

  1. Navigate to Administration > Data Designer. The Design List page appears.
  2. Click New Design. The New Design page appears where you can see four tabs: Details, Preparation, Explore, and Configure.

Data Designer workspace in Gainsight showing a list of designs with options to create a New Design

Details

Enter the following information in the Details tab:

Note: The Details tab allows you to add basic information related to the dataset that is being created.

  1. In the Design Name, enter the name of the Data Design.
  2. In the Select Folder, select the folder in which the design needs to be saved.
  3. (Optional) In the Description, enter the details in the Description textbox as required.
  4. Click Prepare. The Preparation tab appears.

nterface with fields for inputting new Databricks data design name, folder selection, and description.

Preparation of Dataset

In the preparation stage, you can create the dataset by selecting the Data Source and it’s objects.

To prepare a Dataset:

  1. From the Data Source dropdown list, select the required Databricks data source. You can see all the objects available under the selected Databricks source.
    Note: You can see the Databricks data source here if you have established a Databricks connection on the Connectors 2.0 page as shown in the Prerequisites.
  2. Drag the required objects from the left pane to the canvas screen. The Select Fields window appears.

Workflow chart showing data sources being merged into Databricks with options for 'AllDatatype' and 'Company.

Select Fields

In the Select Fields side pane, all the fields available for the source object are displayed.

To select Fields:

  1. Select the required fields you want to add to the dataset.
  2. Click Select. The Object slide-out pane appears. This page displays the three tabs:

Selection panel for Databricks dataset fields including bigint, binary, and boolean columns.

Details and Summary Tab

The following information is available in the Details & Summary tab:

  • In the Output Dataset Name, enter the dataset name. 
  • (Optional) In the Description textbox, enter the description of the dataset as required.
  • Fields & Filters: You can view a list of all Fields (including Group By, if any) and Filters in the Dataset.

Detail overview of a 10m dataset in Databricks showing fields and filter options for the current year.

Fields Tab

This displays the following information:

  • Fields: The list of the fields selected in the above step.
  • Group By: You can select the required fields to Group By, to slice and dice the data in the dataset. Once you select a Field to Group By, all the other fields in the dataset will become aggregated.
  • Aggregation: You can select the required aggregation type from the Aggregation dropdown list.
  • Display Name: By default, the field name appears in the Display Name textbox. You can modify it as required. Display names are shown in the Datasets and reports.

To add additional fields:

  1. Click Add Fields, if additional fields are to be added.
  2. Click the Settings/Gear icon to make changes to numeric data type fields. For more information refer to the Preparation details in Data Designer article.
  3. Click the Delete/Trash icon to delete the field from the dataset.
  4. Click Save.

Selection of bigint, binary, and boolean data fields in Databricks techcomm dataset

Filters Tab

To apply Filters:

  1. Navigate to the Filters tab.
  2. Click Add Filter.
  3. From the Field dropdown list, select the Field to which filter is to be applied.
  4. Choose the Operator and then input the data in the Value textbox.
    Notes:
    • You can also Add more filters by clicking the + icon next to Value textbox.
    • You can Delete a filter by clicking the x icon.
    • You can add advance filters such as (A OR B) AND C, type in your desired expression in the Advanced Logic text box.
  5. Click Save.

Filter configuration in Data Designer showing an option to add an advanced filter for dataset '10M'

Edit Datasets

To edit a Dataset:

  1. Click the Pencil icon (or) click the three-vertical dots menu.
  2. Click Edit to modify a dataset. You will land on the following page as shown below.

Editing options for a 10m dataset in Databricks including buttons to add fields and set filters.

Edit the Output Dataset Name, Description, Fields, and Filters as required.

You can also create more Datasets from Salesforce, MDA data, Amazon S3, and Databricks, and perform Merge, Transform, and other actions on the created Datasets. Then, you can proceed to the Explore tab to analyze the data and create reports.

Additional Resources