Skip to main content
Gainsight Inc.

S3 Dataset Task in Rules Engine

Gainsight Standard Edition
This article supports Gainsight Standard Edition. This Edition is built on Gainsight's state of the art Matrix Data Architecture (MDA) platform, and is designed for customer success professionals for driving revenue, increasing retention, and scaling operations. To learn more about Gainsight Standard Edition, click here.

If you are using Gainsight Salesforce Edition, which is built on Salesforce and customer business data is stored in SFDC, you can find supporting documentation here.

 

This article provides guidelines on how Admins can create a dataset by fetching data from a csv/tsv file in any bucket configured in Connectors 2.0 or Gainsight management. These csv/tsv files can be uploaded manually into the S3 bucket or exported into the S3 bucket from another Rules Engine task. This task allows Admins to compose a dataset task from a csv/tsv. Once the dataset is created, Admins can apply other tasks and set up actions on it.
For more information, refer Export to S3 from Rules Engine.
Following are the use cases where you can use this task:

  • You can fetch data from an historical csv/tsv file in the Gainsight managed or any S3 bucket if you know the exact file name. You can then put this data in the Gainsight bucket.
  • While exporting output csv/tsv files into the S3 bucket from another Rules Engine task, Admins could have configured to export a set of smaller files instead of a big file (Ex: larger than 10 GB).

For more details on the task configurations, refer the section S3 File Configuration.

Prerequisites

  1. To create an S3 dataset from a csv/tsv file loaded into an S3 bucket, create a connection in the Connectors 2.0 page as shown below:
  1.  Navigate to Administration > Operations > Connectors 2.0 > Connectors tab.
  2.  Click Create Connection on the Amazon Web services S3 Connector. Create a Connection dialog appears.
  3.  Enter the following:
  • Connection Name: Example, My S3 Custom Bucket
  • Credentials of your S3 bucket like Bucket Name, Access Key, and Security Token
    Make sure that you enter valid credentials to establish an S3 connection.
  1.  Click TEST CONNECTION to validate the S3 connection. If the credentials for S3 bucket are correct and valid, it displays the success message, Connection Successful. If not, it displays an error message, Connection failed.
  2. Click CREATE. A connection for your S3 bucket is established and the same appears in the S3 File Configuration section in the Setup Rule page for S3 Dataset.

Create a Connection.gif

  1. Before an S3 dataset is created, make sure that the source csv/tsv file is available in the Gainsight Managed S3 bucket.
  2. Ensure that Date and DateTime values in the csv/tsv file are in one of the formats listed in the Date Configuration section.

Creating an S3 Dataset Task

To create a S3 Dataset Task:

  1. Navigate to Administration > Rules Engine. Click + RULE. Create Rule page appears.
  2. Provide the following in the Create Rule page:
    a. Enter Rule Name.
    b. Enter Description [Optional].
  3. Click NEXT. Setup Rule page appears.
  4. Click +TASK > S3 Dataset. S3 Dataset configuration page appears. This page has three sections, S3 File Configuration, Columns, and Date Configuration.

Creating an S3 Dataset Task.gif

  1. Enter the Task Name, Task Description, and Output Dataset Name in the Setup Rule header.

Note: Output Dataset Name auto populates from Task Name and it can be changed to a different name. It has no dependency on the Task Name.

In this use case, the following details are used:

  • Task Name: Extract from S3 [Maximum 80 characters and should be Alphanumeric; _ and space are supported]
  • Task Description: Extract data from a file in the S3 bucket [Maximum 200 characters]
  • Output Dataset Name: Extract from S3 [Maximum 60 characters and should be Alphanumeric; _ and space are supported]. This gets pre-populated with the task name by default.

S3 File Configuration.png

The S3 Dataset page has three sections, S3 File Configuration, Columns, and Date Configuration as shown below. Admins can configure this page to setup an S3 extract job. Depending on this configuration, columns in the csv/tsv file are extracted to fields in the S3 dataset.

S3 File Configuration

Perform the following configurations in this section:

  1. Select correct S3 bucket:

  • Gainsight Managed, if you want to fetch a csv/tsv file from the Gainsight Managed bucket
  • Your S3 custom bucket, if you want to fetch a csv/tsv file from your S3 custom bucket

Note: You can see an S3 bucket here for which you have established an S3 connection in the Connectors 2.0 page as shown in the Prerequisites.

  1. In the File Path field, you have the following options for the file name:
  1. Equals: If you know the exact file name in the S3 bucket with csv/tsv extension (Example: CompanyDetails.csv or CompanyDetails-2018-02-20.csv), use the option Equals to select that specific file for data loading into the S3 dataset. You can use this option to load an historical file from the S3 bucket. If your csv/tsv file has .gzip or .bzip as extension, select the specific Compression Type in the file properties. Select the other file properties as required for smooth data loading. However, Gainsight supports the following file properties:
  • Field Separator: , (Comma) as separator for the csv files and (Tab) for the tsv files
  • Text Qualifier: “ (Double Quote) or ‘ (Single Quote)
  • Escape Character: \ (Backslash), ‘ (Single Quote), or “ (Double Quote)
  • Compression Type: gzip or bzip

File Path.png

  1. Starts with: You can use this option only when you select the Use Date Pattern checkbox. This option can be used in combination with Date Pattern in the file name. If you enter the partial file name with date pattern, the specific file is selected for creating S3 dataset. You should select the file properties like Field Separator and Compression Type to select the correct file from the S3 bucket. You can use this option if you want to ingest data from multiple files of similar file names into the S3 Dataset. While exporting output csv/tsv files into the S3 bucket from another Rule task, if the file size is big (Ex: larger than 10 GB), Admins can configure to divide this big file into a set of smaller files with similar file names (Ex: file 1 of 1.5 GB, file 2 of 2 GB, etc.) and export into the S3 bucket. These smaller files have the same columns and the records are divided into multiple files. While creating an S3 Dataset using the option Starts with, records from all of the smaller files are fetched into the S3 dataset.
  1. Use Date Pattern: Enable this to use Date Pattern in the file path. If you select this, ${pattern} appears in the file name and it can be placed anywhere in the file path. Enter the number of days to subtract from the rule date in the ${pattern} formula. Date pattern in the file name can be applied with both the Equals and Starts with options in the File Path field.

Note: Position of the Date Pattern in the file name should correspond with the name of the files exported to the S3 bucket.

  1. You can use the following combination of File Path and Date Pattern options for the mentioned use cases:

File Path: In all the below conditions, make sure that you enter full file path from S3 bucket. For example, if a file named "CompanyDetails.csv" was placed in the folder MDA-Data-Ingest/input/ of a given S3 Bucket, and if you use the option Equals, the file path would be MDA-Data-Ingest/input/CompanyDetails.csv.

  1. Equals: You can use this option only if you know the exact file name (with/without file generated date) in the S3 bucket with csv/tsv extension (Example: CompanyDetails.csv or CompanyDetails-2018-02-20.csv). The file path would be MDA-Data-Ingest/input/CompanyDetails.csv.
  2. Equals + Date Pattern: You can use this option if you know the file name without the file generated date. You can add Date Pattern to select a specific file for data ingestion. For example, if the Rule execution date is 2018-02-24 and you want to ingest data from a file exported to S3 bucket on 2018-02-20. You can subtract 4 days from Rule Date. The file name will be CompanyDetails-${pattern}.csv where ${pattern} = Subtract 4 days from Rule Date with format yyyy-MM-dd and the file path will be MDA-Data-Ingest/input/CompanyDetails-${pattern}.csv.
  3. Starts with + Date Pattern: You can use this option if you know the file name partially without file generated date. You can add Date Pattern to select a file or a set of files that match the file path for data ingestion. For example, if the Rule execution date is 2018-02-24 and if you want to ingest data from a file or multiple files that match the file path exported to S3 bucket on 2018-02-20. You can subtract 4 days from the Rule Date. The file name will be CompanyDetails-${pattern} where ${pattern} = Subtract 4 days from Rule Date with format yyyy-MM-dd and the file path will be MDA-Data-Ingest/input/CompanyDetails-${pattern}.

Configuring S3 Dataset.gif

  1. When you select the option Equals, make sure to add the extensions .csv/.tsv and .bzip/.gz for the file format (Comma separated/Tab separated) and compression type respectively.
  2. Is the source file Encrypted: If your csv file in the S3 bucket is encrypted, select the checkbox and correct PGP key to apply decryption on the csv/tsv file.

Note: To configure a PGP key for decryption with Gainsight, contact support@gainsight.com so that you can use the same here.

  1. When you select the option Equals or Starts with, make sure to select the following File Properties as required:
  • Field Separator: Comma / Tab for the .csv/.tsv files respectively
  • Compression Type: bzip / gzip for the file compression types respectively. Select None if the compression is not applied on the source file in the S3 bucket.

If you do not apply the above configurations correctly or you have provided incorrect S3 bucket details while creating an S3 connection in the Connectors 2.0 page, the Rule execution fails.

Notes:

  • Field Separator: Use , (Comma) as separator for the csv files and (Tab) for the tsv files.
  • Text Qualifier: It is used to import a value (along with special characters) specified in the Quotation while importing data. It is recommended to use the same Text Qualifier in the S3 file configuration which is used in the csv/tsv file to upload. By default, Double Quote is selected in the S3 file configuration, but users can change to Single Quote as required.
  • Escape character: It is used to include special character in the value and it is placed before special character in the value. It is recommended to use Backslash in the csv/tsv file to avoid any data discrepancy in the S3 Dataset.
  • Compression Type: Use .bzip or .gzip as required.
  1. When all the configurations are completed in this section, click PREVIEW. This fetches the first 10 records from the source file and displays them in the Preview Results window.
  2. Click LOAD COLUMN DETAILS. The Columns section appears as shown in the section below.

Columns

When you click LOAD COLUMN DETAILS in the S3 File Configuration section, following columns appear:

  • Column Header: Column headers that are fetched from csv/tsv file in the S3 bucket.
  • Data Type: You can assign a data type to the field that is going to be created in the S3 Dataset. Gainsight supports creating fields of five data types:
    • String
    • Number
    • Boolean
    • Date
    • DateTime

Note: By default, new fields that are going to be created in the S3 dataset are assigned with String data type.

  • Output Field Label: You can assign a field label in the S3 dataset into which values from csv/tsv file are fetched. You can assign the number of decimals for the Number values.

If there are many fields for ingestion, you can search and select the required fields from the Search box. You can select or remove the source file column headers using the checkbox next to the Column Header name. You can do this when the source file in the S3 bucket is updated (column headers are removed).

S3 File Configuration_Sequence.gif

Limitations

Following of the data type mappings between CSV column headers and S3 dataset fields fail:

  • String to Number
  • DateTime to Date
  • Date to DateTime
  • String to Boolean: If value in the CSV column is not True, it loads the value as False in the S3 Dataset field.

Date Configuration

There are three configurations Date, DateTime, and Timezone available in this section. Select these formats for the records to be ingested as similar to the Date, DateTime, and Timezone formats in the csv/tsv file records in the S3 bucket. To check formats of the source csv/tsv file records, view Preview dialog in the S3 File Configuration section. Following are the use cases for these configurations:

  • Date Format: Select yyyy-MM-dd if the format of the date values in the source file is 2018-02-18. Following are the date formats supported in the S3 dataset task from a csv/tsv file.

Supported formats

  • dd-MMM-yy
  • dd-MMM-yyyy
  • dd/MM/yy
  • dd/MM/yyyy
  • EEEE, dd MMMM  yyyy
  • M-d-yyyy
  • M-dd-yy
  • M/d/yy
  • M/d/yyyy
  • M/dd/yy
  • M/dd/yyyy
  • MM-dd-yyyy
  • MM/dd/yyyy
  • MMMM d, yyyy
  • yyyy-M-d
  • yyyy-MM-dd
  • yyyy/M/d
  • yyyyMMdd
  • DateTime Format: Select yyyy-MM-dd’T’HH:mm:ss[‘.’SSS][XXXXX] if the format of the datetime values in the source file is 2017-03-22T12:37:23-08:00. Following are the DateTime formats supported in the S3 dataset task from a csv/tsv file.

Supported formats 

  • dd/MM/yyyy hh:mm:ss['.'SSS][XXXXX] a
  • yyyy/M/dd'T'HH:mm:ss['.'SSS][XXXXX]
  • yyyy-M-dd'T'HH:mm:ss['.'SSS][XXXXX]
  • yyyy/MM/dd'T'HH:mm:ss['.'SSS][XXXXX]
  • yyyy-MM-dd'T'HH:mm:ss['.'SSS][XXXXX]
 
  • Time Zone: By default, UTC time zone is selected in this field. If the datetime value (Ex: 2017-03-22T12:37:23-08:00) in the source file has a time zone (Ex: -08:00), select the same time zone (GMT-08:00) Pacific Standard Time (America/Los_Angeles) in this field. If the datetime values in the source file do not have any timezone, select UTC in this field.

Data loading into the S3 dataset fails if the Date and DateTime formats selected in the Date Configuration section is not the same as in the source.

Mappings.gif
Once you set up configurations in the above three sections, click SAVE in the Setup Rule header to save the S3 dataset successfully.

Gainsight Managed Bucket.png

Once the S3 dataset is saved, you cannot make any changes in the configurations. You can only remove the column headers after the dataset is saved. You can do this when the source file in the S3 bucket is updated (column headers are removed).

If you have feedback or questions on this feature, please share them on community.gainsight.com.

Edit S3 File Configuration

You can also edit an S3 Configuration section, even after you save the rule. To make changes in the S3 File Configuration:

  1. Click the EDIT button.

  2. Change the csv/tsv file, switch to another S3 bucket or make any other changes in the S3 File configuration.

  3. Click Load Column Details when you are done with your changes. The fields in the Columns section are automatically updated based on the columns in the new csv/tsv file.

  4. Click SAVE to save changes in the latest S3 File Configuration. 

S3 dataset task.gif

Important 

  • If you make changes to an S3 dataset task before Rule execution starts, the rule is executed successfully.

  • If you make changes to the S3 File Configuration section during Rule execution, execution fails at that point. Records which were processed before the changes, remain intact. But no further records are processed after you change the csv/tsv file or S3 bucket. To learn more about how to view successfully ingested records and records which are failed to ingest, refer the <<Partial Success Execution Status>>(TBA) section of the Scheduling Rules article.

  • If you make changes to the S3 File Configuration section when the rule  execution has just started or is in the execution queue, system displays an error message stating that file not found or Bucket not found, based on the changes made. The rule execution fails.

  • Once you ingest data through S3 dataset, you can create an Action on the Rule to load data. If a csv/tsv file column is part of Rule Action and you wish to delete this column, from the Columns section, the system sends an error message. You must first remove the required column from the Rule Action page and then remove it from the Columns section of the S3 Dataset task.
  • Was this article helpful?