
S3 Dataset Task in Bionic Rules

This tutorial provides guidelines on how Admins can create a dataset by fetching data from a csv/tsv file in any bucket. These csv/tsv files can be uploaded manually into the S3 bucket or exported into the S3 bucket from another Bionic Rule task. This task allows Admins to create fields in the new dataset from the csv/tsv columns. Once the dataset is created, Admins can apply other tasks and set up actions on it.

This task can fetch data from a csv/tsv file that was exported to the Gainsight managed S3 bucket from another Bionic Rule task. For more information, refer to Export to S3 from Bionic Rules. For a short video demo about executing an S3 dataset rule after uploading a file to S3, click here.

Following are the use cases where you can use this Bionic Rule task:

  • You can fetch data from a historical csv/tsv file in the Gainsight managed or any other S3 bucket if you know the exact file name, using the Equals option in the task configuration (Ex: CompanyDetails.csv or CompanyDetails-2018-02-20.csv). If you know the format of the file name (which contains the generation date) that was exported to the S3 bucket, you can select a file that was generated on a specific date using the Date Pattern option in the task configuration.
  • While exporting output csv/tsv files into the S3 bucket from another Bionic Rule task, Admins might have configured the rule to export a set of smaller files instead of one big file (Ex: larger than 10 GB). In this case, use the Starts with option in the task configuration to fetch data from all of the smaller files in the S3 bucket and load it into the S3 Dataset.
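As an illustration, the prefix matching that the Starts with option performs over the smaller part files can be sketched in Python. The object key names below are hypothetical, and this is only a model of the behavior described above, not Gainsight's implementation:

```python
# Hypothetical object keys in the S3 bucket; a large export was split
# into several part files that share one prefix.
keys = [
    "MDA-Data-Ingest/input/CompanyDetails-2018-02-20-part1.csv",
    "MDA-Data-Ingest/input/CompanyDetails-2018-02-20-part2.csv",
    "MDA-Data-Ingest/input/ContactDetails-2018-02-20.csv",
]

# "Starts with" selects every file sharing the prefix, so records from
# all of the part files are loaded into one S3 dataset.
prefix = "MDA-Data-Ingest/input/CompanyDetails-2018-02-20"
matched = [k for k in keys if k.startswith(prefix)]
```

Here `matched` contains only the two CompanyDetails part files; the ContactDetails file is excluded because it does not share the prefix.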

For more details on the task configurations, refer to the section S3 File Configuration.

Prerequisites

  1. To create an S3 dataset from a csv/tsv file loaded into an S3 bucket, create a connection in the Connectors 2.0 page as shown below:
    1. Navigate to Administration > Connectors 2.0 > Connectors tab.
    2. Click Create Connection on the Amazon Web Services S3 Connector. The Create a Connection dialog appears.
    3. Enter the following:
  • Connection Name: Example, My S3 Custom Bucket
  • Credentials of your S3 bucket like Bucket Name, Access Key, and Security Token

    Make sure that you enter valid credentials to establish an S3 connection.
    4. Click TEST CONNECTION to validate the S3 connection. If the credentials for the S3 bucket are correct and valid, the success message Connection Successful is displayed. If not, the error message Connection failed is displayed.
    5. Click CREATE. A connection for your S3 bucket is established, and it appears in the S3 File Configuration section in the Setup Rule page for the S3 Dataset.


  2. Before an S3 dataset is created, make sure that the source csv/tsv file is available in the Gainsight Managed S3 bucket.
  3. Ensure that Date and DateTime values in the csv/tsv file are in one of the formats listed in the Date Configuration section.

Creating an S3 Dataset Task

To create an S3 Dataset Task:

  1. Navigate to Administration > Rules Engine. Click + RULE. Create Rule page appears.
  2. Provide the following in the Create Rule page:
    1. Select Bionic as Rule Type.
    2. Select Account as Rule For.
    3. Enter Rule Name.
    4. Enter Description [Optional].
  3. Click NEXT. Setup Rule page appears.
  4. Click +TASK > S3 Dataset. S3 Dataset configuration page appears. This page has three sections, S3 File Configuration, Columns, and Date Configuration.


  5. Enter the Task Name, Task Description, and Output Dataset Name in the Setup Rule header.

Note: Output Dataset Name is auto-populated from the Task Name, but it can be changed to a different name. It has no dependency on the Task Name.

In this use case, the following details are used:

  • Task Name: Extract from S3 [Maximum 80 characters and should be Alphanumeric; _ and space are supported]
  • Task Description: Extract data from a file in the S3 bucket [Maximum 200 characters]
  • Output Dataset Name: Extract from S3 [Maximum 60 characters and should be Alphanumeric; _ and space are supported]. This gets pre-populated with the task name by default.


The S3 Dataset page has three sections: S3 File Configuration, Columns, and Date Configuration. Admins can configure this page to set up an S3 extract job. Depending on this configuration, columns in the csv/tsv file are extracted to fields in the S3 dataset.

S3 File Configuration

Perform the following configurations in this section:

  1. Select the correct S3 bucket:
  • Gainsight Managed, if you want to fetch a csv/tsv file from the Gainsight Managed bucket
  • Your S3 custom bucket, if you want to fetch a csv/tsv file from your S3 custom bucket

Note: You can see an S3 bucket here for which you have established an S3 connection in the Connectors 2.0 page as shown in the Prerequisites.

For a quick demo on how to create a connection for an S3 bucket and use the connection in this section, watch this short video.

  2. In the File Path field, you have the following options for the file name:
    1. Equals: If you know the exact file name in the S3 bucket with the csv/tsv extension (Example: CompanyDetails.csv or CompanyDetails-2018-02-20.csv), use the Equals option to select that specific file for data loading into the S3 dataset. You can use this option to load a historical file from the S3 bucket. If your csv/tsv file has .gzip or .bzip as its extension, select the specific Compression Type in the file properties. Select the other file properties as required for smooth data loading. Gainsight supports the following file properties:
  • Field Separator: , (Comma) as separator for the csv files and (Tab) for the tsv files
  • Text Qualifier: “ (Double Quote) or ‘ (Single Quote)
  • Escape Character: \ (Backslash), ‘ (Single Quote), or “ (Double Quote)
  • Compression Type: gzip or bzip
    2. Starts with: You can use this option only when you select the Use Date Pattern checkbox. This option can be used in combination with a Date Pattern in the file name. If you enter the partial file name with a date pattern, the matching file is selected for creating the S3 dataset. You should select the file properties, like Field Separator and Compression Type, to select the correct file from the S3 bucket. You can use this option if you want to ingest data from multiple files with similar file names into the S3 Dataset.

    While exporting output csv/tsv files into the S3 bucket from another Bionic Rule task, if the file size is big (Ex: larger than 10 GB), Admins can configure to divide this big file into a set of smaller files with similar file names (Ex: file 1 of 1.5 GB, file 2 of 2 GB, etc.) and export into the S3 bucket. These smaller files have the same columns and the records are divided into multiple files. While creating an S3 Dataset using the option Starts with, records from all of the smaller files are fetched into the S3 dataset.
    3. Use Date Pattern: Enable this to use a Date Pattern in the file path. If you select this, ${pattern} appears in the file name, and it can be placed anywhere in the file path. Enter the number of days to subtract from the rule date in the ${pattern} formula. The date pattern in the file name can be applied with both the Equals and Starts with options in the File Path field.

Note: Position of the Date Pattern in the file name should correspond with the name of the files exported to the S3 bucket.

  3. You can use the following combinations of File Path and Date Pattern options for the mentioned use cases:

File Path: In all the below conditions, make sure that you enter full file path from S3 bucket. For example, if a file named "CompanyDetails.csv" was placed in the folder MDA-Data-Ingest/input/ of a given S3 Bucket, and if you use the option Equals, the file path would be MDA-Data-Ingest/input/CompanyDetails.csv.

  1. Equals: You can use this option only if you know the exact file name (with/without the file generation date) in the S3 bucket with the csv/tsv extension (Example: CompanyDetails.csv or CompanyDetails-2018-02-20.csv). The file path would be MDA-Data-Ingest/input/CompanyDetails.csv.
  2. Equals + Date Pattern: You can use this option if you know the file name but not the file generation date. You can add a Date Pattern to select a specific file for data ingestion. For example, if the rule execution date is 2018-02-24 and you want to ingest data from a file exported to the S3 bucket on 2018-02-20, you can subtract 4 days from the Rule Date. The file name will be CompanyDetails-${pattern}.csv, where ${pattern} = Subtract 4 days from Rule Date with format yyyy-MM-dd, and the file path will be MDA-Data-Ingest/input/CompanyDetails-${pattern}.csv.
  3. Starts with + Date Pattern: You can use this option if you know the file name only partially, without the file generation date. You can add a Date Pattern to select a file or a set of files that match the file path for data ingestion. For example, if the rule execution date is 2018-02-24 and you want to ingest data from one or more files that match a file path exported to the S3 bucket on 2018-02-20, you can subtract 4 days from the Rule Date. The file name will be CompanyDetails-${pattern}, where ${pattern} = Subtract 4 days from Rule Date with format yyyy-MM-dd, and the file path will be MDA-Data-Ingest/input/CompanyDetails-${pattern}.
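The ${pattern} substitution in the examples above can be modeled with a short Python sketch. The function name and the use of Python's strftime codes (where yyyy-MM-dd corresponds to %Y-%m-%d) are assumptions for illustration, not Gainsight internals:

```python
from datetime import date, timedelta

def resolve_file_path(template: str, rule_date: date,
                      subtract_days: int) -> str:
    """Replace the ${pattern} token in a file path with the rule date
    minus the configured number of days, formatted as yyyy-MM-dd.
    Illustrative sketch only; not Gainsight's actual implementation."""
    generated = rule_date - timedelta(days=subtract_days)
    return template.replace("${pattern}", generated.strftime("%Y-%m-%d"))

# Rule runs on 2018-02-24; the file was exported 4 days earlier,
# so subtracting 4 days from the Rule Date selects the 2018-02-20 file.
path = resolve_file_path(
    "MDA-Data-Ingest/input/CompanyDetails-${pattern}.csv",
    date(2018, 2, 24), 4)
```

Running this resolves the template to MDA-Data-Ingest/input/CompanyDetails-2018-02-20.csv, matching the Equals + Date Pattern example above.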


  4. When you select the Equals option, make sure to add the extensions .csv/.tsv and .bzip/.gz for the file format (Comma separated/Tab separated) and compression type respectively.
  5. Is the source file Encrypted: If your csv/tsv file in the S3 bucket is encrypted, select the checkbox and the correct PGP key to apply decryption to the csv/tsv file.

Note: To configure a PGP key for decryption with Gainsight, contact support@gainsight.com so that you can use the same here.

  6. When you select the Equals or Starts with option, make sure to select the following File Properties as required:
  • Field Separator: Comma / Tab for the .csv/.tsv files respectively
  • Compression Type: bzip / gzip for the file compression types respectively. Select None if the compression is not applied on the source file in the S3 bucket.

If you do not apply the above configurations correctly or you have provided incorrect S3 bucket details while creating an S3 connection in the Connectors 2.0 page, the Rule execution fails.

Notes:

  • Field Separator: Use , (Comma) as separator for the csv files and (Tab) for the tsv files.
  • Text Qualifier: It is used to import a value (along with special characters) specified within the quotation marks while importing data. It is recommended to use the same Text Qualifier in the S3 file configuration as is used in the csv/tsv file being uploaded. By default, Double Quote is selected in the S3 file configuration, but users can change it to Single Quote as required.
  • Escape Character: It is used to include a special character in a value and is placed before the special character. It is recommended to use Backslash in the csv/tsv file to avoid any data discrepancy in the S3 Dataset.
  • Compression Type: Use .bzip or .gzip as required.
  7. When all the configurations are completed in this section, click PREVIEW. This fetches the first 10 records from the source file and displays them in the Preview Results window.
  8. Click LOAD COLUMN DETAILS. The Columns section appears as shown in the section below.
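To see how the file properties fit together, here is a minimal Python sketch that parses a hypothetical csv snippet using the default properties described above (comma Field Separator, double-quote Text Qualifier, backslash Escape Character). This models the parsing behavior only; it is not how Gainsight processes the file:

```python
import csv
import io

# Hypothetical csv content: the quoted first field contains a comma,
# which the double-quote Text Qualifier protects from being split.
raw = 'Company Name,Status\n"Acme, Inc.",Active\nGlobex,Inactive\n'

reader = csv.reader(
    io.StringIO(raw),
    delimiter=",",     # Field Separator: , (Comma)
    quotechar='"',     # Text Qualifier: " (Double Quote)
    escapechar="\\",   # Escape Character: \ (Backslash)
)
rows = list(reader)
```

With matching properties, "Acme, Inc." is read as a single value; if the Field Separator or Text Qualifier were configured differently from the file, the columns would split incorrectly, which is why the preview step above is worth checking.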

Columns

When you click LOAD COLUMN DETAILS in the S3 File Configuration section, the following columns appear:

  • Column Header: Column headers that are fetched from csv/tsv file in the S3 bucket.
  • Data Type: A Data Type is selected automatically for each column header. However, you can modify this field and change the default Data Type for the required column headers. Ensure that the Data Types of a Column Header and its corresponding Output Field Label are the same. Gainsight supports six Data Types:
    • Boolean: This Data Type can hold either a True/False value.
    • Date: This Data Type can hold Date values. 
    • DateTime: This Data Type can hold Date and Time values.
    • Number: This Data Type can hold integers and floating point values.
    • String: This Data Type can hold characters and String values.
    • GSID: This Data Type can hold values of the GSID type. GSID is a 36 digit unique ID assigned to every record in Gainsight.
      Example of GSID: You can export records from an S3 Dataset. The exported records can have fields like Company ID which is a GSID field. When you import the same records through an S3 dataset, Company ID field can be automatically mapped as GSID. 

When you select the Date or DateTime data type for a Column header, a settings icon is displayed. You can use this icon to set a Date or DateTime format for that Column header only.


Note: If you do not select a format for Date, DateTime fields or Timezone, the format or Time Zone specified in the Default Date Configuration section is selected.


  • Output Field Label: You can assign a field label in the S3 dataset into which values from the csv/tsv file are loaded. For Number values, you can also assign the number of decimal places.

If a CSV column header contains dropdown list items, either as labels (Ex: Active, Inactive) or as their corresponding GSIDs, the system allows you to store these values in the columns of an S3 dataset. You can use these values to match dropdown list items in MDA (Ex: Company > Status) in the Load to Gainsight Objects action type.

If there are many fields for ingestion, you can search for and select the required fields from the Search box. You can select or remove source file column headers using the checkbox next to the Column Header name. You can do this when the source file in the S3 bucket is updated (column headers are removed).
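The automatic Data Type selection described above might be sketched as follows. The detection order and individual checks here are assumptions for illustration; Gainsight's actual inference logic is not documented in this article:

```python
from datetime import datetime

def infer_data_type(sample: str) -> str:
    """Guess a Data Type for one CSV value (illustrative sketch only)."""
    if sample in ("True", "False"):
        return "Boolean"
    try:
        float(sample)
        return "Number"
    except ValueError:
        pass
    try:
        datetime.strptime(sample, "%Y-%m-%d")  # e.g. 2018-02-20
        return "Date"
    except ValueError:
        pass
    try:
        datetime.strptime(sample, "%Y-%m-%dT%H:%M:%S%z")
        return "DateTime"
    except ValueError:
        pass
    if len(sample) == 36:  # GSID is a 36-character unique ID
        return "GSID"
    return "String"
```

For example, a column whose values look like 2018-02-20 would default to Date, while free text falls back to String; you can still override the default per column as described above.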


Limitations

The following limitations apply in the Columns section:

  • The following data type mappings between CSV column headers and S3 dataset fields fail:

    • String to Number

    • DateTime to Date

    • Date to DateTime

    • String to Boolean: If value in the CSV column is not True, it loads the value as False in the S3 Dataset field.
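The String-to-Boolean behavior in the last limitation can be modeled with a one-line sketch. Whether the match is case-sensitive or whitespace-trimmed is an assumption here; the article only states that any value other than True loads as False:

```python
def coerce_boolean(raw: str) -> bool:
    # Only the literal "True" loads as True; any other value becomes
    # False. (Case-sensitivity and trimming are assumptions in this
    # sketch, not documented Gainsight behavior.)
    return raw.strip() == "True"
```

So a CSV column containing values like "yes", "1", or "false" would load every record as False, which is worth checking before relying on a Boolean mapping.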

Edit S3 File Configuration

You can edit the S3 File Configuration section even after you save the rule. To make changes in the S3 File Configuration:

  1. Click the EDIT button.

  2. Change the csv/tsv file, switch to another S3 bucket, or make any other changes in the S3 File Configuration.

  3. Click Load Column Details when you are done with your changes. The fields in the Columns section are automatically updated based on the columns in the new csv/tsv file.

  4. Click SAVE to save changes in the latest S3 File Configuration. 


Important 

  • If you make changes to an S3 dataset task before Rule execution starts, the rule is executed successfully.

  • If you make changes to the S3 File Configuration section during Rule execution, execution fails at that point. Records that were processed before the changes remain intact, but no further records are processed after you change the csv/tsv file or S3 bucket. To learn more about how to view successfully ingested records and records that failed to ingest, refer to the Partial Success Execution Status section of the Scheduling Rules article.

  • If you make changes to the S3 File Configuration section when the rule execution has just started or is in the execution queue, the system displays an error message stating that the file or bucket was not found, depending on the changes made. The rule execution fails.

  • Once you ingest data through an S3 dataset, you can create an Action on the Rule to load data. If a csv/tsv file column is part of a Rule Action and you try to delete this column from the Columns section, the system displays an error message. You must first remove the column from the Rule Action page and then remove it from the Columns section of the S3 Dataset task.

Date Configuration

There are three configurations available in this section: Date, DateTime, and Timezone. Select formats that match the Date, DateTime, and Timezone formats of the csv/tsv file records in the S3 bucket. To check the formats of the source csv/tsv file records, view the Preview dialog in the S3 File Configuration section. Following are the use cases for these configurations:

  • Date Format: Select yyyy-MM-dd if the format of the date values in the source file is 2018-02-18. Following are the date formats supported in the S3 dataset task from a csv/tsv file. 

    Supported formats:
    dd-MMM-yy
    dd-MMM-yyyy
    dd/MM/yy
    dd/MM/yyyy
    EEEE, dd MMMM yyyy
    M-d-yyyy
    M-dd-yy
    M/d/yy
    M/d/yyyy
    M/dd/yy
    M/dd/yyyy
    MM-dd-yyyy
    MM/dd/yyyy
    MMMM d, yyyy
    yyyy-M-d
    yyyy-MM-dd
    yyyy/M/d
    yyyyMMdd
    yyyy/MM/dd
    M-dd-yyyy
    MM-d-yyyy
    MM/d/yyyy
    MM-d-yy
    MM/d/yy
    M-d-yy
     
  • DateTime Format: Select yyyy-MM-dd'T'HH:mm:ss['.'SSS][XXXXX] if the format of the datetime values in the source file is 2017-03-22T12:37:23-08:00. Following are the DateTime formats supported in the S3 dataset task from a csv/tsv file.

    Supported formats:
    dd/MM/yyyy hh:mm:ss['.'SSS][XXXXX] a
    yyyy/M/dd'T'HH:mm:ss['.'SSS][XXXXX]
    yyyy-M-dd'T'HH:mm:ss['.'SSS][XXXXX]
    yyyy/MM/dd'T'HH:mm:ss['.'SSS][XXXXX]
    yyyy-MM-dd'T'HH:mm:ss['.'SSS][XXXXX]
    M-d-yy HH:mm
    M-d-yyyy HH:mm
    M-dd-yy HH:mm
    M-dd-yyyy HH:mm
    M/d/yy HH:mm
    M/d/yyyy HH:mm
    M/dd/yy HH:mm
    M/dd/yyyy HH:mm
    MM-d-yy HH:mm
    MM-d-yyyy HH:mm
    MM-dd-yy HH:mm
    MM-dd-yyyy HH:mm
    MM-dd-yyyy HH:mm:ss
    MM/d/yy HH:mm
    MM/d/yyyy HH:mm
    MM/dd/yy HH:mm
    MM/dd/yyyy HH:mm
    MM/dd/yyyy HH:mm:ss
    yyyy-MM-dd HH-mm-ss
    yyyy-MM-dd HH:mm:ss
    yyyy-MM-dd HH:mm:ss.S
    yyyy-MM-dd HH:mm:ssZ
    yyyy-MM-dd'T'HH:mm:ss.SSS'Z'
    yyyy-MM-dd'T'HH:mm:ss.SSSZ
    yyyy-MM-dd'T'HH:mm:ss.SSS
    yyyy-MM-dd'T'HH:mm:ssz
    yyyy-MM-dd'T'HH:mm:ssZ
    yyyy-MM-dd:HH-mm-ss
  • Time Zone: By default, the UTC time zone is selected in this field. If the datetime values (Ex: 2017-03-22T12:37:23-08:00) in the source file have a time zone offset (Ex: -08:00), select the corresponding time zone, (GMT-08:00) Pacific Standard Time (America/Los_Angeles), in this field. If the datetime values in the source file do not have a time zone, select UTC in this field.

Data loading into the S3 dataset fails if the Date and DateTime formats selected in the Date Configuration section are not the same as those in the source file.
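As an illustration of why the formats must match, here is a Python sketch parsing the example datetime value. The Java-style pattern yyyy-MM-dd'T'HH:mm:ss[XXXXX] corresponds roughly to Python's "%Y-%m-%dT%H:%M:%S%z"; that mapping is an assumption made for this sketch:

```python
from datetime import datetime

# The example value from the Time Zone bullet above, with a -08:00 offset.
value = "2017-03-22T12:37:23-08:00"
parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
# parsed carries the -08:00 offset as its tzinfo.

# A format that does not match the source value raises an error, just as
# the rule execution fails when the configured format differs from the
# file's actual format.
try:
    datetime.strptime(value, "%Y-%m-%d %H:%M:%S")
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

This mirrors the failure mode described above: a mismatched format is rejected rather than silently reinterpreted, so verify the format in the Preview dialog before saving.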


Once you have set up the configurations in the above three sections, click SAVE in the Setup Rule header to save the S3 dataset.


Once the S3 dataset is saved, you cannot make any changes in the configurations. You can only remove the column headers after the dataset is saved. You can do this when the source file in the S3 bucket is updated (column headers are removed).

If you have feedback or questions on this feature, please share them on community.gainsight.com.
