
S3 Dataset Task in Bionic Rules

This tutorial provides guidelines on how Admins can create a dataset by fetching data from a csv/tsv file in the Gainsight-managed S3 bucket. These csv/tsv files can be uploaded into the S3 bucket manually or exported into it from another Bionic Rule task. This task allows Admins to create fields in the new dataset from the csv/tsv columns. Once the dataset is created, Admins can apply other tasks and set up actions on it.

This task can fetch data from a csv/tsv file that was exported to the Gainsight-managed S3 bucket from another Bionic Rule task. For more information, refer to Export to S3 from Bionic Rules.

Following are the use cases where you can use this Bionic Rule task:

  • You can fetch data from a historical csv/tsv file in the Gainsight-managed S3 bucket if you know the exact file name, using the Equals option in the task configuration (Ex: CompanyDetails.csv or CompanyDetails-2018-02-20.csv). If you only know the format of the file name (where the generated date is part of the file name), you can select a file that was generated on a specific date using the Date Pattern option in the task configuration.
  • While exporting output csv/tsv files into the S3 bucket from another Bionic Rule task, Admins may have configured the export to split a large file (Ex: larger than 10 GB) into a set of smaller files. In this use case, use the Starts with option in the task configuration to fetch data from all of the smaller files in the S3 bucket and load it into the S3 Dataset.

For more details on the task configurations, refer to the section S3 File Configuration.

Prerequisite

Before an S3 dataset is created, make sure that the source csv/tsv file is available in the Gainsight Managed S3 bucket.

Creating an S3 Dataset Task

To create an S3 Dataset Task:

  1. Navigate to Administration > Rules Engine. Click + RULE. Create Rule page appears.
  2. Provide the following in the Create Rule page:
    1. Select Bionic as Rule Type.
    2. Select Account as Rule For.
    3. Enter Rule Name.
    4. Enter Description [Optional].
  3. Click NEXT. Setup Rule page appears.
  4. Click +TASK > S3 Dataset. The S3 Dataset configuration page appears. This page has three sections: S3 File Configuration, Columns, and Date Configuration.

Setup Rule.gif

  5. Enter the Task Name, Task Description, and Output Dataset Name in the Setup Rule header.

Note: Output Dataset Name auto-populates from the Task Name, but it can be changed to a different name; it has no dependency on the Task Name.

In this use case, the following details are used:

  • Task Name: Extract from S3 [Maximum 80 characters and should be Alphanumeric; _ and space are supported]
  • Task Description: Extract data from a file in the S3 bucket [Maximum 200 characters]
  • Output Dataset Name: Extract from S3 [Maximum 60 characters and should be Alphanumeric; _ and space are supported]. This gets pre-populated with the task name by default.
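For illustration only, the naming constraints above can be expressed as a simple check. This is a sketch based on the stated limits, not Gainsight's actual validation logic:

    import re

    # Allowed characters per the limits above: letters, digits, underscore, space.
    NAME_PATTERN = re.compile(r"^[A-Za-z0-9_ ]+$")

    def is_valid_name(name, max_length):
        """Return True if the name satisfies the stated constraints (a sketch)."""
        return len(name) <= max_length and bool(NAME_PATTERN.match(name))

    print(is_valid_name("Extract from S3", 80))    # True
    print(is_valid_name("Extract: from S3", 80))   # False (colon is not listed as supported)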

S3_Dataset.png

The S3 Dataset page has three sections: S3 File Configuration, Columns, and Date Configuration, as shown below. Admins configure this page to set up an S3 extract job. Depending on this configuration, columns in the csv/tsv file are extracted to fields in the S3 dataset.

S3 File Configuration

Perform the following configurations in this section:

  1. Select Gainsight Managed as Gainsight Bucket.
  2. In the File Path field, you have the following options:
    1. Equals: If you know the exact file name in the S3 bucket with a csv/tsv extension (Ex: CompanyDetails.csv or CompanyDetails-2018-02-20.csv), use the Equals option to select that specific file for data loading into the S3 dataset. You can use this option to load a historical file from the S3 bucket. If your csv/tsv file has .gzip or .bzip as its extension, select the specific Compression Type in the file properties. Select the other file properties as required for smooth data loading. Gainsight supports the following file properties:
  • Field Separator: , (Comma) as separator for the csv files and (Tab) for the tsv files
  • Text Qualifier: “ (Double Quote) or ‘ (Single Quote)
  • Escape Character: \ (Backslash), ‘ (Single Quote), or “ (Double Quote)
  • Compression Type: gzip or bzip
    2. Starts with: You can use this option only when the Use Date Pattern checkbox is selected; it works in combination with a Date Pattern in the file name. If you enter a partial file name with a date pattern, the matching file is selected for creating the S3 dataset. Select file properties such as Field Separator and Compression Type so that the correct file is picked from the S3 bucket. Use this option when you want to ingest data from multiple files with similar file names into the S3 Dataset.

    While exporting output csv/tsv files into the S3 bucket from another Bionic Rule task, if the file is large (Ex: larger than 10 GB), Admins can configure the export to divide it into a set of smaller files with similar file names (Ex: file 1 of 1.5 GB, file 2 of 2 GB, etc.). These smaller files have the same columns, and the records are divided across them. When you create an S3 Dataset using the Starts with option, records from all of the smaller files are fetched into the S3 dataset.
    3. Use Date Pattern: Enable this checkbox to use a Date Pattern in the file path. When you select it, ${pattern} appears in the file name and can be placed anywhere in the file path. Enter the number of days to subtract from the rule date in the ${pattern} formula. A date pattern in the file name can be applied with both the Equals and Starts with options in the File Path field.

Note: The position of the Date Pattern in the file name should correspond to the names of the files exported to the S3 bucket.

  3. You can use the following combinations of the File Path and Date Pattern options for the mentioned use cases:

    1. Equals: You can use this option if you know the exact file name (with/without file generated date) in the S3 bucket with csv/tsv extension (Ex: CompanyDetails.csv or CompanyDetails-2018-02-20.csv).
    2. Equals + Date Pattern: You can use this option if you know the file name without the file generated date. You can add a Date Pattern to select a specific file for data ingestion. For example, if the Rule execution date is 2018-02-24 and you want to ingest data from a file exported to the S3 bucket on 2018-02-20, you can subtract 4 days from the Rule Date. The file path will be CompanyDetails-${pattern}.csv, where ${pattern} = Subtract 4 days from Rule Date with format yyyy-MM-dd.
    3. Starts with + Date Pattern: You can use this option if you know the file name only partially, without the file generated date. You can add a Date Pattern to select a file or a set of files that match the file path for data ingestion. For example, if the Rule execution date is 2018-02-24 and you want to ingest data from one or more files exported to the S3 bucket on 2018-02-20, you can subtract 4 days from the Rule Date. The file path will be CompanyDetails-${pattern}, where ${pattern} = Subtract 4 days from Rule Date with format yyyy-MM-dd. Both resolutions are illustrated in the sketch below.
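To make the date arithmetic concrete, here is a minimal Python sketch of how the two date-pattern examples above resolve. The bucket name is hypothetical, and the boto3 listing is an assumption used to illustrate prefix matching; it mirrors, rather than reproduces, Gainsight's internal mechanism:

    from datetime import date, timedelta

    import boto3  # AWS SDK, assumed here only to illustrate prefix matching

    rule_date = date(2018, 2, 24)  # rule execution date
    # ${pattern} = Subtract 4 days from Rule Date with format yyyy-MM-dd
    pattern = (rule_date - timedelta(days=4)).strftime("%Y-%m-%d")

    # Equals + Date Pattern resolves to one exact file name.
    exact_key = "CompanyDetails-{}.csv".format(pattern)
    print(exact_key)  # CompanyDetails-2018-02-20.csv

    # Starts with + Date Pattern fetches every file whose name begins with
    # the resolved prefix, e.g. a set of smaller split files.
    prefix = "CompanyDetails-{}".format(pattern)
    s3 = boto3.client("s3")
    response = s3.list_objects_v2(Bucket="example-gainsight-bucket", Prefix=prefix)  # hypothetical bucket
    for obj in response.get("Contents", []):
        print(obj["Key"])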

S3 File Configuration.gif

  4. When you select the Equals option, make sure to add the extensions .csv/.tsv and .bzip/.gz for the file format (Comma separated/Tab separated) and compression type, respectively.
  5. When you select the Equals or Starts with option, make sure to select the following File Properties as required:
  • Field Separator: Comma / Tab for the .csv/.tsv files respectively
  • Compression Type: bzip / gzip for the file compression types respectively. Select None if compression is not applied to the source file in the S3 bucket.

If you do not apply the above configurations correctly, the Rule execution fails.

Notes:

  • Field Separator: Use , (Comma) as the separator for csv files and (Tab) for tsv files.
  • Text Qualifier: Used to import a value (along with special characters) enclosed in the qualifier. It is recommended to use the same Text Qualifier in the S3 file configuration that is used in the uploaded csv/tsv file. By default, Double Quote is selected in the S3 file configuration, but users can change it to Single Quote as required.
  • Escape Character: Used to include a special character in a value; it is placed before the special character. It is recommended to use Backslash in the csv/tsv file to avoid any data discrepancy in the S3 Dataset.
  • Compression Type: Use .bzip or .gzip as required.
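As an illustration of how these properties behave together, the sketch below reads a gzip-compressed csv with a comma separator, double-quote text qualifier, and backslash escape character using Python's standard csv and gzip modules. The file name is hypothetical, and this mirrors rather than reproduces the task's parser:

    import csv
    import gzip

    # File Properties mirrored here:
    #   Field Separator = Comma, Text Qualifier = Double Quote,
    #   Escape Character = Backslash, Compression Type = gzip.
    with gzip.open("CompanyDetails-2018-02-20.csv.gz", "rt", newline="") as f:  # hypothetical file
        reader = csv.reader(f, delimiter=",", quotechar='"', escapechar="\\")
        header = next(reader)  # column headers become fields in the S3 dataset
        for row in reader:
            print(dict(zip(header, row)))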
  6. When all the configurations in this section are completed, click Preview. This fetches the first 10 records from the source file and displays them in the Preview dialog:

Preview Results.gif

  7. Click Load Column Details. The Columns section appears, as shown in the section below.

Columns

When you click Load Column Details in the S3 File Configuration section, the following columns appear:

  • Column Header: Column headers that are fetched from the csv/tsv file in the S3 bucket.
  • Data Type: You can assign a data type to the field that is going to be created in the S3 Dataset. Gainsight supports creating fields of five data types:
    • String
    • Number
    • Boolean
    • Date
    • DateTime

Note: By default, new fields that are going to be created in the S3 dataset are assigned the String data type.

  • Output Field Label: You can assign a label to the field in the S3 dataset into which values from the csv/tsv file are fetched. For Number values, you can also set the number of decimal places.

If there are many fields for ingestion, you can search for and select the required fields from the Search box. You can select or remove source file column headers using the checkbox next to the Column Header name. This is useful when the source file in the S3 bucket is updated and column headers are removed.
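For intuition, the sketch below approximates how a CSV string value could be coerced into each of the five supported data types. It is illustrative only; the function and format choices are assumptions, and the Boolean branch follows the behavior described under Limitations below:

    from datetime import datetime

    def coerce(value, data_type):
        """Approximate coercion of a CSV string into the five supported types."""
        if data_type == "String":
            return value
        if data_type == "Number":
            return float(value)
        if data_type == "Boolean":
            # Per the limitation below: any value other than True loads as False.
            return value == "True"
        if data_type == "Date":
            return datetime.strptime(value, "%Y-%m-%d").date()  # e.g. 2018-02-18
        if data_type == "DateTime":
            return datetime.fromisoformat(value)  # e.g. 2017-03-22T12:37:23-08:00
        raise ValueError("Unsupported data type: " + data_type)

    print(coerce("42.5", "Number"))       # 42.5
    print(coerce("yes", "Boolean"))       # False ("yes" is not True)
    print(coerce("2018-02-18", "Date"))   # 2018-02-18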

Columns.gif

Limitations

Some limitations apply in the Columns section as listed below:

  • The following data type mappings between CSV column headers and S3 dataset fields fail:
    • String to Number
    • DateTime to Date
    • Date to DateTime
    • String to Boolean: If the value in the CSV column is not True, it is loaded as False in the S3 Dataset field.

  • Currently, fetching files from a custom S3 bucket is not supported.
  • Currently, encrypted files are not supported for fetching data into the S3 Dataset.

Date Configuration

There are three configurations available in this section: Date, DateTime, and Time Zone. Select formats that match the Date, DateTime, and Time Zone formats of the csv/tsv file records in the S3 bucket. To check the formats of the source csv/tsv records, view the Preview dialog in the S3 File Configuration section. Following are the use cases for these configurations:

  • Date Format: Select yyyy-MM-dd if the format of the date values in the source file is 2018-02-18.
  • DateTime Format: Select yyyy-MM-dd’T’HH:mm:ss[‘.’SSS][XXXXX] if the format of the datetime values in the source file is 2017-03-22T12:37:23-08:00.
  • Time Zone: By default, the UTC time zone is selected in this field. If the datetime values in the source file (Ex: 2017-03-22T12:37:23-08:00) carry a time zone (Ex: -08:00), select the same time zone, (GMT-08:00) Pacific Standard Time (America/Los_Angeles), in this field. If the datetime values in the source file do not have a time zone, select UTC.

Data loading into the S3 dataset fails if the Date and DateTime formats selected in the Date Configuration section are not the same as in the source file.
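For reference, the sketch below parses the two example formats from above with Python's datetime module. Gainsight's pattern strings are Java-style, so the %-style mapping shown here is an approximation for illustration:

    from datetime import datetime, timezone

    # Date Format yyyy-MM-dd maps to "%Y-%m-%d".
    d = datetime.strptime("2018-02-18", "%Y-%m-%d").date()
    print(d)  # 2018-02-18

    # DateTime Format yyyy-MM-dd'T'HH:mm:ss['.'SSS][XXXXX] covers values like
    # 2017-03-22T12:37:23-08:00; fromisoformat handles the optional fraction
    # and UTC offset in one call.
    dt = datetime.fromisoformat("2017-03-22T12:37:23-08:00")
    print(dt.utcoffset())  # -1 day, 16:00:00 (i.e. the -08:00 offset)

    # If the source values carry no offset, interpret them in the configured
    # time zone (UTC by default).
    naive = datetime.strptime("2017-03-22 12:37:23", "%Y-%m-%d %H:%M:%S")
    print(naive.replace(tzinfo=timezone.utc))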

Date Configurations.gif

Once you complete the configurations in the above three sections, click SAVE in the Setup Rule header to save the S3 dataset.

Save button.png

Once the S3 dataset is saved, you cannot make any changes to its configurations; you can only remove column headers. You can do this when the source file in the S3 bucket is updated and column headers are removed.

If you have feedback or questions on this feature, please share them on community.gainsight.com.
