File format: JSON

JavaScript Object Notation (JSON) is a language-independent data format that is derived from (and structured similarly to) JavaScript.

Note

This topic is about standalone JSON files. JSON data that is sent to the Streaming Ingest API is converted to NDJSON format.

Pull JSON files

To pull JSON files to Amperity:

  1. Select a filedrop data source.

  2. Use an ingest query to select fields from the JSON file to pull to Amperity.

  3. Configure a courier with the location and name of the JSON file, and then with the name of the ingest query.

  4. Define a feed to associate the fields that were selected from the JSON file with semantic tags for customer profiles and interactions, as necessary.

Data sources

Pull JSON files to Amperity using any filedrop data source.

Recommendations

When using JSON files, it is recommended to:

  • Do use simple nested data structures; do not use nested array data structures

    DO

    {
      "employee":{ "name":"John", "age":30, "city":"New York" }
    }
    

    DO NOT

    {
      "employees":[ "John", "Anna", "Peter" ]
    }
    
  • Quote string data

  • Quote date values and use the supported date format

  • Ensure numeric data is not quoted

  • Encode files in UTF-8 or UTF-16. Amperity automatically detects the 2-byte header present with the UTF-16 encoding format. If the 2-byte header is missing, the file is treated as UTF-8.

    Caution

    JSON files that are used as source data with Amperity must follow RFC 8259, which requires using (at a minimum) the UTF-8 encoding format.

  • Compress files prior to encryption using ZIP, GZIP, and/or TAR

  • Encrypt files using PGP; compression will not reduce the size of an encrypted file
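Some of the recommendations above can be checked before a file is dropped. The following is a minimal sketch, not an Amperity tool: decode_json_bytes and find_issues are hypothetical helper names, the encoding check only mirrors the behavior described above (a 2-byte UTF-16 header, otherwise UTF-8), and the quoted-number check is a heuristic that will also flag identifiers that are legitimately stored as strings.

```python
import codecs
import json
import re

# Matches strings that are written exactly like a JSON number.
JSON_NUMBER = re.compile(r"-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?")


def decode_json_bytes(raw: bytes) -> str:
    """Decode as UTF-16 when the 2-byte header (BOM) is present, else as UTF-8."""
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and strips it.
        return raw.decode("utf-16")
    return raw.decode("utf-8")


def find_issues(value, path="$"):
    """Return (path, message) pairs for arrays and for numbers stored as strings."""
    issues = []
    if isinstance(value, list):
        issues.append((path, "array found; prefer simple nested objects"))
        for index, item in enumerate(value):
            issues.extend(find_issues(item, f"{path}[{index}]"))
    elif isinstance(value, dict):
        for key, item in value.items():
            issues.extend(find_issues(item, f"{path}.{key}"))
    elif isinstance(value, str) and JSON_NUMBER.fullmatch(value):
        issues.append((path, "numeric value is quoted"))
    return issues


# Example: a UTF-16 payload that contains an array and a quoted number.
raw = codecs.BOM_UTF16_LE + '{"employees": ["John", "Anna"], "age": "30"}'.encode("utf-16-le")
record = json.loads(decode_json_bytes(raw))
for issue_path, message in find_issues(record):
    print(issue_path, "-", message)
```

The checks walk the parsed record recursively, so an array nested inside an object is reported with its full path.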

Ingest queries

An ingest query is a SQL statement that may be applied to data prior to loading it to a domain table. An ingest query is defined using Spark SQL syntax.

Use Spark SQL to define an ingest query for the JSON file. Use a SELECT statement to specify which fields should be pulled to Amperity. Apply transforms to those fields as necessary.

Couriers

A courier brings data from an external system to Amperity.

A courier must specify the location of the JSON file, and then define how that file is to be pulled to Amperity. This is done using a combination of configuration blocks:

  1. Load settings

  2. Load operations

Load settings

Use courier load settings to specify the path to the JSON file, a file tag (which can be the same as the name of the JSON file), and the "application/json" content type.

{
  "object/type": "file",
  "object/file-pattern": "'path/to/file'-YYYY-MM-dd'.json'",
  "object/land-as": {
     "file/tag": "FILE_NAME",
     "file/content-type": "application/json"
  }
}

Load operations

Use courier load operations to associate a feed ID with the courier, apply the same file tag as the one used for load settings, and specify the name of the ingest query.

{
  "FEED_ID": [
    {
      "type": "spark-sql",
      "spark-sql-files": [
        {
          "file": "FILE_NAME"
        }
      ],
      "spark-sql-query": "INGEST_QUERY_NAME"
    }
  ]
}
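Because the file tag in load operations must match the one in load settings, one way to avoid a mismatch is to generate both blocks from a single value. The following is a minimal sketch under that assumption: build_courier_config is a hypothetical helper, not part of Amperity, and the values passed to it are the placeholders used above.

```python
import json


def build_courier_config(file_pattern, file_tag, feed_id, ingest_query):
    """Return (load_settings, load_operations) that share one file tag."""
    load_settings = {
        "object/type": "file",
        "object/file-pattern": file_pattern,
        "object/land-as": {
            "file/tag": file_tag,
            "file/content-type": "application/json",
        },
    }
    load_operations = {
        feed_id: [
            {
                "type": "spark-sql",
                # Same tag as load settings, taken from the same argument.
                "spark-sql-files": [{"file": file_tag}],
                "spark-sql-query": ingest_query,
            }
        ]
    }
    return load_settings, load_operations


settings, operations = build_courier_config(
    file_pattern="'path/to/file'-YYYY-MM-dd'.json'",
    file_tag="FILE_NAME",
    feed_id="FEED_ID",
    ingest_query="INGEST_QUERY_NAME",
)
print(json.dumps(settings, indent=2))
print(json.dumps(operations, indent=2))
```

The printed JSON has the same keys as the two blocks shown above and can be compared against them before it is pasted into the courier configuration.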

Feeds

A feed defines how data should be loaded into a domain table, including which columns are required and which columns should be associated with a semantic tag that indicates the column contains customer profile (PII) or transactions data.

Apply profile (PII) semantics to customer records, and transaction and product catalog semantics to interaction records. Use blocking key (bk), foreign key (fk), and separation key (sk) semantic tags to define how Amperity should interpret field relationships when those values are present across your data sources.

Important

A feed uses the first record in a JSON file to determine its schema in the Feed Editor. If optional fields appear only in records other than the first record, you must add those fields to the feed definition manually.
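To find out which fields would need to be added manually, the records can be compared against the first one before the feed is defined. The following is a minimal sketch: fields_missing_from_first is a hypothetical helper, it takes records that are already parsed into dictionaries (so it makes no assumption about how records are laid out in the file), and it compares top-level field names only. The sample records are illustrative.

```python
def fields_missing_from_first(records):
    """Return field names seen in later records but absent from the first one."""
    if not records:
        return []
    first_fields = set(records[0])
    missing = []
    for record in records[1:]:
        for key in record:
            # Keep first-seen order and report each field once.
            if key not in first_fields and key not in missing:
                missing.append(key)
    return missing


records = [
    {"name": "John", "city": "New York"},
    {"name": "Anna", "city": "Seattle", "loyalty_id": "A-100"},
]
print(fields_missing_from_first(records))  # ['loyalty_id']
```

Any field name this returns is one that the Feed Editor would not infer from the first record.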

Send JSON files

Amperity can send JSON files to downstream workflows using any filedrop destination.