File format: CBOR

CBOR (Concise Binary Object Representation) is a binary data serialization format loosely based on JSON. Like JSON, it allows the transmission of data objects that contain name–value pairs, but in a more concise manner. This increases processing and transfer speeds at the cost of human readability.
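To make the size difference concrete, the following is a minimal sketch of a CBOR encoder for a small subset of types, following the initial-byte layout in RFC 8949. It is an illustration only and is not the encoder used by Amperity or the Streaming Ingest API; the function names `cbor_encode` and `_head` are hypothetical. Production code would normally use an existing library, such as `cbor2`.

```python
import json
import struct


def _head(major: int, n: int) -> bytes:
    """Encode a CBOR initial byte plus its length/value argument (RFC 8949)."""
    if n < 24:
        return bytes([(major << 5) | n])
    for info, fmt, limit in ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
                             (26, ">I", 1 << 32), (27, ">Q", 1 << 64)):
        if n < limit:
            return bytes([(major << 5) | info]) + struct.pack(fmt, n)
    raise ValueError("integer too large for this sketch")


def cbor_encode(value) -> bytes:
    """Encode a small subset of types: None, bool, int, str, list, dict."""
    if value is None:
        return b"\xf6"
    if isinstance(value, bool):  # must be checked before int
        return b"\xf5" if value else b"\xf4"
    if isinstance(value, int):
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, str):
        raw = value.encode("utf-8")
        return _head(3, len(raw)) + raw
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(cbor_encode(v) for v in value)
    if isinstance(value, dict):
        body = b"".join(cbor_encode(k) + cbor_encode(v) for k, v in value.items())
        return _head(5, len(value)) + body
    raise TypeError(f"unsupported type: {type(value).__name__}")


# Compare the same record in CBOR and in compact JSON.
record = {"a": 1, "b": [2, 3]}
as_cbor = cbor_encode(record)
as_json = json.dumps(record, separators=(",", ":")).encode("utf-8")
print(as_cbor.hex(), len(as_cbor), len(as_json))
```

For this record the CBOR form is 9 bytes and the compact JSON form is 17 bytes, but the CBOR bytes are not readable without a decoder.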

Note

XML data can be sent to the Streaming Ingest API, after which it is converted to CBOR format and may be loaded to Amperity using an ingest query that flattens it into a tabular format.

Pull CBOR files

To pull CBOR files to Amperity:

  1. Select a filedrop data source or identify the location at which the Streaming Ingest API has put the CBOR file.

  2. Use an ingest query to select fields from the CBOR file to pull to Amperity.

  3. Configure a courier with the location and name of the CBOR file, and then with the name of the ingest query.

  4. Define a feed to associate the fields that were selected from the CBOR file with semantic tags for customer profiles and interactions, as necessary.

Data sources

Pull CBOR files to Amperity using any filedrop data source.

Ingest queries

An ingest query is a SQL statement that may be applied to data prior to loading it to a domain table. An ingest query is defined using Spark SQL syntax.

Use Spark SQL to define an ingest query for the CBOR file. Use a SELECT statement to specify which fields should be pulled to Amperity. Apply transforms to those fields as necessary.
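What selecting and transforming nested fields does can be sketched in Python. This is only a conceptual model and is not how Amperity runs an ingest query; the helper `select_fields`, the sample record, and the output column names are hypothetical, with field names borrowed from the placeholder schema used elsewhere in this topic.

```python
def select_fields(record: dict, columns: dict) -> dict:
    """Build one flat row from a nested record.

    `columns` maps an output column name to a dotted path into the record,
    similar in spirit to `SELECT parent.child AS column` in SQL.
    """
    row = {}
    for column, path in columns.items():
        value = record
        for key in path.split("."):
            # A missing key or a non-dict parent yields NULL (None).
            value = value.get(key) if isinstance(value, dict) else None
        row[column] = value
    return row


# Hypothetical decoded record; field names are illustrative only.
record = {
    "field-1": "abc",
    "nested-group-1": {"field-a": "  Hello ", "field-xyz": "z"},
}

# Rough SQL analogy (table name is a placeholder, not a confirmed convention):
#   SELECT `field-1` AS field_1,
#          TRIM(`nested-group-1`.`field-a`) AS field_a
#   FROM FILE_NAME
row = select_fields(record, {
    "field_1": "field-1",
    "field_a": "nested-group-1.field-a",
    "missing": "nested-group-1.nested-group-a.field-q",
})
row["field_a"] = row["field_a"].strip()  # a transform, like TRIM()
print(row)
```

Each selected path becomes one column in a flat row, and a path that does not exist in the record produces a NULL value rather than an error.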

Couriers

A courier brings data from an external system to Amperity.

A courier must specify the location of the CBOR file, and then define how that file is to be pulled to Amperity. This is done using a combination of configuration blocks:

  1. Load settings

  2. Load operations

Load settings

Use courier load settings to specify the path to the CBOR file, a file tag (which can be the same as the name of the CBOR file), and the "application/ingest-pack+cbor" content type.

For Amazon AWS:
{
  "object/type": "file",
  "object/file-pattern": "'ingest/stream/TENANT/STREAM_ID/'yyyy-MM-dd'/'*'.cbor'",
  "object/land-as": {
     "file/header-rows": 1,
     "file/tag": "FILE_NAME",
     "file/content-type": "application/ingest-pack+cbor"
  }
},
For Microsoft Azure:
{
  "object/type": "file",
  "object/file-pattern": "'STREAM_ID/'yyyy-MM-dd'/'*'.cbor'",
  "object/land-as": {
     "file/header-rows": 1,
     "file/tag": "FILE_NAME",
     "file/content-type": "application/ingest-pack+cbor"
  }
},
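The file patterns above mix single-quoted text with a date pattern and a wildcard. The sketch below shows one way to read such a pattern, under the assumption that quoted segments are literal text and that unquoted `yyyy`, `MM`, and `dd` are date tokens, as in Java-style date format patterns. This topic does not state those rules, so treat them as assumptions; the helper `expand_pattern` and the file name `part-0001.cbor` are hypothetical.

```python
import fnmatch
from datetime import date


def expand_pattern(pattern: str, day: date) -> str:
    """Expand quoted literals and yyyy/MM/dd tokens into a glob for one day."""
    out = []
    for i, part in enumerate(pattern.split("'")):
        if i % 2 == 1:
            # Odd-numbered segments sit inside quotes: literal text.
            out.append(part)
        else:
            # Even-numbered segments are unquoted: date tokens or the * wildcard.
            out.append(part.replace("yyyy", f"{day:%Y}")
                           .replace("MM", f"{day:%m}")
                           .replace("dd", f"{day:%d}"))
    return "".join(out)


pattern = "'ingest/stream/TENANT/STREAM_ID/'yyyy-MM-dd'/'*'.cbor'"
glob = expand_pattern(pattern, date(2024, 1, 5))
print(glob)

# Check a hypothetical object key against the expanded glob.
key = "ingest/stream/TENANT/STREAM_ID/2024-01-05/part-0001.cbor"
print(fnmatch.fnmatchcase(key, glob))
```

Under these assumptions, the pattern selects every file ending in `.cbor` inside the folder named for the given day.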

Load operations

Use courier load operations to associate a feed ID with the courier, apply the same file tag that was used for load settings, identify the element within the CBOR schema that is treated as a row in a table, and specify the name of the ingest query.

{
  "FEED_ID": [
    {
      "type": "spark-sql",
      "spark-sql-files": [
        {
          "file": "FILE_NAME",
          "options": {
            "rowTag": "row"
          },
          "schema": {
            "fields": [
              {
                "metadata": {},
                "name": "field-1",
                "type": "string",
                "nullable": true
              },
              ...
              {
                "metadata": {},
                "name": "nested-group-1",
                "type": {
                  "fields": [
                    {
                      "metadata": {},
                      "name": "field-a",
                      "type": "string",
                      "nullable": true
                    },
                    {
                      "metadata": {},
                      "name": "nested-group-a",
                      "type": {
                        "fields": [
                          ...
                        ],
                        "type": "struct"
                      },
                      "nullable": true
                    },
                    {
                      "metadata": {},
                      "name": "field-xyz",
                      "type": "string",
                      "nullable": true
                    }
                  ],
                  "type": "struct"
                },
                "nullable": true
              },
              ...
            ],
            "type": "struct"
          }
        }
      ],
      "spark-sql-query": "INGEST_QUERY_NAME"
    }
  ]
}

Important

The "schema" must match the structure of the incoming file, including all nested groupings and data types. Set "nullable" to true to allow a field to contain NULL values. A CBOR file can have hundreds of fields. The ellipses (...) in this example represent locations within the structure where additional fields may be present.
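One way to catch a mismatch before configuring a courier is to compare a decoded sample record against the schema. The sketch below is a hypothetical helper, not an Amperity tool. It assumes the record has already been decoded into Python dictionaries, and it handles only the "string" and "struct" types that appear in the example above.

```python
def schema_problems(record: dict, schema: dict, path: str = "") -> list:
    """Return a list of mismatches between a decoded record and a struct schema."""
    problems = []
    for field in schema["fields"]:
        name, ftype = field["name"], field["type"]
        where = f"{path}{name}"
        value = record.get(name)
        if value is None:
            # Missing or NULL values are allowed only for nullable fields.
            if not field.get("nullable", True):
                problems.append(f"{where}: NULL in non-nullable field")
        elif isinstance(ftype, dict) and ftype.get("type") == "struct":
            if isinstance(value, dict):
                problems += schema_problems(value, ftype, where + ".")
            else:
                problems.append(f"{where}: expected a nested group")
        elif ftype == "string" and not isinstance(value, str):
            problems.append(f"{where}: expected string")
    # Fields present in the record but absent from the schema.
    known = {f["name"] for f in schema["fields"]}
    problems += [f"{path}{k}: not in schema" for k in record if k not in known]
    return problems


# Hypothetical schema and records, using the placeholder names from the example.
schema = {
    "type": "struct",
    "fields": [
        {"metadata": {}, "name": "field-1", "type": "string", "nullable": True},
        {"metadata": {}, "name": "nested-group-1", "nullable": True,
         "type": {"type": "struct", "fields": [
             {"metadata": {}, "name": "field-a", "type": "string", "nullable": False},
         ]}},
    ],
}
good = {"field-1": "x", "nested-group-1": {"field-a": "y"}}
bad = {"field-1": 7, "nested-group-1": {"field-a": None, "extra": "q"}}
print(schema_problems(good, schema))
print(schema_problems(bad, schema))
```

An empty list means the sample record matched the schema for the types this sketch checks; it does not prove that every record in the file will match.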

Tip

Set rowTag to the element in the CBOR file that should be treated as a row in a table. The default value is row.

Feeds

A feed defines how data should be loaded into a domain table, including which columns are required and which columns should be associated with a semantic tag that indicates the column contains customer profile (PII) or transactions data.

Apply profile (PII) semantics to customer records, and apply transaction and product catalog semantics to interaction records. Use blocking key (bk), foreign key (fk), and separation key (sk) semantic tags to define how Amperity should understand field relationships when those values are present across your data sources.

Send CBOR files

Important

Amperity does not send CBOR files to downstream workflows.