Format: XML

eXtensible Markup Language (XML) is a supported data format for customer data sources.

Note

This topic is about standalone XML files. XML data that is sent to the Streaming Ingest REST API is converted to CBOR format.

Pull XML files

To pull XML files to Amperity:

  1. Select a filedrop data source

  2. Use an ingest query to select fields from the XML file to pull to Amperity

  3. Configure a courier with the location and name of the XML file and the name of the ingest query

  4. Define a feed to associate the fields that were selected from the XML file with semantic tags for customer profiles and interactions, as necessary

Data sources

Pull XML files to Amperity using any filedrop data source.

Ingest queries

An ingest query is a SQL statement that may be applied to data before it is loaded into a domain table. An ingest query is defined using Spark SQL syntax.

Use Spark SQL to define an ingest query for the XML file. Use a SELECT statement to specify which fields should be pulled to Amperity. Apply transforms to those fields as necessary.

Explode interactions data

Note

This example uses a sample XML file of sales transactions as the data source.

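Use the EXPLODE() function in an ingest query to flatten nested sales transaction data into table rows. For example, the following ingest query returns one row for each tender within each sales transaction, along with the tender type and amount: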

WITH explodedData AS (
  SELECT
    salesTransactionId
    -- EXPLODE() returns one row for each tender in a sales transaction
    ,EXPLODE(salesOrder.tenders.tender) AS tender
  FROM PosXml
)

SELECT
  salesTransactionId
  ,tender.type AS type
  ,tender.amount AS amount
FROM
  explodedData
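The following Python sketch models what EXPLODE() does to a single sales transaction, using only the standard library. The sample record, its values, and the explode_tenders helper are hypothetical; the element names mirror the query above. It shows the shape of the result (one row per tender, with salesTransactionId repeated on each row) and is not how Spark or Amperity runs the query.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample record; element names mirror the ingest query above.
SAMPLE = """<salesTransaction>
  <salesTransactionId>T-1001</salesTransactionId>
  <salesOrder>
    <tenders>
      <tender><type>CREDIT</type><amount>40.00</amount></tender>
      <tender><type>GIFT_CARD</type><amount>10.00</amount></tender>
    </tenders>
  </salesOrder>
</salesTransaction>"""


def explode_tenders(transaction: ET.Element) -> list[dict]:
    """Return one flat row per <tender>, repeating the transaction ID on each row."""
    transaction_id = transaction.findtext("salesTransactionId")
    rows = []
    for tender in transaction.findall("salesOrder/tenders/tender"):
        rows.append({
            "salesTransactionId": transaction_id,
            "type": tender.findtext("type"),
            "amount": tender.findtext("amount"),
        })
    return rows


rows = explode_tenders(ET.fromstring(SAMPLE))
for row in rows:
    print(row)
```

Unlike this sketch, which keeps every value as a string, Spark's XML reader may infer numeric types for fields such as amount.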

Couriers

A courier brings data from an external system to Amperity. A courier relies on a feed to know which fileset to bring to Amperity for processing.

A courier must specify the location of the XML file, and then define how that file is to be pulled to Amperity. This is done using a combination of configuration blocks:

  1. Load settings

  2. Load operations

Load settings

Use courier load settings to specify the path to the XML file, a file tag (which can be the same as the name of the XML file), and the "application/xml" content type.

{
  "object/type": "file",
  "object/file-pattern": "'path/to/file'-YYYY-MM-dd'.xml'",
  "object/land-as": {
    "file/tag": "FILE_NAME",
    "file/content-type": "application/xml"
  }
}
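The file pattern mixes single-quoted literal text with date tokens. As a rough illustration, the Python sketch below expands the pattern from the load settings for a single date. The resolve_file_pattern helper and its token table are hypothetical: they handle only YYYY, MM, and dd, and they treat YYYY as the calendar year, which is an assumption (in Java-style date formats, YYYY denotes a week-based year, which can differ near the start or end of a year). Under these assumptions, the pattern resolves to path/to/file-2024-03-07.xml for 7 March 2024.

```python
import datetime
import re

# Hypothetical token table: only the tokens used in the example pattern.
# YYYY is treated as the calendar year (an assumption; Java-style formats
# use YYYY for a week-based year).
TOKENS = {
    "YYYY": lambda d: f"{d.year:04d}",
    "MM": lambda d: f"{d.month:02d}",
    "dd": lambda d: f"{d.day:02d}",
}


def resolve_file_pattern(pattern: str, day: datetime.date) -> str:
    """Expand a file pattern for one date.

    Text inside single quotes is copied literally; unquoted text has its
    date tokens replaced with zero-padded date parts.
    """
    parts = []
    for quoted, unquoted in re.findall(r"'([^']*)'|([^']+)", pattern):
        if not unquoted:
            # Quoted literal (possibly empty): keep it as-is.
            parts.append(quoted)
            continue
        segment = unquoted
        for token, render in TOKENS.items():
            segment = segment.replace(token, render(day))
        parts.append(segment)
    return "".join(parts)


print(resolve_file_pattern("'path/to/file'-YYYY-MM-dd'.xml'", datetime.date(2024, 3, 7)))
# prints path/to/file-2024-03-07.xml
```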

Load operations

Use courier load operations to associate a feed ID with the courier, apply the same file tag that is used in the load settings, set the element within the XML schema that is treated as a row in a table, and specify the name of the ingest query.

{
  "FEED_ID": [
    {
      "type": "spark-sql",
      "spark-sql-files": [
        {
          "file": "FILE_NAME",
          "options": {
            "rowTag": "row"
          }
        }
      ],
      "spark-sql-query": "INGEST_QUERY_NAME"
    }
  ]
}
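Because the file tag must match in both blocks, a quick consistency check can catch typos before the courier runs. The Python sketch below is a hypothetical helper, not an Amperity feature. It compares the placeholder load settings and load operations shown above: it confirms the application/xml content type, that every file in spark-sql-files uses the file tag from the load settings, and that an ingest query is named.

```python
import json

# Placeholder configuration copied from the load settings and load operations above.
LOAD_SETTINGS = json.loads("""
{
  "object/type": "file",
  "object/file-pattern": "'path/to/file'-YYYY-MM-dd'.xml'",
  "object/land-as": {
    "file/tag": "FILE_NAME",
    "file/content-type": "application/xml"
  }
}
""")

LOAD_OPERATIONS = json.loads("""
{
  "FEED_ID": [
    {
      "type": "spark-sql",
      "spark-sql-files": [
        {"file": "FILE_NAME", "options": {"rowTag": "row"}}
      ],
      "spark-sql-query": "INGEST_QUERY_NAME"
    }
  ]
}
""")


def check_courier(settings: dict, operations: dict) -> list[str]:
    """Hypothetical helper: return a list of problems; empty means the blocks agree."""
    problems = []
    land_as = settings.get("object/land-as", {})
    tag = land_as.get("file/tag")
    if land_as.get("file/content-type") != "application/xml":
        problems.append("load settings do not use the application/xml content type")
    for feed_id, steps in operations.items():
        for step in steps:
            # Every file read by the Spark SQL step should use the landed file tag.
            for spark_file in step.get("spark-sql-files", []):
                if spark_file.get("file") != tag:
                    problems.append(
                        f"{feed_id}: file {spark_file.get('file')!r} does not match file tag {tag!r}"
                    )
            if not step.get("spark-sql-query"):
                problems.append(f"{feed_id}: no ingest query is named")
    return problems


print(check_courier(LOAD_SETTINGS, LOAD_OPERATIONS))  # prints []
```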

Tip

Set rowTag to the element in the XML schema that should be treated as a row in a table. For example, if the XML schema contained:

<salesTransactions>
  <salesTransaction> ... </salesTransaction>
</salesTransactions>

then use salesTransaction as the value for rowTag. The default value is row. The following load operation sets rowTag to salesTransaction:

{
  "df-5Jagkabc": [
    {
      "type": "spark-sql",
      "spark-sql-files": [
        {
          "file": "PosData",
          "options": {
            "rowTag": "salesTransaction"
          }
        }
      ],
      "spark-sql-query": "API_Test_Headers"
    }
  ]
}
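The Python sketch below approximates how rowTag splits a file into rows, using the standard library's ElementTree. The rows_for helper and the sample document are hypothetical, and the sketch ignores the schema inference that Spark's XML reader performs on each row. It also shows why the default value matters: in this sketch, leaving rowTag at row finds no rows in a file that contains only salesTransaction elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical document shaped like the salesTransactions example in the tip.
DOCUMENT = """<salesTransactions>
  <salesTransaction><salesTransactionId>T-1</salesTransactionId></salesTransaction>
  <salesTransaction><salesTransactionId>T-2</salesTransactionId></salesTransaction>
</salesTransactions>"""


def rows_for(document: str, row_tag: str = "row") -> list[ET.Element]:
    """Return each element named row_tag as one row; the default mirrors rowTag's default."""
    root = ET.fromstring(document)
    return list(root.iter(row_tag))


for element in rows_for(DOCUMENT, "salesTransaction"):
    print(element.findtext("salesTransactionId"))
# prints T-1, then T-2

print(len(rows_for(DOCUMENT)))  # prints 0: the document has no <row> elements
```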

Feeds

A feed defines how data is loaded into a domain table, including which columns are required and which columns are associated with semantic tags that identify customer profile (PII) and transaction data.

Apply profile (PII) semantics to customer records, and apply transaction, itemized transaction, and product catalog semantics to interaction records. Use blocking key (bk), foreign key (fk), and separation key (sk) semantic tags to define how Amperity should understand field relationships when those values are present across your data sources.

Send XML files

Important

Amperity does not send XML files to downstream workflows.