File format: DAT

A DAT file contains data, often binary and specific to the program that created it, that is neither human readable nor organized in a tabular (columns and rows) format.

Note

You can use an ingest query to parse individual fields out of a DAT file, and then configure a courier to load that query. Rename the fields in the ingest query, in a feed, or by using domain SQL.

Pull DAT files

To pull DAT files to Amperity:

  1. Select a filedrop data source.

  2. Use an ingest query to parse the fields in the DAT file.

  3. Configure a courier for the location and name of the DAT file, and then for the name of an ingest query.

  4. Define a feed to associate the fields that were selected from the DAT file with semantic tags for customer profiles and interactions, or use domain SQL to transform this data into a shape that is usable within Amperity, as shown in the sketch that follows this list.
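
The following is a minimal sketch of domain SQL that renames fields parsed from a DAT file, as mentioned in step 4. The table name (dat_domain_table) and all source and output column names are hypothetical; replace them with the names used in your tenant.

-- Minimal sketch: rename parsed fields to meaningful column names.
-- "dat_domain_table", Field1-Field3, and the output names are placeholders.
SELECT
  Field1 AS given_name
  ,Field2 AS surname
  ,Field3 AS email
FROM dat_domain_table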

Data sources

Pull DAT files to Amperity using any filedrop data source.

Ingest queries

An ingest query is a SQL statement that may be applied to data prior to loading it to a domain table. An ingest query is defined using Spark SQL syntax.

Use Spark SQL to define an ingest query that parses the individual fields in the DAT file. Use the SUBSTR() function to identify the position and length of each field within the DAT file, and use other functions, such as TRIM() and NULLIF(), as necessary to ensure the data parsed from the DAT file is as clean as possible. Use a SELECT statement to specify which fields should be pulled to Amperity, and apply transforms to those fields as necessary.

The following example shows an ingest query that parses fields from a DAT file. Each field (fields 1-6) has a starting position within the DAT file (1, 21, 52, 63, 69, 70) and a length (20, 30, 10, 15, 1, 140). Use the single ordinal column ( _c0 ) to refer to each row of source data within the DAT file.

SELECT
  NULLIF(TRIM(SUBSTR(`_c0`,1,20)),'') AS Field1
  ,NULLIF(TRIM(SUBSTR(`_c0`,21,30)),'') AS Field2
  ,NULLIF(TRIM(SUBSTR(`_c0`,52,10)),'') AS Field3
  ,NULLIF(TRIM(SUBSTR(`_c0`,63,15)),'') AS Field4
  ,NULLIF(TRIM(SUBSTR(`_c0`,69,1)),'') AS Field5
  ,NULLIF(TRIM(SUBSTR(`_c0`,70,140)),'') AS Field6
FROM DAT_FILE_NAME
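
Transforms may be applied to parsed fields within the same ingest query. The following sketch is illustrative only; it assumes that the field starting at position 63 holds a yyyyMMdd date and that the one-character field starting at position 69 holds an integer flag, which may not match your file layout.

SELECT
  NULLIF(TRIM(SUBSTR(`_c0`,1,20)),'') AS Field1
  -- Hypothetical transforms: parse a yyyyMMdd date and cast a one-character flag to an integer.
  ,TO_DATE(TRIM(SUBSTR(`_c0`,63,15)),'yyyyMMdd') AS Field4
  ,CAST(TRIM(SUBSTR(`_c0`,69,1)) AS INT) AS Field5
FROM DAT_FILE_NAME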

Important

Verify the fields that are parsed from the DAT file to ensure they contain correct values and have column names that represent the values they contain. In addition, verify that the feed can apply semantic tags to customer records and/or interaction records, and that the data is represented correctly downstream within Amperity: by Stitch, within the customer 360 database, and in queries and segments.

Couriers

A courier brings data from an external system to Amperity.

A courier must specify the location of the DAT file, and then define how that file is to be pulled to Amperity. This is done using a combination of configuration blocks:

  1. Load settings

  2. Load operations

Load settings

Use courier load settings to specify the path to the DAT file and a file tag (which can be the same as the name of the DAT file). Use the "text/csv" content type and set "file/header-rows" to 0.

{
  "object/type": "file",
  "object/file-pattern": "'path/to/file'-YYYY-MM-dd'.dat'",
  "object/land-as": {
     "file/header-rows": 0,
     "file/tag": "FILE_NAME",
     "file/content-type": "text/csv"
  }
},

Load operations

Use courier load operations to associate a feed ID with the courier, apply the same file tag that was used for load settings, define the column used by the ingest query to represent the DAT schema, and specify the name of the ingest query.

{
  "FEED_ID": [
    {
      "type": "spark-sql",
      "spark-sql-files": [
        {
          "file": "FILE_NAME",
          "schema": {
            "fields": [
              {
                "metadata": {},
                "name": "_c0",
                "type": "string",
                "nullable": true
              }
            ],
            "type": "struct"
          }
        }
      ],
      "spark-sql-query": "INGEST_QUERY_NAME"
    }
  ]
}

Important

Use a single ordinal column to define the "schema" of a DAT file.

Feeds

A feed defines how data should be loaded into a domain table, including specifying which columns are required and which columns should be associated with a semantic tag that indicates the column contains customer profile (PII) or transaction data.

Apply profile (PII) semantics to customer records, and transaction and product catalog semantics to interaction records. Use blocking key (bk), foreign key (fk), and separation key (sk) semantic tags to define how Amperity should understand field relationships when those values are present across your data sources.

Send DAT files

Important

Amperity does not send DAT files to downstream workflows.