Format: Apache Avro

Apache Avro is a row-oriented remote procedure call and data serialization framework developed within the Apache Hadoop ecosystem. Avro uses JSON to define data types and protocols, and serializes data in a compact binary format.

Apache Avro may be used with any upstream or downstream customer environment that supports Avro. Avro is the most compact file format available for use with Amperity.

Pull Avro files

To pull Avro files to Amperity:

  1. Select a filedrop data source

  2. Define a feed to associate fields in the Avro file with semantic tags; in some situations you may need to use an ingest query to transform data in the Avro file prior to loading it to Amperity

  3. Configure a courier for the location and name of the Avro file, and then for the name of an ingest query

Data sources

Pull Apache Avro files to Amperity using any filedrop data source.


Feeds

A feed defines how data should be loaded into a domain table, including which columns are required and which columns should be associated with a semantic tag that indicates the column contains customer profile (PII) or transactions data.

Apply profile (PII) semantics to customer records. Apply transaction, itemized transaction, and product catalog semantics to interaction records. Use blocking key (bk), foreign key (fk), and separation key (sk) semantic tags to define how Amperity should understand field relationships when those values are present across your data sources.

Ingest queries

An ingest query is a SQL statement that may be applied to data prior to loading it to a domain table. An ingest query is defined using Spark SQL syntax.

Use Spark SQL to define an ingest query for the Avro file. Use a SELECT statement to specify which fields should be pulled to Amperity. Apply transforms to those fields as necessary.
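For example, a minimal ingest query might select a handful of fields and apply light transforms. All table and field names below are illustrative, not part of any actual schema:

```sql
-- Illustrative ingest query; the table name and all field names are hypothetical.
SELECT
  customer_id
  ,TRIM(given_name) AS given_name
  ,TRIM(surname) AS surname
  ,LOWER(email) AS email
  ,TO_DATE(signup_date, 'yyyy-MM-dd') AS signup_date
FROM customer_records
```

Only the fields named in the SELECT statement are pulled to Amperity; any other fields in the Avro file are ignored.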

Couriers

A courier brings data from an external system to Amperity. A courier relies on a feed to know which fileset to bring to Amperity for processing.

A courier must specify the location of the Avro file, and then define how that file is to be pulled to Amperity. This is done using a combination of configuration blocks:

  1. Load settings

  2. Load operations

Load settings

Use courier load settings to specify the path to the Avro file, a file tag (which can be the same as the name of the Avro file), and the "application/avro" content type.

{
  "object/type": "file",
  "object/file-pattern": "'path/to/file'-YYYY-MM-dd'.avro'",
  "object/land-as": {
    "file/tag": "FILE_NAME",
    "file/content-type": "application/avro"
  }
}
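As a concrete sketch, load settings for a daily file named like "customers-2024-05-01.avro" might look as follows. The path and file tag shown here are hypothetical:

```json
{
  "object/type": "file",
  "object/file-pattern": "'upload/customers'-YYYY-MM-dd'.avro'",
  "object/land-as": {
    "file/tag": "customers",
    "file/content-type": "application/avro"
  }
}
```

The date components of the file pattern (YYYY-MM-dd) are evaluated at run time, which is how the courier matches each day's file.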

Load operations

Use courier load operations to associate a feed ID with the courier and to apply the same file tag as the one used for load settings. Load operations for an ingest query may specify a series of options.

Load from feed
{
  "FEED_ID": [
    {
      "type": "OPERATION",
      "file": "FILE_NAME",
    }
  ]
}
Load from ingest query
{
  "FEED_ID": [
    {
      "type": "spark-sql",
      "spark-sql-files": [
        {
          "file": "FILE_NAME"
        }
      ],
      "spark-sql-query": "INGEST_QUERY_NAME"
    }
  ]
}
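For example, assuming a hypothetical feed ID of "df-abc123", a file tag of "customers", and an ingest query named "customers-correct-dates", the ingest query pattern above might be filled in as:

```json
{
  "df-abc123": [
    {
      "type": "spark-sql",
      "spark-sql-files": [
        {
          "file": "customers"
        }
      ],
      "spark-sql-query": "customers-correct-dates"
    }
  ]
}
```

The file tag here must match the file tag defined in the courier's load settings.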

Send Avro files

Important

Amperity does not send Avro files to downstream workflows.