About file formats

This topic covers details that are common to files that can be pulled to or sent from Amperity using Amazon S3, Azure Blob Storage, Google Cloud Storage, or any SFTP site.

Pull files to Amperity

The following sections apply to all files that are pulled to Amperity:

Connect to source

Amperity requires the ability to connect to, and then read data from, a filedrop location. The credentials that allow that connection and the permissions to read data are entered into the Amperity user interface while configuring a courier. These credentials are created and managed by the owner of the filedrop location, which is often external to Amperity (but is sometimes a system that is owned by Amperity). The customer may need to provide credentials to Amperity using SnapPass to complete the configuration.

SnapPass allows secrets to be shared in a secure, ephemeral way. Input a single or multi-line secret, along with an expiration time, and then generate a one-time use URL that may be shared with anyone. Amperity uses SnapPass to share system credentials with customers.

Pull file formats

Amperity supports the following file formats when using a filedrop location to pull data to Amperity:

Note

Amperity can ingest data from a wide variety of data sources, such as legacy data outputs like AS/400. Ask your Amperity Support representative about formats that are not directly listed in this section to determine if those data formats can be used as a way to provide data to Amperity.

Date formats

Dates should be quoted and should be in the “yyyy-MM-dd HH:mm:ss.SSS” format. The time portion (“HH:mm:ss.SSS”) is optional, as are the milliseconds within it. For example:

  • 2019-01-28 18:32:05.123

  • 2019-01-28 18:32:05

  • 2019-01-28
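As a rough illustration of the expected format, the following Python sketch renders a datetime with millisecond precision and writes it to CSV with every field quoted. The helper names and column names are hypothetical and are not part of Amperity.

```python
import csv
import io
from datetime import datetime


def format_timestamp(value: datetime) -> str:
    """Render a datetime as yyyy-MM-dd HH:mm:ss.SSS (milliseconds truncated, not rounded)."""
    return value.strftime("%Y-%m-%d %H:%M:%S") + f".{value.microsecond // 1000:03d}"


def rows_to_csv(rows) -> str:
    """Write (order_id, datetime) pairs as CSV with every field quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(["order_id", "order_datetime"])  # hypothetical column names
    for order_id, ordered_at in rows:
        writer.writerow([order_id, format_timestamp(ordered_at)])
    return buffer.getvalue()


print(rows_to_csv([("A-1", datetime(2019, 1, 28, 18, 32, 5, 123000))]))
```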

When a value does not match the expected date format, Amperity will attempt to convert the date and time values. If date formats are mixed, Amperity will use the first format that matches.

Tip

Spark SQL may be used to transform source data into a supported date format prior to loading it to Amperity.

Date values conversion order

Date values are converted in the following order of precedence:

Order  Date type                Format
1      basic-date               “yyyyMMdd”
2      date                     “yyyy-MM-dd”
3      slash-date               “yyyy/MM/dd”
4      us-date-4y               “MM/dd/yyyy”
5      us-date-2y               “MM/dd/yy”
6      date-month-4y            “dd-MMM-yyyy”
7      date-month-2y            “dd-MMM-yy”
8      date-month-4y-spaced     “dd MMM yyyy”
9      date-month-4y-no-space   “ddMMMyyyy”
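The "first format that matches" behavior can be sketched in Python as follows. This is only an approximation under stated assumptions: the patterns are translated by hand to strptime directives, strptime is not Joda-Time (it is more lenient in places, for example with single-digit months), and `parse_date` is a hypothetical helper rather than Amperity's implementation.

```python
from datetime import date, datetime

# strptime equivalents of the date patterns, in the same order of precedence.
# %b matches abbreviated month names for the current locale (English assumed here).
DATE_FORMATS = [
    ("basic-date", "%Y%m%d"),
    ("date", "%Y-%m-%d"),
    ("slash-date", "%Y/%m/%d"),
    ("us-date-4y", "%m/%d/%Y"),
    ("us-date-2y", "%m/%d/%y"),
    ("date-month-4y", "%d-%b-%Y"),
    ("date-month-2y", "%d-%b-%y"),
    ("date-month-4y-spaced", "%d %b %Y"),
    ("date-month-4y-no-space", "%d%b%Y"),
]


def parse_date(text: str):
    """Return (date type, date) for the first format that matches, else None."""
    for name, fmt in DATE_FORMATS:
        try:
            return name, datetime.strptime(text, fmt).date()
        except ValueError:
            continue  # this format did not match; try the next one
    return None
```

Because the list is ordered, a value that could match more than one pattern is always resolved by the earlier entry.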

Time values conversion order

Time values (when present) are converted in the following order of precedence:

Order  Time type                           Format
1      basic-time-Z                        “HHmmssZ”
2      basic-time                          “HHmmss”
3      basic-time-millis-Z                 “HHmmss.SSSSSSSSSZ”
4      basic-time-millis                   “HHmmss.SSSSSSSSS”
5      24-hour-minute-second-millis-zone   “HH:mm:ss.SSS z”
6      24-hour-minute-second-millis-Z      “HH:mm:ss.SSSSSSSSSZ”
7      24-hour-minute-second-millis        “HH:mm:ss.SSSSSSSSS”
8      24-hour-minute-second-Z             “HH:mm:ssZ”
9      24-hour-minute-second               “HH:mm:ss”
10     24-hour-minute                      “HH:mm”
11     24-hour                             “HH”
12     12-hour-minute-second               “hh:mm:ss a”
13     12-hour-minute                      “hh:mm a”
14     12-hour-minute-zone                 “hh:mma z”

File compression / archive

Amperity supports the following compression and archiving formats:

  • Tar

  • Tgz

  • Zip

  • GZip

Large datasets

A large dataset is a file over 500 GB in size.

Amperity recommends that large datasets:

  • Be provided to Amperity using Amazon S3, Azure Blob Storage, or Google Cloud Storage to use their massively parallel I/O capabilities

  • Use compression to reduce file sizes

  • Use the Amperity Streaming Ingest API, in certain cases, to avoid batched data drops
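One way to apply the compression recommendation before a file is uploaded is sketched below in Python. The `gzip_file` helper is hypothetical; it streams the data in chunks, so the source file is never held in memory all at once.

```python
import gzip
import shutil
from pathlib import Path


def gzip_file(source: Path) -> Path:
    """Compress source to a sibling file with '.gz' appended and return its path."""
    target = source.with_name(source.name + ".gz")
    # copyfileobj reads and writes in chunks instead of loading the whole file.
    with open(source, "rb") as plain, gzip.open(target, "wb") as compressed:
        shutil.copyfileobj(plain, compressed)
    return target
```

For example, `gzip_file(Path("table_name_2019_01_28.csv"))` would write `table_name_2019_01_28.csv.gz` next to the original file.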

Input examples

The following examples show how files input to Amperity are unpacked, depending on various combinations of encryption, compression type, and file format. All examples use yyyy_MM_dd for the date format.

for single files

PGP, TGZ, CSV

  1. Input to Amperity: table_name_yyyy_MM_dd.tgz.pgp

  2. After decryption: table_name_yyyy_MM_dd.tgz

  3. After decompression: table_name_yyyy_MM_dd.csv

PGP, GZip, TAR, CSV

  1. Input to Amperity: table_name_yyyy_MM_dd.csv.tar.gz.pgp

  2. After decryption: table_name_yyyy_MM_dd.csv.tar.gz

  3. After decompression: table_name_yyyy_MM_dd.csv.tar

  4. After the archive is opened: table_name_yyyy_MM_dd.csv

PGP, TAR, Apache Parquet

  1. Input to Amperity: table_name_yyyy_MM_dd.tar.pgp

  2. After decryption: table_name_yyyy_MM_dd.tar

  3. After decompression: table_name_yyyy_MM_dd.parquet, with 1 to n Apache Parquet part files within the directory.

for multiple files

PGP, TAR, Apache Parquet

  1. Input to Amperity: input_name_yyyy_MM_dd.parquet.tar.pgp

  2. After decryption: input_name_yyyy_MM_dd.parquet.tar

  3. After decompression: table_name_yyyy_MM_dd.parquet, where, for each table, 1 to n Apache Parquet files will be located within a single directory.

PGP, TGZ, CSV

  1. Input to Amperity: input_name_yyyy_MM_dd.csv.tgz.pgp

  2. After decryption: input_name_yyyy_MM_dd.csv.tgz

  3. After decompression: table_name.csv, where all tables that were input are located within a single directory.
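The unpacking order in these examples follows the filename extensions from right to left. A minimal Python sketch of that idea, using a hypothetical suffix-to-step mapping that is not Amperity's implementation:

```python
# Hypothetical mapping from a filename suffix to the step that removes it.
STEP_FOR_SUFFIX = {
    ".pgp": "decrypt",
    ".gz": "decompress",
    ".tgz": "decompress and open archive",
    ".tar": "open archive",
    ".zip": "open archive",
}


def unpack_steps(filename: str) -> list:
    """Return (step, remaining name) pairs, peeling one suffix at a time from the right."""
    steps = []
    name = filename
    while True:
        for suffix, step in STEP_FOR_SUFFIX.items():
            if name.lower().endswith(suffix):
                name = name[: -len(suffix)]
                steps.append((step, name))
                break
        else:
            # No known suffix is left, so unpacking is complete.
            return steps


for step, remaining in unpack_steps("table_name_2019_01_28.csv.tar.gz.pgp"):
    print(f"{step}: {remaining}")
```

The sketch only inspects the outer filename. The names of files extracted from a .tar, .tgz, or .zip archive come from the archive contents, not from the archive's own name.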

Send files from Amperity

The following sections apply to all files that are sent from Amperity:

Connect to destination

Amperity requires the ability to connect, and then write data to the location in which files will be dropped. The credentials that allow Amperity to write data to that location are configured in Amperity. If this location is not managed by Amperity, the customer will need to provide these credentials to Amperity using SnapPass to complete the configuration.

SnapPass allows secrets to be shared in a secure, ephemeral way. Input a single or multi-line secret, along with an expiration time, and then generate a one-time use URL that may be shared with anyone. Amperity uses SnapPass to share system credentials with customers.

Send file formats

Amperity supports the following file formats when sending files from Amperity:

Campaign template patterns

Data templates that are made available to campaigns may use variables to apply campaign names, group names, and send dates to the names of files that are sent from Amperity for a campaign.

Note

A date is automatically appended to the filename for one-time campaigns.

Important

Campaign templates use the same tokens and Joda-Time filters as file-based templates.

Campaign names

Use the {{ campaign_name }} variable to define where the name of the campaign is added to the filename for a campaign, as it will be received by the downstream system.

Use this variable by itself to use the campaign name as the filename. For example, when the filename template is set to {{ campaign_name }} and the name of the campaign is acme_subscriber_bogo_20220815_1 the filename will be acme_subscriber_bogo_20220815_1.

You may use this variable by itself or with {{ group_name }} in any order.

Group names

Use the {{ group_name }} variable to define where the name of a treatment group is added to the filename for a campaign, as it will be received by the downstream system.

Use this variable by itself to use the treatment group name as the filename. For example, when the filename template is set to {{ group_name }} and the name of the treatment group is Group1_bogo_20220815_1 the filename will be Group1_bogo_20220815_1.

You may use this variable by itself or with {{ campaign_name }} in any order.

List names

Use the {{ list_name }} variable to use the name of the campaign as the filename for a campaign, as it will be received by the downstream system.
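To make the substitution concrete, here is a small Python sketch that fills in these variables. It only mimics the behavior described in this section; `render_filename` is a hypothetical helper, and Amperity's own template rendering may differ.

```python
import re

# Matches {{ name }} with optional whitespace inside the braces.
VARIABLE = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_filename(template: str, values: dict) -> str:
    """Replace each {{ name }} with values[name]; unknown names raise KeyError."""
    return VARIABLE.sub(lambda match: values[match.group(1)], template)


values = {
    "campaign_name": "acme_subscriber_bogo_20220815_1",
    "group_name": "Group1_bogo_20220815_1",
}
print(render_filename("{{ campaign_name }}", values))
print(render_filename("{{ group_name }}_{{ campaign_name }}", values))
```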

Filename template patterns

A filename template defines the naming pattern for files that are sent from Amperity. Specify the name of the file, and then use Jinja-style string formatting to append a date or timestamp to the filename.

Joda-Time is an open-source date and time library that is used by Amperity to establish consistency in filename patterns. The recommended pattern is “Segment_Name_MM-dd-YYYY”, where “Segment_Name” is the name of the segment and “MM-dd-YYYY” will append the current date.

Text variables

Strings in a filename template are literal by default. Use the {{ text }} variable to apply special rendering to the text value.

Filters

Use filters to shift a datetime to a timezone, and then format it as a string. The following filters are available:

Filter     Description
local      Use the local filter to shift a datetime to a given timezone using a string in Joda-Time format. For example: local:'America/Los_Angeles'.
format     Use the format filter to format a datetime with a string in Joda-Time format. For example: format:'MM-dd-yyyy'.
next_day   Use the next_day filter to shift the datetime to the next day. For example: now|local:'America/Los_Angeles'|next_day.

Tokens

Use a token to specify how to apply a datetime to a file. The following tokens are available:

Token   Description
now     Use the now token to apply the datetime that is current at the time the file is written.
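The effect of chaining the now token with the local, next_day, and format filters can be approximated in Python. This is a sketch under assumptions: `render_now` is hypothetical, only a handful of Joda-Time pattern letters are translated, and the timezone is passed as a tzinfo object (for example `zoneinfo.ZoneInfo("America/Los_Angeles")` on Python 3.9 or later).

```python
import re
from datetime import datetime, timedelta, timezone, tzinfo

# Assumed mapping for a few Joda-Time pattern letters; not a full implementation.
JODA_TO_STRFTIME = {"yyyy": "%Y", "MM": "%m", "dd": "%d", "HH": "%H", "mm": "%M", "ss": "%S"}
JODA_TOKEN = re.compile("|".join(JODA_TO_STRFTIME))


def render_now(now: datetime, zone: tzinfo, pattern: str, next_day: bool = False) -> str:
    """Emulate now|local:zone[|next_day]|format:pattern for a timezone-aware datetime."""
    local = now.astimezone(zone)        # local: shift to the target timezone
    if next_day:
        local += timedelta(days=1)      # next_day: move forward one day
    strftime_pattern = JODA_TOKEN.sub(lambda m: JODA_TO_STRFTIME[m.group(0)], pattern)
    return local.strftime(strftime_pattern)  # format: render as a string


# 06:30 UTC on August 15 is still August 14 at a fixed UTC-7 offset.
utc_now = datetime(2022, 8, 15, 6, 30, tzinfo=timezone.utc)
pacific_daylight = timezone(timedelta(hours=-7))
print(render_now(utc_now, pacific_daylight, "MM-dd-yyyy"))
```

The timezone shift happens before formatting, which is why the order of the filters matters when a file is written close to midnight.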

File compression

Amperity supports the following compression and archiving formats:

  • Tar

  • Tgz

  • Zip

  • GZip

When Tar or Zip options are not specified, a folder is created using the filename template specified for the orchestration. This folder will contain one (or more) files, each of which has a generated name.

Tip

Compression and archive file extensions are not added to the filename template automatically. These may be added while configuring an orchestration. To add the file compression format to the output filename, append .tar, .tgz, .zip, or .gz after the file format extension in the filename template. For example: parquet.tar, csv.zip, or tsv.gz.

Output examples

The following examples show how files output by Amperity are named depending on the various combinations of options for file format, compression type, and encryption that are available. All examples use yyyy_MM_dd for the date format.

for queries

Apache Parquet, TAR, PGP

  1. Output from Amperity: query_name_yyyy_MM_dd.tar.pgp

  2. After decryption: query_name_yyyy_MM_dd.tar

  3. After decompression: query_name_yyyy_MM_dd.parquet, with 1 to n Apache Parquet part files within the directory.

CSV, TGZ, PGP

  1. Output from Amperity: query_name_yyyy_MM_dd.tgz.pgp

  2. After decryption: query_name_yyyy_MM_dd.tgz

  3. After decompression: query_name_yyyy_MM_dd.csv

CSV, GZip, PGP

  1. Output from Amperity: query_name_yyyy_MM_dd.csv.gz.pgp

  2. After decryption: query_name_yyyy_MM_dd.csv.gz

  3. After decompression: query_name_yyyy_MM_dd.csv

for single tables

Apache Parquet, TAR, PGP

  1. Output from Amperity: table_name_yyyy_MM_dd.tar.pgp

  2. After decryption: table_name_yyyy_MM_dd.tar

  3. After decompression: table_name_yyyy_MM_dd.parquet, with 1 to n Apache Parquet part files within the directory.

CSV, TGZ, PGP

  1. Output from Amperity: table_name_yyyy_MM_dd.tgz.pgp

  2. After decryption: table_name_yyyy_MM_dd.tgz

  3. After decompression: table_name_yyyy_MM_dd.csv

CSV, GZip, PGP

  1. Output from Amperity: table_name_yyyy_MM_dd.csv.gz.pgp

  2. After decryption: table_name_yyyy_MM_dd.csv.gz

  3. After decompression: table_name_yyyy_MM_dd.csv

for multiple tables

Apache Parquet, TAR, PGP

  1. Output from Amperity: output_name_yyyy_MM_dd.parquet.tar.pgp

  2. After decryption: output_name_yyyy_MM_dd.parquet.tar

  3. After decompression: table_name_yyyy_MM_dd.parquet, where, for each table, 1 to n Apache Parquet files will be located within a single directory.

CSV, TGZ, PGP

  1. Output from Amperity: output_name_yyyy_MM_dd.csv.tgz.pgp

  2. After decryption: output_name_yyyy_MM_dd.csv.tgz

  3. After decompression: table_name.csv, where all tables that were output are located within a single directory.
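On the receiving side, a downstream system could inspect a decrypted .tgz file before extracting it. The following Python sketch lists the CSV tables inside such an archive; `csv_members` is a hypothetical helper, and the layout inside a real archive may differ from the one assumed here.

```python
import tarfile


def csv_members(archive_path: str) -> list:
    """Return the sorted names of the .csv files inside a gzip-compressed tar archive."""
    with tarfile.open(archive_path, mode="r:gz") as archive:
        return sorted(
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.endswith(".csv")
        )
```

Listing members first makes it possible to confirm that the expected tables are present before any files are written to disk.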

SFTP and encryption

You must use encryption when using Secure File Transfer Protocol (SFTP) to transfer files to and from Amperity. Encryption protects the information in the files and requires the use of secure protocols and encryption programs:

  • Secure Shell (SSH) is a secure protocol that protects your files as they are transferred to and from Amperity

  • Pretty Good Privacy (PGP) is an encryption program that signs, encrypts, and decrypts files, protecting your files while they are at rest

Secure Shell (SSH)

Secure Shell (SSH) is a secure protocol that protects your files while they are transferred to and from Amperity.

Important

Amperity prefers using RSA as the encryption format for generating SSH keys that are used by SFTP connections to pull data to or send data from Amperity. Amperity prefers the size of the SSH key to be 4096 bits.

Amperity also supports the ECDSA and Ed25519 encryption formats.

Generate SSH keypairs

RSA is a cryptographic system that may be used to generate public and private key pairs for the purpose of securing data transmission to and from Amperity via SFTP. The public key is used to encrypt data. The private key is derived from two very large prime numbers and is used to decrypt data.

To generate SSH keypairs

  1. Run the following command to generate an RSA key pair:

    $ ssh-keygen -t rsa -b 4096 -m PEM -f generated-key
    

    This will write two files: generated-key (the private key) and generated-key.pub (the public key).

  2. The location in which the public and private keys should be placed depends on the location to which data is transferred.

  3. Add the public key to the SFTP location from which data is sent to Amperity.

  4. Add the private key to the Amperity SFTP courier.

Pretty Good Privacy (PGP)

Pretty Good Privacy (PGP) is an encryption program that provides cryptographic privacy and authentication for data communication by signing, encrypting, and decrypting data files and formats. Amperity supports PGP encryption.

PGP helps protect your files while they are at rest. Amperity recommends using:

  • 4096-bit keys

  • A strong passphrase

  • One PGP key per tenant (minimum); one PGP key per system (recommended)

Files that are encrypted using PGP are appended with the .pgp extension.

Important

There are two types of PGP public keys: a primary key and a subkey. Amperity does not allow the use of a primary key for public-private key encryption. If you attempt to use a primary key you will see an error similar to “Destination failed validation: PGP public key is a primary key. Please provide a subkey or a keyring with exactly one subkey.”

Encrypt files

Any tool that is compliant with the OpenPGP standard, as defined by RFC 4880, may be used for PGP encryption. For example:

  • GNU Privacy Guard. Available from https://www.gnupg.org/. Instructions for how to use GNU Privacy Guard are available from that site.

  • GPG Tools. Available from https://gpgtools.org/. Instructions for how to use GPG Tools are available from that site.
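As one possible illustration, the Python sketch below builds and runs a GNU Privacy Guard command that encrypts a file to a .pgp output. It assumes that gpg is installed and that the recipient's public key has already been imported and trusted in the local keyring. The helper names and the recipient value are hypothetical.

```python
import subprocess


def gpg_encrypt_command(path: str, recipient: str) -> list:
    """Build a GnuPG command that encrypts path to path + '.pgp' for recipient."""
    return [
        "gpg", "--batch", "--yes",
        "--recipient", recipient,   # key ID or email address on the public key
        "--output", path + ".pgp",
        "--encrypt", path,
    ]


def encrypt(path: str, recipient: str) -> None:
    """Run the command; raises CalledProcessError if gpg exits with an error."""
    subprocess.run(gpg_encrypt_command(path, recipient), check=True)


print(" ".join(gpg_encrypt_command("table_name_2019_01_28.csv.gz", "recipient@example.com")))
```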

Use PGP encryption with destinations

To use PGP encryption with destinations, use the PGP credentials setting to select a PGP credential. For new keys, use the PGP credentials setting to assign the credential a name and description, a passphrase, and then the public key that is used to encrypt data.

Caution

Be sure to include the “BEGIN PGP PUBLIC KEY BLOCK” and “END PGP PUBLIC KEY BLOCK” header and footer in the key. Only users and systems with access to the private key will be able to decrypt this data. Use SnapPass to share the public key.

Decrypt files

Any tool that is compliant with the OpenPGP standard, as defined by RFC 4880, may be used for PGP decryption. For example:

  • GNU Privacy Guard. Available from https://www.gnupg.org/. Instructions for how to use GNU Privacy Guard are available from that site.

  • GPG Tools. Available from https://gpgtools.org/. Instructions for how to use GPG Tools are available from that site.

Add PGP decryption to data sources

To use PGP decryption with data sources, use the PGP credentials setting to select a PGP credential. For new keys, use the PGP credentials setting to assign the credential a name and description, a passphrase, and then the private key that is used to decrypt data.

Note

Some data sources require you to switch to the legacy editor before you can configure PGP credentials. This link is at the top of the page when you are creating a courier and is named “Switch to legacy editor”.

Caution

Be sure to include the “BEGIN PGP PRIVATE KEY BLOCK” and “END PGP PRIVATE KEY BLOCK” header and footer in the key.

Anyone with access to the decryption key is capable of decrypting data that has been encrypted with the corresponding public key. Please keep both public and private keys confidential. Use SnapPass to share the private key.

About data transfers

You must use encryption while transferring files to and from Amperity. SSH protects your files as they are transferred to and from Amperity. PGP protects your files while they are at rest.

The combination of public and private keys that is used for a specific workflow depends on whether data is being pulled to Amperity from an upstream system or sent from Amperity to a downstream system.

Pull files to Amperity

When Amperity pulls files from upstream systems using SFTP, use the following combinations for SSH and PGP keys:

for SSH

  1. The owner of the upstream system will create the SSH keypair and will maintain the private SSH key.

  2. The public SSH key is configured in Amperity; you may send the public SSH key to your Amperity representative using SnapPass.

for PGP

  1. Amperity Support will create the PGP keypair and will maintain the private PGP key.

  2. Amperity Support will send you the public PGP key using SnapPass; the owner of the upstream system will encrypt files using the public PGP key prior to adding the files to the location from which Amperity will pull data.

    Tip

    Use file compression before encrypting files; compression applied after encryption will not reduce the size of the file.

  3. Amperity will use the private PGP key to decrypt files pulled from the upstream system.
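The tip about compressing before encrypting can be checked with a short Python experiment. Random bytes stand in for encrypted output here, which is an assumption made for illustration; the sample data and the `compressed_ratio` helper are hypothetical.

```python
import os
import zlib

# Repetitive plain text, similar in spirit to a CSV export.
plain = b"customer_id,email\n" + b"1001,someone@example.com\n" * 4000

# Random bytes stand in for ciphertext, which is similarly incompressible.
ciphertext_like = os.urandom(len(plain))


def compressed_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


print(f"plain text:      {compressed_ratio(plain):.3f}")
print(f"ciphertext-like: {compressed_ratio(ciphertext_like):.3f}")
```

The plain text shrinks to a small fraction of its size, while the ciphertext-like data does not shrink in any meaningful way. This is why compression belongs before encryption in the pipeline.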

Send files from Amperity

When Amperity sends files to downstream systems using SFTP, use the following combinations for SSH and PGP keys:

for SSH

  1. Amperity Support will create the SSH keypair and will maintain the private SSH key.

  2. Amperity Support will send you the public SSH key using SnapPass; add the public SSH key to the downstream system to which Amperity is configured to send data.

for PGP

  1. The owner of the downstream system creates the PGP keypair and maintains the private PGP key; you may send the public PGP key to your Amperity representative using SnapPass.

  2. Amperity will use the public PGP key to encrypt files before sending them to the downstream system.

  3. The downstream system will use the private PGP key to decrypt files sent from Amperity.

Key rotations

Amperity performs key rotations on a periodic basis as a best practice. Key rotations are sometimes necessary in situations where a key may have been compromised. When a key rotation happens, Amperity will:

  1. Generate a key pair

  2. Create a keyring file that contains the old and new private keys and uses the same passphrase

  3. Install the keyring file to the courier

  4. Share the new public key with the customer using SnapPass

  5. Wait for confirmation from the customer that the public key is updated

  6. Create a keyring file that contains only the updated private key

  7. Install the keyring file that contains only the updated private key to the courier