Pull from Dynamic Yield

Dynamic Yield helps companies build and test personalized, optimized, and synchronized digital customer experiences.

Note

This topic explains how to configure Amperity to pull data from a password-protected Amazon S3 bucket that is managed by Dynamic Yield. You must configure Dynamic Yield to send data to that Amazon S3 bucket before Amperity can pull data from it.

This topic describes the steps that are required to pull product catalog data to Amperity from Dynamic Yield:

  1. Get details

  2. Configure cross-account roles

  3. Add data source and feed

Get details

Amperity can be configured to pull data from Dynamic Yield using Amazon S3. This requires the following configuration details:

Detail one.

The name of the S3 bucket from which data will be pulled to Amperity.

Detail two.

For cross-account role assumption you will need the value for the Target Role ARN, which enables Amperity to access the customer-managed Amazon S3 bucket.

Note

The values for the Amperity Role ARN and the External ID fields are provided automatically.

Review the following sample policy, and then add a similar policy that allows Amperity to access the customer-managed Amazon S3 bucket. Add this policy as a trust policy to the IAM role that is used to manage access to the bucket.

The policy for the customer-managed Amazon S3 bucket is unique, but will be similar to:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAmperityAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::account:role/resource"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "01234567890123456789"
        }
      }
    }
  ]
}

The value for the role ARN is similar to:

arn:aws:iam::123456789012:role/prod/amperity-plugin

An external ID is an alphanumeric string of between 2 and 1224 characters (without spaces) that may include the following symbols: plus (+), equal (=), comma (,), period (.), at (@), colon (:), forward slash (/), and hyphen (-).
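
The external ID rule above can be checked before a value is pasted into a policy. The following is a minimal sketch, not part of Amperity or AWS tooling: the function name is hypothetical, and it assumes that "alphanumeric" means ASCII letters and digits only.

```python
import re

# Allowed characters, taken from the rule described above:
# letters, digits, and the symbols + = , . @ : / -
# Assumption: "alphanumeric" is read as ASCII letters and digits only.
EXTERNAL_ID_PATTERN = re.compile(r"[A-Za-z0-9+=,.@:/-]{2,1224}")


def is_valid_external_id(value: str) -> bool:
    """Return True when value is 2 to 1224 allowed characters with no spaces."""
    return EXTERNAL_ID_PATTERN.fullmatch(value) is not None
```

For example, is_valid_external_id("01234567890123456789") returns True, while a value that contains a space is rejected.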

Detail three.

A list of objects (by filename and file type) in the S3 bucket to be sent to Amperity and a sample for each file to simplify feed creation.

The size of a CSV file cannot exceed 10 GB. A CSV file that is larger than 10 GB must be split into smaller files before it is made available to Amperity. The total number of CSV files in a single ingest job cannot exceed 500,000.
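
One way to stay under the file size limit is to split a large CSV file into parts before the parts are placed in the bucket. The following is a minimal sketch, not an Amperity tool: the function names and the part-file naming are hypothetical, and reading 10 GB as 10 × 1000³ bytes is an assumption. It expects a UTF-8 file with a header row and repeats that header in every part.

```python
import csv
import io
from pathlib import Path

# Assumption: "10 GB" is read as decimal gigabytes.
MAX_CSV_BYTES = 10 * 1000**3


def _encoded_size(row):
    """Size in bytes of one row as csv.writer would write it in UTF-8."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return len(buf.getvalue().encode("utf-8"))


def split_csv(src, out_dir, max_bytes=MAX_CSV_BYTES):
    """Split src into part files of at most max_bytes, repeating the header.

    A single row that is larger than max_bytes still gets a part of its own,
    so that part can exceed the limit.
    """
    src, out_dir = Path(src), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    parts = []
    out = None
    try:
        with src.open(newline="", encoding="utf-8") as fh:
            reader = csv.reader(fh)
            header = next(reader)
            header_size = _encoded_size(header)
            size = 0
            for row in reader:
                row_size = _encoded_size(row)
                if out is None or size + row_size > max_bytes:
                    # Start a new part and write the header first.
                    if out is not None:
                        out.close()
                    path = out_dir / f"{src.stem}-part{len(parts) + 1:05d}.csv"
                    out = path.open("w", newline="", encoding="utf-8")
                    writer = csv.writer(out)
                    writer.writerow(header)
                    size = header_size
                    parts.append(path)
                writer.writerow(row)
                size += row_size
    finally:
        if out is not None:
            out.close()
    return parts
```

The function returns the list of part files in the order they were written, which can also be used to confirm that the number of files stays under the per-job limit.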

Configure cross-account roles

Amperity prefers to pull data from and send data to customer-managed cloud storage.

Amperity requires using cross-account role assumption to manage access to Amazon S3 to ensure that customer-managed security policies control access to data.

This approach ensures that customers can:

  • Directly manage the IAM policies that control access to data

  • Directly manage the files that are available within the Amazon S3 bucket

  • Modify access without requiring involvement by Amperity; access may be revoked at any time by either AWS account, after which data sharing ends immediately

  • Directly troubleshoot incomplete or missing files

Note

After setting up cross-account role assumption, a list of files (by filename and file type), along with any sample files, must be made available to allow for feed creation. These files may be placed directly into the shared location after cross-account role assumption is configured.

Can I use an Amazon S3 access point?

Yes, but with the following limitations:

  1. The direction of access is Amperity accessing files that are located in a customer-managed Amazon S3 bucket

  2. A credential-free role-to-role access pattern is used

  3. Traffic is not restricted to VPC-only

To configure an S3 bucket for cross-account role assumption

The following steps describe how to configure Amperity to use cross-account role assumption to pull data from (or push data to) a customer-managed Amazon S3 bucket.

Important

These steps require configuration changes to customer-managed AWS accounts and must be done by users with administrative access.

Step 1.

Open the Sources page to configure credentials for Dynamic Yield.

Click the Add courier button to open the Add courier dialog box.

Do one of the following to select Dynamic Yield:

  1. Click the row in which Dynamic Yield is located.

  2. Search for Dynamic Yield. Start typing “dyn”. The list will filter to show only matching sources.

From the Credentials dialog box, enter a name for the credential, select the iam-role-to-role credential type, and then select “Create new credential”.

Step 2.

Next, configure the settings that are specific to cross-account role assumption.

The values for the Amperity Role ARN and External ID fields – the Amazon Resource Name (ARN) for your Amperity tenant and its external ID – are provided automatically.

You must provide the values for the Target Role ARN and S3 Bucket Name fields. Enter the target role ARN for the IAM role that Amperity will use to access the customer-managed Amazon S3 bucket, and then enter the name of the Amazon S3 bucket.
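
Before entering the Target Role ARN and S3 Bucket Name values, it can help to confirm that they have the expected shape. The following is a minimal sketch with hypothetical function names. It checks the IAM role ARN layout shown in this topic and only the basic shape of a bucket name (3 to 63 lowercase letters, digits, periods, and hyphens); it does not confirm that the role or bucket exists.

```python
import re
from typing import NamedTuple


class RoleArn(NamedTuple):
    account_id: str  # 12-digit AWS account ID
    role_path: str   # role name, including any path prefix


# Matches ARNs such as arn:aws:iam::123456789012:role/prod/amperity-plugin
_ROLE_ARN = re.compile(r"arn:aws[a-z-]*:iam::(\d{12}):role/(.+)")

# Basic shape only; this does not cover every S3 bucket naming rule.
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def parse_role_arn(arn: str) -> RoleArn:
    """Split an IAM role ARN into its account ID and role path."""
    match = _ROLE_ARN.fullmatch(arn.strip())
    if match is None:
        raise ValueError(f"not an IAM role ARN: {arn!r}")
    return RoleArn(match.group(1), match.group(2))


def is_plausible_bucket_name(name: str) -> bool:
    """Return True when name has the basic shape of an S3 bucket name."""
    return _BUCKET_NAME.fullmatch(name) is not None
```

A value that fails these checks is likely to have been copied incorrectly, such as an ARN for a bucket instead of an IAM role.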

Step 3.

Review the following sample policy, and then add a similar policy that allows Amperity to access the customer-managed Amazon S3 bucket. Add this policy as a trust policy to the IAM role that is used to manage access to the bucket.

The policy for the customer-managed Amazon S3 bucket is unique, but will be similar to:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowAmperityAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::account:role/resource"
      },
      "Action": "sts:AssumeRole",
      "Condition": {
        "StringEquals": {
          "sts:ExternalId": "01234567890123456789"
        }
      }
    }
  ]
}

The value for the role ARN is similar to:

arn:aws:iam::123456789012:role/prod/amperity-plugin

An external ID is an alphanumeric string of between 2 and 1224 characters (without spaces) that may include the following symbols: plus (+), equal (=), comma (,), period (.), at (@), colon (:), forward slash (/), and hyphen (-).
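
The sample policy can also be generated from the Amperity Role ARN and External ID values that are shown in the credential dialog, which avoids hand-editing JSON. The following is a minimal sketch with a hypothetical function name. It mirrors the structure of the sample policy and sets the policy language "Version" element to "2012-10-17", the current IAM policy language version.

```python
import json


def build_trust_policy(amperity_role_arn: str, external_id: str) -> dict:
    """Build a trust policy that lets one role assume the target role.

    The structure mirrors the sample policy in this topic; both arguments
    are values copied from the credential dialog.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowAmperityAccess",
                "Effect": "Allow",
                "Principal": {"AWS": amperity_role_arn},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"sts:ExternalId": external_id}
                },
            }
        ],
    }


# Placeholder values from this topic; replace them with the real values.
policy = build_trust_policy(
    "arn:aws:iam::123456789012:role/prod/amperity-plugin",
    "01234567890123456789",
)
policy_json = json.dumps(policy, indent=2)
```

The resulting policy_json string can be reviewed and then pasted into the trust relationship for the IAM role.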

Step 4.

Click Continue to test the configuration (and validate the connection) to the customer-managed Amazon S3 bucket, after which you will be able to continue the steps for adding a courier.

Add data source and feed

Add a data source that pulls data from a Dynamic Yield bucket for each file that you want to pull to Amperity.

Browse the Dynamic Yield bucket to select a file, and then review the settings for that file. Define the feed schema, and then activate the feed. Run the courier manually, and then review the data that is added to the domain table that is associated with the feed.

To add a data source for an Amazon S3 bucket

Step 1.

Open the Sources page to configure Dynamic Yield.

Click the Add courier button to open the Add courier dialog box.

Select Dynamic Yield. Do one of the following:

  1. Click the row in which Dynamic Yield is located. Sources are listed alphabetically.

  2. Search for Dynamic Yield. Start typing “dyn”. The list will filter to show only matching sources.

Step 2.

Credentials allow Amperity to connect to Dynamic Yield and must exist before a courier can be configured to pull data from Dynamic Yield. Select an existing credential from the Credential dropdown, and then click Continue.

Tip

A courier that has credentials that are configured correctly will show a “Connection successful” status.

Step 3.

Select the file that will be pulled to Amperity, either directly (by going into the Amazon S3 bucket and selecting it) or by providing a filename pattern.

Click Browse to open the File browser. Select the file that will be pulled to Amperity, and then click Accept.

Use a filename pattern to define files that will be loaded on a recurring basis, but will have small changes to the filename over time, such as having a datestamp appended to the filename.
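
The idea of a datestamped filename can be illustrated outside of Amperity. The following is a minimal sketch that uses Python date formatting, not the filename pattern syntax that Amperity accepts: the function names, the "products-" prefix, and the yyyy-mm-dd datestamp layout are all assumptions made for illustration.

```python
import re
from datetime import date, datetime


def datestamped_name(prefix: str, day: date, suffix: str = ".csv") -> str:
    """Build a filename such as products-2024-03-05.csv."""
    return f"{prefix}{day:%Y-%m-%d}{suffix}"


def extract_datestamp(filename: str, prefix: str, suffix: str = ".csv"):
    """Return the date in a datestamped filename, or None if it does not match."""
    pattern = re.escape(prefix) + r"(\d{4}-\d{2}-\d{2})" + re.escape(suffix)
    match = re.fullmatch(pattern, filename)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y-%m-%d").date()
    except ValueError:
        # Digits in the right places, but not a real calendar date.
        return None
```

A check like this can be run over the objects in the bucket to confirm that each recurring file follows one consistent naming scheme before a pattern is configured.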

Note

For a new feed, this file is also used as the sample file that is used to define the schema. For an existing feed, this file must match the schema that has already been defined.

Use the PGP credential setting to specify the credentials to use for an encrypted file.

Step 4.

Review the file.

The contents of the file may be previewed as a table and in a raw format. Switch between these views using the Table and Raw buttons, and then click Refresh to view the file in that format.

Note

PGP-encrypted files can be previewed. PGP-encrypted Apache Parquet files must be less than 500 MB to be previewed.

Amperity infers formatting details, and then adds these details to a series of settings located along the left side of the file view. File settings include:

  • Delimiter

  • Compression

  • Escape character

  • Quote character

  • Header row

Review the file, and then update these settings, if necessary.
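
To see what kind of inference is involved, the same settings can be guessed locally for a delimited sample file. The following is a minimal sketch that uses csv.Sniffer from the Python standard library. It is not how Amperity infers these settings, the function name is hypothetical, and the result is a heuristic guess that should still be reviewed.

```python
import csv


def infer_csv_settings(sample: str, candidates: str = ",|\t;") -> dict:
    """Guess delimiter, quote character, and header row from sample text.

    candidates limits which characters are considered as the delimiter.
    csv.Sniffer raises csv.Error when it cannot decide.
    """
    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample, delimiters=candidates)
    return {
        "delimiter": dialect.delimiter,
        "quote_character": dialect.quotechar,
        "escape_character": dialect.escapechar,
        "header_row": sniffer.has_header(sample),
    }


sample = "id|name|email\n1|Ann|a@x.com\n2|Bob|b@x.com\n"
settings = infer_csv_settings(sample)
```

For this pipe-delimited sample, the sniffer reports "|" as the delimiter and detects a header row. Compare a local guess like this with the settings that Amperity shows, and investigate the file when they disagree.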

Note

Amperity supports the following file types: Apache Avro, Apache Parquet, CSV, DSV, JSON, NDJSON, PSV, TSV, and XML.

Refer to the reference page for each file format for details about that format.

Files that contain nested JSON (or “complex JSON”) or XML may require using the legacy courier configuration.

Step 5.

A feed defines the schema for a file that is loaded to Amperity, after which that data is loaded into a domain table and ready for use with workflows within Amperity.

There are two options for feeds: use a new feed or use an existing feed.

Use a new feed

To use a new feed, choose the Create new feed option, select an existing source from the Source dropdown or type the name of a new data source, and then enter the name of the feed.

After you choose a load type and save the courier configuration, you will configure the feed using the data within the sample file.

Use an existing feed

To use an existing feed, choose the Use existing feed option to use an existing schema.

This option requires this file to match all of the feed-specific settings, such as incoming field names, field types, and primary keys. The data within the file may be different.

Load types

The load type defines how data in the file will be loaded to the associated domain table.

Use the Truncate and load option to delete all rows in the associated domain table prior to loading data.

Use the Load option to load data from the selected file to the associated domain table.

Note

When a file is loaded to a domain table using an existing feed, the file that is loaded must have the same schema as the existing feed. The data in the file may be new.
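
A quick local check can catch a schema mismatch before a file is loaded to an existing feed. The following is a minimal sketch with a hypothetical function name. It compares only the header row of a delimited file against a list of field names that you supply, and it does not check field types or primary keys.

```python
import csv
import io


def compare_header_to_feed(csv_text: str, feed_fields, delimiter: str = ","):
    """Return (missing, unexpected) field names for a file's header row.

    missing lists feed fields that the file lacks; unexpected lists
    file columns that the feed does not define.
    """
    reader = csv.reader(io.StringIO(csv_text), delimiter=delimiter)
    header = next(reader, [])
    missing = [name for name in feed_fields if name not in header]
    unexpected = [name for name in header if name not in feed_fields]
    return missing, unexpected
```

Both lists are empty when the header row and the feed use the same set of field names.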

Step 6.

Use the feed editor to do all of the following:

  • Set the primary key

  • Choose the field that best represents when the data in the table was last updated; if there is not an obvious choice, use the “Generate an updated field” option.

  • For each field in the incoming data, validate the field name and semantic tag columns in the feed. Make any necessary adjustments.

  • For tables that contain customer records, enable the “Make available to Stitch” option to ensure the values in this data source are used for identity resolution.

When finished, click Activate.
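
When choosing a primary key, it can help to confirm that the candidate field is populated and unique in the sample file. The following is a minimal sketch with a hypothetical function name. It assumes a comma-delimited sample with a header row and treats values as case-sensitive strings.

```python
import csv
import io
from collections import Counter


def primary_key_problems(csv_text: str, key_field: str) -> dict:
    """Count empty values and list duplicated values for a candidate key."""
    reader = csv.DictReader(io.StringIO(csv_text))
    if key_field not in (reader.fieldnames or []):
        raise ValueError(f"field not found in header: {key_field!r}")
    counts = Counter()
    empty = 0
    for row in reader:
        value = (row.get(key_field) or "").strip()
        if value:
            counts[value] += 1
        else:
            empty += 1
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    return {"empty": empty, "duplicates": duplicates}
```

A field that reports no empty values and no duplicates in the sample is a reasonable candidate, although the sample cannot prove uniqueness for future files.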

Step 7.

Find the courier related to the feed that was just activated, and then run it manually.

On the Sources page, under Couriers, find the courier you want to run and then select Run from the actions menu.

Select a date from the calendar picker that is before today, but after the date on which the file was added to the Dynamic Yield bucket.
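
The date rule above can be written as a simple comparison. The following is a minimal sketch with a hypothetical function name. It reads both "before today" and "after the date on which the file was added" as strict comparisons, which is an assumption.

```python
from datetime import date
from typing import Optional


def is_valid_run_date(run_date: date, file_added: date, today: Optional[date] = None) -> bool:
    """Return True when run_date is after file_added and before today."""
    if today is None:
        today = date.today()
    return file_added < run_date < today
```

For a file that was added on March 1 and a courier that is run on March 10, any date from March 2 through March 9 satisfies the rule.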

Leave the load options in the Run courier dialog box unselected, and then click Run.

After the courier has run successfully, inspect the domain table that contains the data that was loaded to Amperity. After you have verified that the data is correct, you may do any of the following:

  • If the data contains customer records, edit the feed and make that data available to Stitch.

  • If the data should be loaded to Amperity on a regular basis, add the courier to a courier group that runs on the desired schedule.

  • If the data will be a foundation for custom domain tables, use Spark SQL to build out that customization.