Pull from Optimizely¶
Optimizely is an experimentation platform for testing, learning, and deploying positive digital experiences.
Optimizely can send enriched events export data to Amperity via Amazon S3. Enriched events include details such as event timestamps, event IDs, event tags, event names, visitor IDs, session IDs, experiment IDs, and variation IDs.
This topic describes the steps that are required to pull interaction records to Amperity from Optimizely.
Get details¶
Amperity can be configured to pull data from Optimizely using Amazon S3. This requires the following configuration details:
Access to an Optimizely data service hosted in Amazon S3.
The Amazon Resource Name (ARN) for a role with cross-account access.
The name of the Amazon S3 bucket.
A list of objects (by filename and file type) in the Amazon S3 bucket to be pulled to Amperity.
A sample for each file to simplify feed creation.
Note
Amperity supports using cross-account role assumption with Amazon S3 buckets when Optimizely supports the use of cross-account roles and your tenant uses the Amazon S3 data source.
Amazon S3 requirements¶
Amazon S3 requires the following:
Credentials that allow Amperity to access, and then read data from the Amazon S3 bucket used by Optimizely.
Files provided in Apache Parquet format and using the YYYY-MM-DD date format.
Files sent from Optimizely are located in partitions, one for decisions and one for events.
Optimizely uses AWS Key Management Service for encryption.
Amazon S3 credentials¶
Amperity requires the ability to connect to, and then read data from the Amazon S3 bucket used by Optimizely. The credentials that allow that connection and the permissions to read data are entered into the Amperity user interface while configuring a courier. These credentials are created and managed by the owner of the Amazon S3 bucket. Use SnapPass to share credentials with your Amperity representative, if necessary.
SnapPass allows secrets to be shared in a secure, ephemeral way. Input a single or multi-line secret, along with an expiration time, and then generate a one-time use URL that may be shared with anyone. Amperity uses SnapPass for sharing credentials to systems with customers.
Optimizely S3 partitions¶
Enriched events are exported to a bucket named optimizely-events-data that contains two partitions: decisions and events (conversions).
The paths to these partitions are similar to:
s3://optimizely-events-data/v1/account_id=<account_id>/
type=decisions/date={YYYY-MM-DD}/experiment=<experiment_id>
or
s3://optimizely-events-data/v1/account_id=<account_id>/
type=events/date={YYYY-MM-DD}/event=<event_name>
where:
optimizely-events-data is the name of the Amazon S3 bucket
account_id is your unique account identifier
date is the creation date for the data
experiment_id is the unique experiment identifier used for the decisions partition
event_name is the event or entity identifier used for the events partition
The daily partition files are ready when _SUCCESS is appended to the partition path.
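The partition layout described above can be sketched in Python. This is a minimal illustration, not part of Amperity or Optimizely: the function names partition_prefix and is_ready are hypothetical, and the assumption that the marker is an object named _SUCCESS directly under the date= prefix is a reading of the text rather than confirmed behavior.

```python
from datetime import date
from typing import Iterable, Optional

BUCKET = "optimizely-events-data"  # bucket name given in the text


def partition_prefix(account_id: str, kind: str, day: date,
                     leaf: Optional[str] = None) -> str:
    """Build the key prefix (without s3://<bucket>/) for one daily partition.

    kind is "decisions" or "events". leaf is an experiment ID for decisions
    or an event name for events; omit it to address the whole day.
    """
    leaf_key = {"decisions": "experiment", "events": "event"}
    if kind not in leaf_key:
        raise ValueError(f"unknown partition type: {kind!r}")
    prefix = f"v1/account_id={account_id}/type={kind}/date={day.isoformat()}/"
    if leaf is not None:
        prefix += f"{leaf_key[kind]}={leaf}/"
    return prefix


def is_ready(day_prefix: str, keys: Iterable[str]) -> bool:
    """Return True when a _SUCCESS marker exists under the daily prefix.

    Assumption: the marker is an object named "_SUCCESS" under date=.../
    """
    return f"{day_prefix}_SUCCESS" in set(keys)
```

A caller would list the object keys under the daily prefix with whichever S3 client is in use, and then pass them to is_ready before starting a pull.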
Note
Optimizely uses AWS Key Management Service for encryption. Amperity must be able to decrypt these files to pull them to the Amazon S3 or Azure Blob Storage location used by your tenant.
Add courier¶
A courier brings data from an external system to Amperity.
Tip
You can run a courier with an empty load operation using {} as the value for the load operation. Use this approach to get files to upload during feed creation, as a feed requires knowing the schema of a file before you can apply semantic tagging and other feed configuration settings.
To add a courier
From the Sources page, click Add Courier. The Add Source page opens.
Find, and then click the icon for Amazon S3. The Add Courier page opens.
This automatically selects iam-credential as the Credential Type.
From the Credential drop-down, select Create a new credential. This opens the Create New Credential dialog box.
Enter a name for the credential and add the configuration settings. Click Save.
Under S3 Settings, add the name of the Optimizely bucket, prefix, and region.
Under S3 Settings configure the list of files to pull to Amperity. Configure the Entities List for each file to be loaded to Amperity. For example, two files: “decisions.parquet” and “conversions.parquet”.
[
  {
    "object/type": "file",
    "object/file-pattern": "'/path/to/decisions_YYYY-MM-DD.parquet'",
    "object/land-as": {
      "file/tag": "decisions",
      "file/content-type": "application/x-parquet"
    }
  },
  {
    "object/type": "file",
    "object/file-pattern": "'/path/to/conversions_YYYY-MM-DD.parquet'",
    "object/land-as": {
      "file/tag": "conversions",
      "file/content-type": "application/x-parquet"
    }
  }
]
Note
The file pattern to the location at which the Optimizely files are located in Amazon S3 may have a complex directory structure that uses numerals, versions, years, months, days, and compression. For example:
'1234567890/0987654321/2.0/'yyyy'/'MM'/'dd'/*/*.gz'
Under Optimizely Settings set the load operations to a string that is obviously incorrect, such as df-xxxxxx. (You may also set the load operation to empty: “{}”.)
Tip
If you use an obviously incorrect string, the load operation settings will be saved in the courier configuration. After the schema for the feed is defined and the feed is activated, you can edit the courier and replace the feed ID with the correct identifier.
Caution
If load operations are set to neither “{}” nor an obviously incorrect string, the validation test for the courier configuration settings will fail.
Click Save.
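The entities list configured in the steps above can be generated instead of typed by hand, which helps keep file tags and patterns consistent. The following is a minimal sketch: the helper names entity and entities_list are hypothetical, the file patterns are the placeholder paths from the example, and the content type is assumed to be Parquet for every file.

```python
import json
from typing import Dict


def entity(tag: str, file_pattern: str) -> dict:
    """Describe one file to pull, using the keys shown in the example."""
    return {
        "object/type": "file",
        "object/file-pattern": file_pattern,
        "object/land-as": {
            "file/tag": tag,
            # Assumption: every Optimizely file is Parquet.
            "file/content-type": "application/x-parquet",
        },
    }


def entities_list(patterns: Dict[str, str]) -> str:
    """Render a tag-to-pattern mapping as JSON for the Entities List field."""
    return json.dumps([entity(t, p) for t, p in patterns.items()], indent=2)


# Placeholder patterns copied from the example; replace with real paths.
print(entities_list({
    "decisions": "'/path/to/decisions_YYYY-MM-DD.parquet'",
    "conversions": "'/path/to/conversions_YYYY-MM-DD.parquet'",
}))
```

The printed JSON can be pasted into the courier settings. The file tags chosen here are the values that the load operations refer to later.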
Get sample files¶
Every Optimizely file that is pulled to Amperity must be configured as a feed. Before you can configure each feed you need to know the schema of that file. Run the courier without load operations to bring sample files from Optimizely to Amperity, and then use each of those files to configure a feed.
To get sample files
From the Sources tab, open the menu for a courier configured for Optimizely with empty load operations, and then select Run. The Run Courier dialog box opens.
Select Load data from a specific day, and then select today’s date.
Click Run.
Important
The courier run will fail, but this process will successfully return a list of files from Optimizely.
These files will be available for selection as an existing source from the Add Feed dialog box.
Wait for the notification for this courier run to return an error similar to:
Error running load-operations task Cannot find required feeds: "df-xxxxxx"
Add feeds¶
A feed defines how data should be loaded into a domain table, including specifying which columns are required and which columns should be associated with a semantic tag that indicates that column contains customer profile (PII) and transactions data.
Note
Decision events and conversion events have different schemas, so each requires its own feed.
To add a feed
From the Sources tab, click Add Feed. This opens the Add Feed dialog box.
Under Data Source, select Create new source, and then enter “Optimizely”.
Enter the name of the feed in Feed Name. For example: “Decisions”.
Tip
The name of the domain table will be “<data-source-name>:<feed-name>”. For example: “Optimizely:Decisions”.
Under Sample File, select Select existing file, and then choose from the list of files. For example: “decisions_YYYY-MM-DD.parquet”.
Tip
The list of files that is available from this drop-down menu is sorted from newest to oldest.
Select Load sample file on feed activation.
Click Continue. This opens the Feed Editor page.
Select the primary key.
Apply semantic tags to customer records and interaction records, as appropriate.
Under Last updated field, specify which field best describes when records in the table were last updated.
Tip
Choose Generate an “updated” field to have Amperity generate this field. This is the recommended option unless there is a field already in the table that reliably provides this data.
For feeds with customer records (PII data), select Make available to Stitch.
Click Activate. Wait for the feed to finish loading data to the domain table, and then review the sample data for that domain table from the Data Explorer.
Add load operations¶
After the feeds are activated and domain tables are available, add the load operations to the courier used for Optimizely.
Example load operations
Load operations must specify each file that will be pulled to Amperity from Optimizely.
For example:
{
  "DECISIONS-FEED-ID": [
    {
      "type": "truncate"
    },
    {
      "type": "load",
      "file": "decisions"
    }
  ],
  "CONVERSIONS-FEED-ID": [
    {
      "type": "load",
      "file": "conversions"
    }
  ]
}
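Because each load operation refers to a file tag defined in the courier's entities list, a mismatch between the two is an easy mistake to make. The sketch below builds the load operations from a mapping of feed ID to file tag and reports any tag that has no matching entity. The function names load_operations and missing_tags are hypothetical, and the feed IDs are the placeholders from the example above.

```python
import json
from typing import Dict, Iterable, List, Set


def load_operations(feeds: Dict[str, str],
                    truncate: Iterable[str] = ()) -> Dict[str, List[dict]]:
    """Build load operations from a feed-ID-to-file-tag mapping.

    Feed IDs listed in truncate get a truncate step before the load step.
    """
    truncate_ids = set(truncate)
    ops: Dict[str, List[dict]] = {}
    for feed_id, file_tag in feeds.items():
        steps: List[dict] = []
        if feed_id in truncate_ids:
            steps.append({"type": "truncate"})
        steps.append({"type": "load", "file": file_tag})
        ops[feed_id] = steps
    return ops


def missing_tags(ops: Dict[str, List[dict]],
                 entity_tags: Iterable[str]) -> Set[str]:
    """Return file tags used by load steps but absent from the entities list."""
    referenced = {
        step["file"]
        for steps in ops.values()
        for step in steps
        if step["type"] == "load"
    }
    return referenced - set(entity_tags)


ops = load_operations(
    {"DECISIONS-FEED-ID": "decisions", "CONVERSIONS-FEED-ID": "conversions"},
    truncate=["DECISIONS-FEED-ID"],
)
print(json.dumps(ops, indent=2))
```

An empty result from missing_tags(ops, ["decisions", "conversions"]) indicates that every load step points at a file the courier is configured to pull.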
To add load operations
From the Sources tab, open the menu for the courier that was configured for Optimizely, and then select Edit. The Edit Courier dialog box opens.
Edit the load operations for each of the feeds that were configured for Optimizely so they have the correct feed ID.
Click Save.
Run courier manually¶
Run the courier again. This time, because the load operations are present and the feeds are configured, the courier will pull data from Optimizely.
To run the courier manually
From the Sources tab, open the menu for the courier with updated load operations that is configured for Optimizely, and then select Run. The Run Courier dialog box opens.
Select the load option, either for a specific time period or all available data. Actual data will be loaded to a domain table because the feed is configured.
Click Run.
This time the notification will return a message similar to:
Completed in 5 minutes 12 seconds
Add to courier group¶
A courier group is a list of one (or more) couriers that are run as a group, either ad hoc or as part of an automated schedule. A courier group can be configured to act as a constraint on downstream workflows.
To add the courier to a courier group
From the Sources tab, click Add Courier Group. This opens the Create Courier Group dialog box.
Enter the name of the courier group. For example: “Optimizely”.
Add a cron string to the Schedule field to define a schedule for the courier group.
A schedule defines the frequency at which a courier group runs. All couriers in the same courier group run as a unit and all tasks must complete before a downstream process can be started. The schedule is defined using cron.
Cron syntax specifies the fixed time, date, or interval at which cron will run. Each line represents a job, and is defined like this:
┌───────── minute (0 - 59)
│ ┌─────────── hour (0 - 23)
│ │ ┌───────────── day of the month (1 - 31)
│ │ │ ┌────────────── month (1 - 12)
│ │ │ │ ┌─────────────── day of the week (0 - 6) (Sunday to Saturday)
│ │ │ │ │
│ │ │ │ │
│ │ │ │ │
* * * * * command to execute
For example,
30 8 * * *
represents “run at 8:30 AM every day” and
30 8 * * 0
represents “run at 8:30 AM every Sunday”. Amperity validates your cron syntax and shows you the results. You may also use crontab guru to validate cron syntax.
Set Status to Enabled.
Specify a time zone.
A courier group schedule is associated with a time zone. The time zone determines the point at which a courier group’s scheduled start time begins. The time zone should be aligned with the time zone of the system from which the data is being pulled.
Use the Use this time zone for file date ranges checkbox to use the selected time zone to look for files. If unchecked, the courier group will use the current time in UTC to look for files to pick up.
Note
The time zone that is chosen for a courier group schedule should take into account every downstream business process that requires the data, as well as the time zone(s) in which the consumers of that data operate.
Add at least one courier to the courier group. Select the name of the courier from the Courier drop-down. Click + Add Courier to add more couriers.
Click Add a courier group constraint, and then select a courier group from the drop-down list.
A wait time is a constraint placed on a courier group that defines an extended time window for data to be made available at the source location.
Important
A wait time is not required for a bridge.
A courier group typically runs on an automated schedule that expects customer data to be available at the source location within a defined time window. However, in some cases, the customer data may be delayed and isn’t made available within that time window.
For each courier group constraint, apply any offsets.
A courier can be configured to look for files within a range of time that is older than the scheduled time. The scheduled time is in Coordinated Universal Time (UTC), unless the “Use this time zone for file date ranges” checkbox is enabled for the courier group.
This range is typically 24 hours, but may be configured for longer ranges. For example, it’s possible for a data file to be generated with a correct file name and datestamp appended to it, but for that datestamp to represent the previous day because of how an upstream workflow is configured. A wait time helps ensure that the data at the source location is recognized correctly by the courier.
Warning
This range of time may affect couriers in a courier group whether or not they run on a schedule. A manually run courier group may not take its schedule into consideration when determining the date range; only the provided input day(s) to load data from are used as inputs.
Click Save.
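The two cron strings used as examples in the schedule step can be checked with a small matcher. This is a deliberately minimal sketch, not how Amperity evaluates schedules: the function name cron_matches is hypothetical, and it supports only a literal number or * in each of the five fields (no ranges, lists, or step values).

```python
from datetime import datetime


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True when a five-field cron expression matches a point in time.

    Only "*" and plain integers are supported in each field.
    """
    fields = expr.split()
    if len(fields) != 5:
        raise ValueError("expected five fields: minute hour day month weekday")
    actual = (
        when.minute,
        when.hour,
        when.day,
        when.month,
        when.isoweekday() % 7,  # cron counts Sunday as 0, Saturday as 6
    )
    return all(f == "*" or int(f) == a for f, a in zip(fields, actual))


# 2024-01-07 is a Sunday; 2024-01-01 is a Monday.
print(cron_matches("30 8 * * *", datetime(2024, 1, 1, 8, 30)))
print(cron_matches("30 8 * * 0", datetime(2024, 1, 7, 8, 30)))
print(cron_matches("30 8 * * 0", datetime(2024, 1, 1, 8, 30)))
```

The first two calls match and the third does not, because the last field restricts the schedule to Sundays. The datetime passed in is interpreted as already being in the time zone selected for the courier group.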