Pull from Adobe Analytics

Adobe Analytics provides useful intelligence about customer activity on websites and mobile devices. Marketers can analyze clickstream data to understand what their customers are doing in real time, and then optimize customer experiences across brands.

This topic describes the steps that are required to pull raw clickstream data to Amperity from Adobe Analytics:

  1. Get details

  2. Review clickstream files

  3. Add courier

  4. Get sample files

  5. Add feeds

  6. Add load operations

  7. Run courier

  8. Add to courier group

Get details

Adobe Analytics may be configured to send data to a staging location (SFTP, Amazon S3, or Azure Blob Storage), from which Amperity is configured to pull data. This requires the following configuration details:

  1. The RSA public key to use for PGP encryption.

    This key must be downloaded from the Adobe Analytics console, and then sent to Amperity using SnapPass. An Amperity representative will add the key to the SFTP location that is built into Amperity (<tenant>.sftp.amperity.com).

    Tip

    Amperity provides a built-in SFTP connector for Adobe Analytics with some pre-configured settings.

    You may configure Adobe Analytics to send data to Amazon S3 or Azure Blob Storage, and then use that data source to configure your connection to Adobe Analytics. The connection steps differ (Amazon S3 or Azure Blob Storage instead of SFTP) and are outlined in those topics; all other steps are the same as outlined in this topic.

  2. From the Adobe Analytics admin console, configure an Adobe Analytics Data Feed. Specify the connection type as SFTP, port 22, the host name (<tenant>.sftp.amperity.com), and then the folder path to which that data is sent. For example: /tenant/. (These connection settings are summarized after this list.)

  3. From the Adobe Analytics console, configure the contents of the data feed to contain a limited set of fields. (Clickstream data can contain hundreds or even thousands of fields, many of which are not useful for workflows within Amperity.)

    Trim the list of fields that are sent to Amperity.

    Ensure that the data contains an authentication key, such as a login ID or cookie, that links to your internal ID system so that data associated with customers is usable.

    Select only relevant events as they relate to the authentication key, such as logins, purchases, and so on.

    Include additional data attributes for product IDs, SKUs, categories, content types, and so on.

    Provide to Amperity a dictionary of the configurable evar fields that are in use.

  4. Provide a sample for each file to simplify feed creation.

    Note

    Files sent from Adobe Analytics use Gzip as the compression format; each compressed archive contains multiple files.
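
For reference, the data feed connection settings described in step 2 can be summarized as:

Connection type:  SFTP
Host:             <tenant>.sftp.amperity.com
Port:             22
Folder path:      /tenant/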

Tip

Use SnapPass to securely share configuration details for Adobe Analytics between your company and your Amperity representative.

Clickstream files

Adobe Analytics can send data to the SFTP location that is built into Amperity. Adobe Analytics must be able to connect to this location, and then add files to the specified path.

When configured to run on a schedule, the output from Adobe Analytics is a Gzip-compressed archive that contains multiple files. One of these files (hit_data.tsv) is the primary table and should be pulled on a daily basis. All of the other files are static lookup tables for codes that appear in the primary table.
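
For example, a daily archive might unpack to a set of files similar to the following. (The exact set of lookup tables varies by report suite; this listing is illustrative.)

hit_data.tsv
column_headers.tsv
browser.tsv
browser_type.tsv
country.tsv
operating_systems.tsv
referrer_type.tsv
search_engines.tsv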

Add courier

A courier brings data from an external system to Amperity. A courier relies on a feed to know which fileset to bring to Amperity for processing.

Tip

You can run a courier without load operations. Use this approach to get files to upload during feed creation, as a feed requires knowing the schema of a file before you can apply semantic tagging and other feed configuration settings.

Example entities list

An entities list defines the list of files to be pulled to Amperity, along with any file-specific details (such as file name, file type, whether header rows are required, and so on).

For example:

[
  {
    "archive/contents": {
      "hit_data.tsv": {
        "subobject/land-as": {
          "file/tag": "adobe_clickstream",
          "file/content-type": "text/tsv"
        }
      }
    },
    "object/type": "archive",
    "object/file-pattern": "'tenant/filename_'yyyy-MM-dd'.zip'"
  }
]

Note

You may configure files as required ("object/optional": false) or optional ("object/optional": true). A courier will fail if a required file is not available or, when all files in the fileset are optional, if none of those files are available.
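
For example, a minimal sketch that marks a lookup file as optional; the file name, tag, and pattern shown here are illustrative:

[
  {
    "archive/contents": {
      "browser.tsv": {
        "subobject/land-as": {
          "file/tag": "adobe_browser_lookup",
          "file/content-type": "text/tsv"
        }
      }
    },
    "object/type": "archive",
    "object/optional": true,
    "object/file-pattern": "'tenant/filename_'yyyy-MM-dd'.zip'"
  }
]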

To add a courier

  1. From the Sources tab, click Add Courier. The Add Source page opens.

  2. Find, and then click the icon for SFTP. The Add Courier page opens.

    This automatically selects sftp as the Credential Type and assigns <tenant>.sftp.amperity.com as the location from which data is pulled.

  3. Enter the name of the courier. For example: “Adobe Analytics”.

  4. From the Credential drop-down, select Create a new credential. This opens the Create New Credential dialog box. Enter a name for the credential (typically “Adobe Analytics”), and then enter the username and password required to access this location.

  5. Under Settings, configure the list of files to pull to Amperity. Configure the Entities List for each file to be loaded to Amperity.

    Note

    If the file is contained within a ZIP archive, you may need to specify the fully qualified filename within the ZIP archive. For example, to import a file named “items.csv” you may need to specify “export/items.csv”.

  6. Under Settings, set the load operations to a string that is obviously incorrect, such as df-xxxxxx. (You may also set the load operations to empty: {}.) A sketch of this placeholder appears after these steps.

    Tip

    If you use an obviously incorrect string, the load operation settings will be saved in the courier configuration. After the feed is configured and activated you can edit the courier, and then update the feed ID with the correct identifier.

    Caution

    If load operations are not set to {}, the validation test for the courier configuration settings will fail.

  7. Click Save.
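
As noted in step 6, you can save an intentionally incorrect feed ID as a placeholder, and then replace it after the feed is activated. A minimal sketch of that placeholder, using the file tag from the entities list above:

{
  "df-xxxxxx": [
    {
      "type": "load",
      "file": "adobe_clickstream"
    }
  ]
}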

Get sample files

Every Adobe Analytics file that is pulled to Amperity must be configured as a feed. Before you can configure each feed you need to know the schema of that file. Run the courier without load operations to bring sample files from Adobe Analytics to Amperity, and then use each of those files to configure a feed.

The contents of the Gzip file from Adobe Analytics include a primary table named hit_data.tsv, a metadata headers table, and a series of lookup tables that are focused on specific types of clickstream data, such as browsers, browser types, countries, operating systems, referrer types, and search engines.

Important

The hit_data.tsv file contains the daily changes of clickstream data. All other files are static lookup tables that should not require updates.

To get sample files

  1. From the Sources tab, open the menu for a courier configured for Adobe Analytics with empty load operations, and then select Run. The Run Courier dialog box opens.

  2. Select Load data from a specific day, and then select today’s date.

  3. Click Run.

    Important

    The courier run will fail, but this process will successfully return a list of files from Adobe Analytics.

    These files will be available for selection as an existing source from the Add Feed dialog box.

  4. Wait for the notification for this courier run to return an error similar to:

    Error running load-operations task
    Cannot find required feeds: "df-xxxxxx"
    

Add feeds

A feed defines how data should be loaded into a domain table, including specifying which columns are required and which columns should be associated with a semantic tag that indicates that the column contains customer profile (PII) or transaction data.

Configure a feed for each file located in the Gzip file sent from Adobe Analytics, with the exception of the metadata headers file. Use the filename as the name of the feed. Because hit_data.tsv contains the daily updates for clickstream data, consider naming its feed Clickstream.

Important

Clickstream data from Adobe Analytics contains standard fields, and then up to 250 conversion variables (evar1-evar250).

Conversion variables are customer-specific and represent events that identify:

  • The customer, such as IDs or PII

  • Customer interactions

  • Purchases, transactions, and prices

  • Marketing campaign IDs that tie the customer to marketing efforts

  • Behaviors that may be useful to better understand the customer

Use the Rename To column to specify column names for each conversion variable. For example:

Incoming field    Rename to
_c0               campaign
_c1               channel
_c20              evar17

To add a feed

  1. From the Sources tab, click Add Feed. This opens the Add Feed dialog box.

  2. Under Data Source, select Create new source, and then enter “Adobe Analytics”.

  3. Enter the name of the feed in Feed Name. For example: “Clickstream”.

    Tip

    The name of the domain table will be “<data-source-name>:<feed-name>”. For example: “Adobe Analytics:Clickstream”.

  4. Under Sample File, select Select existing file, and then choose from the list of files. For example: “filename_YYYY-MM-DD.csv”.

    Tip

    The list of files that is available from this drop-down menu is sorted from newest to oldest.

  5. Select Load sample file on feed activation.

  6. Click Continue. This opens the Feed Editor page.

  7. Select the primary key.

  8. Apply semantic tags to customer records and interaction records, as appropriate.

  9. Under Last updated field, specify which field best describes when records in the table were last updated.

    Tip

    Choose Generate an “updated” field to have Amperity generate this field. This is the recommended option unless there is a field already in the table that reliably provides this data.

  10. For feeds with customer records (PII data), select Make available to Stitch.

  11. Click Activate. Wait for the feed to finish loading data to the domain table, and then review the sample data for that domain table from the Data Explorer.

Add load operations

After the feeds are activated and domain tables are available, add the load operations to the courier used for Adobe Analytics.

Example load operations

Load operations must specify each file that will be pulled to Amperity from Adobe Analytics.

For example:

{
  "CLICKSTREAM-FEED-ID": [
    {
      "type": "load",
      "file": "adobe_clickstream"
    }
  ]
}
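
Because each file pulled from Adobe Analytics must appear in the load operations, a configuration that also loads one of the static lookup tables might look like the following sketch; the lookup feed ID and file tag are illustrative:

{
  "CLICKSTREAM-FEED-ID": [
    {
      "type": "load",
      "file": "adobe_clickstream"
    }
  ],
  "BROWSER-LOOKUP-FEED-ID": [
    {
      "type": "load",
      "file": "adobe_browser_lookup"
    }
  ]
}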

To add load operations

  1. From the Sources tab, open the menu for the courier that was configured for Adobe Analytics, and then select Edit. The Edit Courier dialog box opens.

  2. Edit the load operations for each of the feeds that were configured for Adobe Analytics so they have the correct feed ID.

  3. Click Save.

Run courier manually

Run the courier again. This time, because the load operations are present and the feeds are configured, the courier will pull data from Adobe Analytics.

To run the courier manually

  1. From the Sources tab, open the menu for the courier with updated load operations that is configured for Adobe Analytics, and then select Run. The Run Courier dialog box opens.

  2. Select the load option, either for a specific time period or for all available data. Actual data will be loaded to a domain table because the feed is configured.

  3. Click Run.

    This time the notification will return a message similar to:

    Completed in 5 minutes 12 seconds
    

Add to courier group

A courier group is a list of one (or more) couriers that are run as a group, either ad hoc or as part of an automated schedule. A courier group can be configured to act as a constraint on downstream workflows.

Important

Only the feed associated with the hit_data.tsv file should be configured to run on a daily basis. All other feeds related to Adobe Analytics clickstream data should be configured to run only one time because they contain only static lookup data.

To add the courier to a courier group

  1. From the Sources tab, click Add Courier Group. This opens the Create Courier Group dialog box.

  2. Enter the name of the courier group. For example: “Adobe Analytics”.

  3. Add a cron string to the Schedule field to define a schedule for the courier group.

    A schedule defines the frequency at which a courier group runs. All couriers in the same courier group run as a unit and all tasks must complete before a downstream process can be started. The schedule is defined using cron.

    Cron syntax specifies the fixed time, date, or interval at which cron will run. Each line represents a job, and is defined like this:

    ┌───────────── minute (0 - 59)
    │ ┌───────────── hour (0 - 23)
    │ │ ┌───────────── day of the month (1 - 31)
    │ │ │ ┌───────────── month (1 - 12)
    │ │ │ │ ┌───────────── day of the week (0 - 6) (Sunday to Saturday)
    │ │ │ │ │
    * * * * * command to execute
    

    For example, 30 8 * * * represents “run at 8:30 AM every day” and 30 8 * * 0 represents “run at 8:30 AM every Sunday”. Amperity validates your cron syntax and shows you the results. You may also use crontab guru to validate cron syntax.

  4. Set Status to Enabled.

  5. Specify a time zone.

    A courier group schedule is associated with a time zone. The time zone determines the point at which a courier group’s scheduled start time begins. The time zone should be aligned with the time zone of the system from which the data is being pulled.

    Note

    The time zone that is chosen for a courier group schedule should consider every downstream business process that requires the data, along with the time zone(s) in which the consumers of that data operate.

  6. Set SLA? to False. (You can change this later after you have verified the end-to-end workflows.)

  7. Add at least one courier to the courier group. Select the name of the courier from the Courier drop-down. Click + Add Courier to add more couriers.

  8. Click Add a courier group constraint, and then select a courier group from the drop-down list.

    A wait time is a constraint placed on a courier group that defines an extended time window for data to be made available at the source location.

    A courier group typically runs on an automated schedule that expects customer data to be available at the source location within a defined time window. In some cases, however, customer data may be delayed and not made available within that time window.

  9. For each courier group constraint, apply any offsets.

    An offset is a constraint placed on a courier group that defines a range of time that is older than the scheduled time, within which a courier group will accept customer data as valid for the current job. Offset times are in UTC.

    A courier group offset is typically set to 24 hours. For example, it’s possible for customer data to be generated with a correct file name and datestamp appended to it, but for that datestamp to represent the previous day because of the customer’s own workflow. An offset ensures that the data at the source location is recognized by the courier as the correct data source.

    Warning

    An offset affects couriers in a courier group whether or not they run on a schedule.

  10. Click Save.