Automate workflows

A workflow represents a series of steps that occur within Amperity that are related to a specific process. For example, a workflow is often an end-to-end process that:

  1. Uses a courier to pull data to Amperity.

  2. Standardizes that data using semantic tags and feeds.

  3. Adds that data to a domain table.

  4. Builds that data to a database.

  5. Runs a query to return results for a downstream workflow.

  6. Sends those results to the configured destination.

A workflow can have both upstream and downstream dependencies and can start from many locations within Amperity.

Define courier groups

A courier group is a list of one (or more) couriers that are run as a group, either ad hoc or as part of an automated schedule. A courier group can be configured to act as a constraint on downstream workflows.

A courier group is typically configured to run automatically on a recurring schedule. Because all couriers within a courier group run as a unit, all of their dependent tasks must complete before any downstream processes, such as Stitch or database generation, can be started.

What a courier group does:

  1. Logically organizes a list of couriers into a group that shares the same schedule and workflow.

  2. Allows for each courier to be assigned schedule variance via wait times and offsets.

  3. Enables both automatic and ad hoc runs of couriers.

  4. Polls each data source associated with a courier in the group to determine if data is ready to be pulled to Amperity.

  5. Ensures that constraints for downstream processes are present in the workflow; all couriers in a courier group must complete their jobs.

  6. Enables a workflow to be assigned SLA status.

What a courier group needs:

  1. One (or more) couriers.

  2. A schedule.

  3. Configuration for wait times and offsets to help ensure that all files assigned to the courier group have a time window that is large enough to complete data collection.

The Sources tab shows all courier groups, including when each courier group last ran or was updated, and its current status.

Add courier group

Use the Add Courier Group button to add a courier group to Amperity. A courier group should be created to consolidate individual couriers into a scheduled workflow that can be run under the Amperity SLA.

For each courier added to a courier group, define a wait time and an offset. These values help determine how much time the courier group should wait for the files associated with a courier to be ready for processing.

In some cases, if the files are not ready, the courier (and courier group) will fail. But in other cases, if the files in the courier are not flagged as required, the courier group may continue processing the rest of the files.

To add a courier group

  1. From the Sources tab, click Add Courier Group. This opens the Create Courier Group dialog box.

  2. Add the name for the courier group.

  3. Add a cron string to the Schedule field to define a schedule for the courier group.

    A schedule defines the frequency at which a courier group runs. All couriers in the same courier group run as a unit and all tasks must complete before a downstream process can be started. The schedule is defined using cron.

    Cron syntax specifies the fixed time, date, or interval at which cron will run. Each line represents a job, and is defined like this:

    ┌───────── minute (0 - 59)
    │ ┌─────────── hour (0 - 23)
    │ │ ┌───────────── day of the month (1 - 31)
    │ │ │ ┌────────────── month (1 - 12)
    │ │ │ │ ┌─────────────── day of the week (0 - 6) (Sunday to Saturday)
    │ │ │ │ │
    │ │ │ │ │
    │ │ │ │ │
    * * * * * command to execute
    

    For example, 30 8 * * * represents “run at 8:30 AM every day” and 30 8 * * 0 represents “run at 8:30 AM every Sunday”. Amperity validates your cron syntax and shows you the results. You may also use crontab guru to validate cron syntax.

    Tip

    Daylight savings time can affect a schedule. Be sure to set the schedule to be stable and not require changes over time. For example: if a schedule is set to 12:30 AM, a daylight savings time transition may shift it to 11:30 PM (fall back) or 1:30 AM (spring forward).

  4. Set Status to ENABLED.

  5. Specify a time zone.

    A courier group schedule is associated with a time zone. The time zone determines the point at which a courier group’s scheduled start time begins. A time zone should be aligned with the time zone of the system from which the data is being pulled.

    Note

    The time zone that is chosen for a courier group schedule should consider every downstream business process that requires the data and also the time zone(s) in which the consumers of that data will operate.

  6. Click Add a courier group constraint, and then select a courier group from the drop-down list. Do this for each courier to be added to the courier group.

  7. Specify the wait time.

    A wait time is a constraint placed on a courier group that defines an extended time window for data to be made available at the source location. A courier group typically runs on an automated schedule that expects customer data to be available at the source location within a defined time window. However, in some cases, the customer data may be delayed and isn’t made available within that time window.

    Use a wait time to extend the time window for data to be made available. This can help reduce the number of SLA alerts that may be generated for data sources that cannot be picked up by a courier group.

  8. Specify the offset.

    An offset is a constraint placed on a courier group that defines a range of time that is older than the scheduled time, within which a courier group will accept customer data as valid for the current job.

    A courier group offset is typically set to be 24 hours. For example, it’s possible for customer data to be generated with a correct file name and datestamp appended to it, but for that datestamp to represent the previous day because of the customer’s own workflow. An offset ensures that the data at the source location is recognized by the courier as the correct data source.

    Warning

    An offset affects couriers in a courier group whether or not they run on a schedule.

  9. Click Save.

Apply constraints

Constraints are applied to each courier within a courier group. Each constraint is a combination of an offset (the amount of time for which a courier group will accept data) and a wait time (an amount of time that extends the defined time window).
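
The way an offset and a wait time combine can be pictured with a short Python sketch. This is only an illustration and not how Amperity is implemented: the `Constraint` class, its method names, and the example times are hypothetical, and the sketch assumes that the offset extends the accepted range backward from the scheduled time while the wait time extends polling forward from it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Constraint:
    """Hypothetical model of one courier's constraint in a courier group."""

    offset: timedelta     # how far before the scheduled time data is still valid
    wait_time: timedelta  # how long after the scheduled time polling continues

    def accepts(self, data_time: datetime, scheduled: datetime) -> bool:
        """Return True if data stamped data_time falls inside the offset range."""
        return scheduled - self.offset <= data_time <= scheduled

    def polling_deadline(self, scheduled: datetime) -> datetime:
        """Return the latest time at which polling for the data would continue."""
        return scheduled + self.wait_time


# Illustrative values only: a 13:00 UTC run, a 24 hour offset, a 2 hour wait time.
scheduled = datetime(2024, 5, 2, 13, 0, tzinfo=timezone.utc)
constraint = Constraint(offset=timedelta(hours=24), wait_time=timedelta(hours=2))

yesterday_file = datetime(2024, 5, 1, 18, 0, tzinfo=timezone.utc)
stale_file = datetime(2024, 4, 29, 18, 0, tzinfo=timezone.utc)

print(constraint.accepts(yesterday_file, scheduled))  # inside the 24 hour offset
print(constraint.accepts(stale_file, scheduled))      # older than the offset allows
print(constraint.polling_deadline(scheduled).isoformat())
```

In this sketch a file stamped the previous evening is accepted, a file that is three days old is not, and polling would stop two hours after the scheduled time.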

Specify wait times

A wait time is a constraint placed on a courier group that defines an extended time window for data to be made available at the source location. A courier group typically runs on an automated schedule that expects customer data to be available at the source location within a defined time window. However, in some cases, the customer data may be delayed and isn’t made available within that time window.

Use a wait time to extend the time window for data to be made available. This can help reduce the number of SLA alerts that may be generated for data sources that cannot be picked up by a courier group.

Note

For couriers associated with a filedrop location the default wait time is 0. A polling operation only checks for a data source before declaring success or failure. For couriers associated with REST APIs and data warehouses, the polling operation is always considered to be successful.

Define offsets

An offset is a constraint placed on a courier group that defines a range of time that is older than the scheduled time, within which a courier group will accept customer data as valid for the current job.

A courier group offset is typically set to be 24 hours. For example, it’s possible for customer data to be generated with a correct file name and datestamp appended to it, but for that datestamp to represent the previous day because of the customer’s own workflow. An offset ensures that the data at the source location is recognized by the courier as the correct data source.

Warning

An offset affects couriers in a courier group whether or not they run on a schedule.

Each data source has different assumptions about which data is ready to load and when, and each customer may have different expectations for the age of the data they see in the system each day. An offset subtracts some amount of time from the global range before executing an individual courier, allowing for flexibility in a schedule.

Imagine a courier group with a heterogeneous set of couriers and the following constraints:

  1. The courier group is run every day at 13:00 (UTC).

  2. Courier #1 is an Amazon S3 courier where the incoming files each day are labeled with yesterday’s date.

  3. Courier #2 is an Amazon S3 courier where the incoming files each day are labeled with today’s date.

  4. Courier #3 is a Campaign Monitor courier where the incoming files should be loaded at 00:00-00:00 from the day before.

  5. Courier #4 is a Salesforce courier.

If the courier group has a schedule of 0 13 * * *, data will always be loaded in ranges of 13:00-13:00. This requires offsets for couriers #1 and #3, to re-position the range and to ensure the correct data is loaded. The first courier is offset by 24 hours, the second courier is offset by 30 minutes, and the third courier is offset by 13 hours to provide a large enough window for each courier in the courier group to collect data.
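
The arithmetic in this example can be checked with a small Python sketch. It assumes, as a simplification, that an offset moves the whole 24 hour range back by the offset amount. The function name `shifted_range`, the courier labels, and the run date are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta, timezone


def shifted_range(run_time, offset, period=timedelta(days=1)):
    """Return (start, end) of the load range after subtracting the offset.

    Assumption for illustration: without an offset the range ends at the
    scheduled run time, and the offset moves both ends back by the same amount.
    """
    end = run_time - offset
    return end - period, end


# A run of the "0 13 * * *" schedule on an arbitrary example date (UTC).
run_time = datetime(2024, 5, 2, 13, 0, tzinfo=timezone.utc)

# Offsets taken from the example above.
offsets = {
    "courier_1": timedelta(hours=24),
    "courier_2": timedelta(minutes=30),
    "courier_3": timedelta(hours=13),
}

ranges = {name: shifted_range(run_time, offset) for name, offset in offsets.items()}

for name, (start, end) in ranges.items():
    print(f"{name}: {start:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M}")
```

Under these assumptions the 13 hour offset for the third courier moves the 13:00-13:00 range to 00:00-00:00, which matches the midnight-to-midnight range that courier expects.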

Delete courier group

Use the Delete option to remove a courier group from Amperity. This should be done carefully. Verify that both upstream and downstream processes no longer depend on this courier group prior to deleting it.

Important

This action will not delete couriers that are associated with the courier group.

To delete a courier group

  1. From the Sources tab, open the menu for a courier group, and then select Delete.

  2. Click Delete to confirm.

Run courier group

A courier group may be run in the following ways:

Automatically

A courier group with a schedule (including time zones, wait times, and offsets) will run automatically.

To run a courier group automatically

  1. From the Sources tab, click Add Courier Group. This opens the Create Courier Group dialog box.

  2. Add the name for the courier group.

  3. Add a cron string to the Schedule field to define a schedule for the courier group.

    A schedule defines the frequency at which a courier group runs. All couriers in the same courier group run as a unit and all tasks must complete before a downstream process can be started. The schedule is defined using cron.

    Cron syntax specifies the fixed time, date, or interval at which cron will run. Each line represents a job, and is defined like this:

    ┌───────── minute (0 - 59)
    │ ┌─────────── hour (0 - 23)
    │ │ ┌───────────── day of the month (1 - 31)
    │ │ │ ┌────────────── month (1 - 12)
    │ │ │ │ ┌─────────────── day of the week (0 - 6) (Sunday to Saturday)
    │ │ │ │ │
    │ │ │ │ │
    │ │ │ │ │
    * * * * * command to execute
    

    For example, 30 8 * * * represents “run at 8:30 AM every day” and 30 8 * * 0 represents “run at 8:30 AM every Sunday”. Amperity validates your cron syntax and shows you the results. You may also use crontab guru to validate cron syntax.

  4. Set Status to ENABLED.

  5. Specify a time zone.

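    A courier group schedule is associated with a time zone. The time zone determines the point at which a courier group’s scheduled start time begins. A time zone should be aligned with the time zone of the system from which the data is being pulled.

    Note

    The time zone that is chosen for a courier group schedule should consider every downstream business process that requires the data and also the time zone(s) in which the consumers of that data will operate.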

  6. Click Add a courier group constraint, and then select a courier group from the drop-down list. Do this for each courier to be added to the courier group.

  7. Specify the wait time and offset for each courier in the courier group.

  8. Click Save.

For a date range

A courier group can be configured to load all data for a specific date range.

To run a courier group for a date range

  1. From the Sources tab, open the menu for a courier group, and then select Run. The Run Courier Group page opens.

  2. Select Load data from a specific time period.

  3. Select a start date and an end date.

  4. To prevent downstream processing, select Load Only.

  5. To run as an SLA courier group, select SLA Run?.

  6. Click Run.

For a specific day

A courier group can be configured to load all data for a single day.

To run a courier group for a specific day

  1. From the Sources tab, open the menu for a courier group, and then select Run. The Run Courier Group page opens.

  2. Select Load data from a specific day, and then select a day.

  3. To prevent downstream processing, select Load Only.

  4. To run as an SLA courier group, select SLA Run?.

  5. Click Run.

For all data

A courier group can be configured to load all data that is available. This can be a large amount of data if the courier group is running for the first time.

To run a courier group for all data

  1. From the Sources tab, open the menu for a courier group, and then select Run. The Run Courier Group page opens.

  2. Select Load all data.

  3. To prevent downstream processing, select Load Only.

  4. To run as an SLA courier group, select SLA Run?.

  5. Click Run.

Manually

Use the Run option to run the courier group manually.

To run a courier group manually

  1. From the Sources tab, open the menu for a courier group, and then select Run. The Run Courier Group page opens.

  2. Select the time period for which data is loaded and indicate if downstream processes should be started automatically.

  3. To run as SLA, select SLA Run?.

  4. Click Run.

Missing files?

A courier group can be configured to send an email to the customer from Zendesk when files are missing, and then:

  1. Continue processing even if files are missing

  2. Stop processing

Important

Files can be missing for any number of reasons, including delays that may have occurred in upstream workflows that exist outside of Amperity. In many situations a file is late, not missing.

When files are missing or late, in addition to sending email from Zendesk and either continuing or stopping the workflow, Amperity will continue to attempt to find these files.

A notification that begins with sense-missing- will indicate when Amperity is looking for missing files. Amperity will continue polling the data source location to discover if the file has been updated or added. It will be pulled down, processed, and loaded if it is discovered within the time window that allows it to be part of today’s Stitch run.

Notify, continue workflow

A courier group can be configured to send an email to the customer from Zendesk when one (or more) files are missing, and then continue processing if files are missing.

Tip

Some files are not considered essential to the daily Amperity run. The reasons why a particular file may be considered non-essential will vary from tenant to tenant, but they may include situations like:

  • A data source is mostly static.

  • A data source does not contain PII that will affect the quality of the Amperity ID.

  • A data source is associated with a workflow that often misses the configured Amperity wait time period.

To send email and continue workflow

  1. From the Sources tab, open the menu for a courier group, and then select Edit.

  2. Under the name of a courier group, set Notify when missing? to enabled, and then set Abort when missing? to disabled.

  3. Click Save.

Notify, stop workflow

A courier group can be configured to wait up to the configured Wait time period, send an email to the customer from Zendesk when one (or more) files are missing, and then stop processing if files are missing.

To send email, and then stop a workflow

  1. From the Sources tab, open the menu for a courier group, and then select Edit.

  2. Under the name of a courier group, set Notify when missing? to enabled, and then set Abort when missing? to enabled.

  3. Click Save.
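
The two missing-file configurations above can be summarized as a small decision function. The following Python sketch is illustrative only: the function name `missing_file_actions` and the action labels are hypothetical, and the two flags simply mirror the Notify when missing? and Abort when missing? settings.

```python
def missing_file_actions(notify_when_missing: bool,
                         abort_when_missing: bool,
                         files_missing: bool) -> list[str]:
    """Return the actions a courier group would take, in order.

    Hypothetical model: the action labels are made up for this sketch.
    """
    if not files_missing:
        # Nothing is missing, so the workflow simply continues.
        return ["continue"]

    actions = []
    if notify_when_missing:
        actions.append("send_email")
    # Abort when missing? decides whether the workflow stops or continues.
    actions.append("stop" if abort_when_missing else "continue")
    return actions


# Notify, continue workflow
print(missing_file_actions(True, False, files_missing=True))

# Notify, stop workflow
print(missing_file_actions(True, True, files_missing=True))
```

The sketch models only the notify and abort decision. It does not model the continued polling for late files described earlier.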

Define orchestration groups

An orchestration group is one (or more) orchestrations that are scheduled using a crontab file to define the schedule’s frequency. For example, an orchestration group can be scheduled to run at 8:30 AM every day of the week: 30 8 * * *.

Add orchestration group

For more complex workflows, click Add Orchestration Group to combine multiple orchestrations into a single scheduled workflow. An orchestration group may apply constraints on courier groups to ensure that all upstream processes are able to run successfully within the window of time required by the orchestration.

To add an orchestration group

  1. From the Destinations tab, click Add Orchestration Group. This opens the Add Orchestration Group dialog box.

  2. Enter a name for the orchestration group.

  3. Add a cron string to the Schedule field to define a schedule for the orchestration group.

    A schedule defines the frequency at which an orchestration group runs. All orchestrations in the same orchestration group run as a unit and all tasks must complete before a downstream process can be started. The schedule is defined using cron.

    Cron syntax specifies the fixed time, date, or interval at which cron will run. Each line represents a job, and is defined like this:

    ┌───────── minute (0 - 59)
    │ ┌─────────── hour (0 - 23)
    │ │ ┌───────────── day of the month (1 - 31)
    │ │ │ ┌────────────── month (1 - 12)
    │ │ │ │ ┌─────────────── day of the week (0 - 6) (Sunday to Saturday)
    │ │ │ │ │
    │ │ │ │ │
    │ │ │ │ │
    * * * * * command to execute
    

    For example, 30 8 * * * represents “run at 8:30 AM every day” and 30 8 * * 0 represents “run at 8:30 AM every Sunday”. Amperity validates your cron syntax and shows you the results. You may also use crontab guru to validate cron syntax.

  4. Specify a time zone.

    An orchestration group schedule is associated with a time zone. The time zone determines the point at which an orchestration group’s scheduled start time begins, and then how any courier group constraints and offsets are applied.

    Note

    The time zone that is chosen for an orchestration group schedule should consider every downstream business process that requires the data and also the time zone(s) in which the consumers of that data will operate.

  5. Click Add a courier group constraint, and then select a courier group from the drop-down list.

    A courier group constraint is a dependency that an orchestration group has on a courier group. This courier group must run successfully, including loading all data and meeting any SLA requirements. The orchestration group will run only when all courier groups to which the orchestration group has a constraint have run successfully.

  6. For each courier group constraint, apply any offsets.

    An offset is a constraint placed on an orchestration group to ensure that all upstream dependencies are processed in a way that meets all SLA requirements. Specifically, it ensures that each courier group to which the orchestration group has a constraint was able to run successfully and met all SLA requirements. The offset that is applied to each courier group constraint should provide a long enough duration to ensure there is overlap with the scheduled time range for that courier group.

  7. Click Save.

Apply courier offsets

An offset is a constraint placed on an orchestration group to ensure that all upstream dependencies are processed in a way that meets all SLA requirements. Specifically, it ensures that each courier group to which the orchestration group has a constraint was able to run successfully and met all SLA requirements. The offset that is applied to each courier group constraint should provide a long enough duration to ensure there is overlap with the scheduled time range for that courier group.
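
One way to reason about whether an offset is long enough is to check that each courier group's most recent successful run falls inside the window that the offset opens before the orchestration group's scheduled time. The Python sketch below is a simplified, hypothetical model: the function name `constraints_met`, the courier group names, and the times are assumptions made for illustration and do not describe Amperity internals.

```python
from datetime import datetime, timedelta, timezone


def constraints_met(scheduled, offsets, last_success):
    """Return True only if every courier group finished inside its offset window.

    scheduled    -- scheduled start time of the orchestration group
    offsets      -- courier group name -> offset applied to that constraint
    last_success -- courier group name -> time of its last successful run
    """
    for group, offset in offsets.items():
        finished = last_success.get(group)
        if finished is None:
            return False  # this courier group has not run successfully
        if not (scheduled - offset <= finished <= scheduled):
            return False  # the run does not overlap the offset window
    return True


scheduled = datetime(2024, 5, 2, 8, 30, tzinfo=timezone.utc)
offsets = {"daily_files": timedelta(hours=6), "crm_exports": timedelta(hours=12)}
last_success = {
    "daily_files": datetime(2024, 5, 2, 5, 0, tzinfo=timezone.utc),
    "crm_exports": datetime(2024, 5, 1, 22, 0, tzinfo=timezone.utc),
}

print(constraints_met(scheduled, offsets, last_success))
```

If the `crm_exports` run had finished before 20:30 on the previous day, the 12 hour offset would no longer overlap it and the check would fail.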

Define schedules

An orchestration group that is configured to have a schedule or courier group constraint may be configured as part of an automated workflow. Alerts are generated when any of the individual couriers in a courier group constraint are unable to complete their scheduled tasks or if a query fails to run correctly.

Tip

Amperity workflows are typically run once per day. For the best results for a daily schedule, define only the minute and hour settings. For example: 30 8 * * *.

Important

Some workflows do not require a daily update. Amperity supports running workflows on a less frequent basis, such as weekly. For example, to define a workflow that runs at 8:30 AM every Monday, use a cron string that identifies the day of the week, such as 30 8 * * 1, where 1 identifies the day of the week (Monday).

About cron

Cron is a time-based job scheduler that uses cron syntax to automate scheduled jobs to run periodically at fixed times, dates, or intervals.

Cron syntax specifies the fixed time, date, or interval at which cron will run. Each line represents a job, and is defined like this:

┌───────── minute (0 - 59)
│ ┌─────────── hour (0 - 23)
│ │ ┌───────────── day of the month (1 - 31)
│ │ │ ┌────────────── month (1 - 12)
│ │ │ │ ┌─────────────── day of the week (0 - 6) (Sunday to Saturday)
│ │ │ │ │
│ │ │ │ │
│ │ │ │ │
* * * * * command to execute

For example, 30 8 * * * represents “run at 8:30 AM every day” and 30 8 * * 0 represents “run at 8:30 AM every Sunday”. Amperity validates your cron syntax and shows you the results. You may also use crontab guru to validate cron syntax.
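
To make the five fields concrete, the following Python sketch splits a cron string into named fields and checks each value against the ranges shown above. It is a minimal illustration, not the validation that Amperity performs: `parse_cron` and `FIELDS` are hypothetical names, and the sketch only understands `*` and single integers, not ranges, lists, or step values.

```python
# Field names and allowed ranges, in the order they appear in a cron string.
FIELDS = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("month", 1, 12),
    ("day_of_week", 0, 6),  # 0 is Sunday, 6 is Saturday
]


def parse_cron(expression: str) -> dict:
    """Parse a five-field cron string into a dict; None means '*' (any value)."""
    parts = expression.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")

    parsed = {}
    for (name, low, high), value in zip(FIELDS, parts):
        if value == "*":
            parsed[name] = None
            continue
        if not value.isdecimal() or not low <= int(value) <= high:
            raise ValueError(f"{name} must be '*' or an integer from {low} to {high}")
        parsed[name] = int(value)
    return parsed


print(parse_cron("30 8 * * *"))  # 8:30 AM every day
print(parse_cron("30 8 * * 0"))  # 8:30 AM every Sunday
```

For the daily schedule, only the minute and hour fields carry values. The Sunday schedule additionally sets `day_of_week` to 0.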

An orchestration group that is run as SLA must be configured for one of the following run states:

  1. Scheduled, always run

  2. Scheduled, wait for changes

  3. Wait for changes

Scheduled, always run

An orchestration group can be scheduled to run every day, regardless of changes to upstream data.

To always run a scheduled orchestration

  1. From the Destinations tab, under Destinations, open the menu in the same row as the destination to be edited, and then select Edit.

  2. Enter a schedule.

  3. Click Save.

  4. From the Destinations tab, under Orchestrations, open the menu in the same row as the orchestration to be edited, and then select Edit.

  5. Under Workflow, select Automatically, and then select the name of a query.

  6. Click Save.

Scheduled, wait for changes

An orchestration group can be scheduled to run every day, but then only start the run if upstream data has changed.

To wait for changes, then run a scheduled orchestration

  1. From the Destinations tab, under Destinations, open the menu in the same row as the destination to be edited, and then select Edit.

  2. Enter a schedule, the courier group constraint, and an offset. The specified courier group must have updated data. The orchestration group will check for updated data at the scheduled time, but will run only when there is updated data.

  3. Click Save.

  4. From the Destinations tab, under Orchestrations, open the menu in the same row as the orchestration to be edited, and then select Edit.

  5. Under Workflow, select Automatically, and then select the name of a query.

  6. Click Save.

Wait for changes

An orchestration group can be scheduled to run only when upstream data changes.

To only wait for changes

  1. From the Destinations tab, under Destinations, open the menu in the same row as the destination to be edited, and then select Edit.

  2. Enter a schedule (optional), the courier group constraint, and an offset. The specified courier group must have updated data for this orchestration group to run.

  3. Click Save.

  4. From the Destinations tab, under Orchestrations, open the menu in the same row as the orchestration to be edited, and then select Edit.

  5. Under Workflow, select Automatically, and then select the name of a query.

  6. Click Save.
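
The difference between the three run states can be summarized as a simple decision rule. The Python sketch below is a hypothetical model: the `RunState` enum, the `should_run` function, and its two boolean inputs are names introduced for illustration only.

```python
from enum import Enum


class RunState(Enum):
    SCHEDULED_ALWAYS_RUN = "Scheduled, always run"
    SCHEDULED_WAIT_FOR_CHANGES = "Scheduled, wait for changes"
    WAIT_FOR_CHANGES = "Wait for changes"


def should_run(state: RunState, schedule_due: bool, upstream_changed: bool) -> bool:
    """Decide whether an orchestration group starts a run.

    schedule_due     -- True when the scheduled time has arrived
    upstream_changed -- True when the constrained courier group has updated data
    """
    if state is RunState.SCHEDULED_ALWAYS_RUN:
        return schedule_due
    if state is RunState.SCHEDULED_WAIT_FOR_CHANGES:
        # Checks at the scheduled time, but runs only when data has changed.
        return schedule_due and upstream_changed
    # Wait for changes: the schedule is optional, only changed data matters.
    return upstream_changed


for state in RunState:
    print(state.value, should_run(state, schedule_due=True, upstream_changed=False))
```

With the scheduled time reached but no upstream changes, only the "Scheduled, always run" state starts a run in this model.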

Specify time zones

An orchestration group schedule is associated with a time zone. The time zone determines the point at which an orchestration group’s scheduled start time begins, and then how any courier group constraints and offsets are applied.

The time zones that are available for selection in Amperity are modeled after Google Calendar and are similar to:

(GMT-08:00) Pacific Time
(GMT-07:00) Mountain Time
(GMT-06:00) Central Time
(GMT-05:00) Eastern Time

The time zone that is chosen for an orchestration group schedule should consider every downstream business process that requires the data and also the time zone(s) in which the consumers of that data will operate.

Tip

Do not create orchestration group schedules that may occur during a daylight savings time transition.

For example, an orchestration group schedule with the cron string of 30 2 * * * and the time zone of “(GMT-08:00) Pacific Time” will run once a day at 2:30 AM, except for one day in the spring when it will not run at all and one day in the fall when it will run twice.

This is because American daylight savings time transitions at 2:00 AM, meaning the 2:00 AM hour occurs twice when transitioning out of daylight savings time (Fall) and is skipped altogether when transitioning into daylight savings time (Spring).

Run orchestration group

When an orchestration group runs, all orchestrations associated with the group will run. Any courier group constraints and schedule settings will be applied.

To run an orchestration group

  1. From the Destinations tab, open the menu for an orchestration group, and then select Run.

  2. The Status column for the orchestration group will update to say “Waiting to start…”, after which the notifications pane will update to include a notification that shows the status of the orchestration group.

  3. When the orchestration group has run successfully, its status is updated to “Completed”.

Manage SLA settings

A service level agreement (SLA) is a condition in Amperity that guarantees that a process will run successfully. In the rare case where a process does not run successfully, it is treated with the highest level of urgency by Amperity on-call systems and support engineers.

For courier groups

A courier group that is configured for SLA is guaranteed as an Amperity managed workflow. Alerts are generated when any of the individual couriers in the courier group are unable to complete their scheduled tasks or when the courier group is unable to load all data to Amperity.

Tip

Use a wait time to extend the time window for data to be made available. This can help reduce the number of SLA alerts that may be generated for data sources that cannot be picked up by a courier group.

To enable SLA for a courier group

  1. From the Sources tab, open the menu for a courier group, and then select Run. The Run Courier Group page opens.

  2. Select the time period for which data is loaded and indicate if downstream processes should be started automatically.

  3. To run as an SLA courier group, select SLA Run?.

  4. Click Run.

For queries

A query that is configured to run as an SLA query is guaranteed to run with every automated workflow and is actively monitored by Amperity.

Caution

A query should be thoroughly tested prior to configuring it to be run as an SLA query. Before configuring a query to run as an SLA query, be sure to:

  1. Verify that all upstream and downstream workflows are configured correctly.

  2. Verify the customer 360 database has all of the correct tables and columns necessary to support the desired query.

  3. Peer review the SQL query, if possible.

  4. Validate the query from the query editor to make sure that the results contain the desired data points.

  5. Configure a destination to receive the query results, run the destination manually, and then verify the destination received the query results.

To enable SLA for a query

  1. From the Queries tab, open the menu for a query, and then select Edit. This opens a query editor with the query labeled as a draft query.

  2. Under Query Settings, select the Refresh automatically checkbox. This option is required to run a query as an SLA query.

  3. Select the Set as SLA query checkbox.

  4. Click Activate. The query will run automatically when upstream data changes.