About Courier Groups¶
A courier group is a list of one (or more) couriers that are run as a group, either ad hoc or as part of an automated schedule. A courier group can be configured to act as a constraint on downstream workflows.
A courier group is typically configured to run automatically on a recurring schedule. Because all couriers within a courier group run as a unit, all of their dependent tasks must complete before any downstream processes, such as Stitch or database generation, can be started.
What a courier group does:
Logically organizes a list of couriers into a group that shares the same schedule and workflow.
Allows for each courier to be assigned schedule variance via wait times and offsets.
Enables both automatic and ad hoc runs of couriers.
Polls each data source associated with a courier in the group to determine if data is ready to be pulled to Amperity.
Ensures that constraints for downstream processes are present in the workflow; all couriers in a courier group must complete their jobs.
Enables a workflow to be assigned SLA status.
What a courier group needs:
One (or more) couriers.
A schedule.
A run type.
Configuration for wait times and offsets to help ensure that all files assigned to the courier group have a time window that is large enough to complete data collection.
The Sources tab shows all courier groups, including when each one last ran or was updated and its current status.
Schedules¶
A schedule defines the frequency at which a courier group runs. All couriers in the same courier group run as a unit and all tasks must complete before a downstream process can be started. The schedule is defined using cron.
Cron is a time-based job scheduler that uses cron syntax to automate scheduled jobs to run periodically at fixed times, dates, or intervals.
Cron syntax specifies the fixed time, date, or interval at which cron will run. Each line represents a job, and is defined like this:
┌───────── minute (0 - 59)
│ ┌─────────── hour (0 - 23)
│ │ ┌───────────── day of the month (1 - 31)
│ │ │ ┌────────────── month (1 - 12)
│ │ │ │ ┌─────────────── day of the week (0 - 6) (Sunday to Saturday)
│ │ │ │ │
│ │ │ │ │
│ │ │ │ │
* * * * * command to execute
For example, 30 8 * * * represents “run at 8:30 AM every day” and 30 8 * * 0 represents “run at 8:30 AM every Sunday”. Amperity validates your cron syntax and shows you the results. You may also use crontab guru to validate cron syntax.
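To make the five-field layout concrete, the following is a minimal Python sketch of a cron matcher. It is an illustration only and not how Amperity evaluates schedules. The names `expand_field` and `cron_matches` are hypothetical, and the sketch supports only `*`, single values, ranges, lists, and steps. It requires every field to match, which is a simplification of how some cron implementations combine the day-of-month and day-of-week fields.

```python
from datetime import datetime

# Allowed (low, high) values for each cron field, in field order:
# minute, hour, day of month, month, day of week (0 = Sunday).
FIELD_RANGES = [(0, 59), (0, 23), (1, 31), (1, 12), (0, 6)]


def expand_field(field: str, low: int, high: int) -> set[int]:
    """Expand one field such as "*", "5", "1-5", "*/15", or "1,15" into integers."""
    values: set[int] = set()
    for part in field.split(","):
        base, _, step = part.partition("/")
        if base == "*":
            start, end = low, high
        elif "-" in base:
            start_text, end_text = base.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(base)
        if not (low <= start <= end <= high):
            raise ValueError(f"{part!r} is outside the range {low}-{high}")
        values.update(range(start, end + 1, int(step) if step else 1))
    return values


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True when `moment` satisfies all five fields of `expression`."""
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("a cron expression needs exactly five fields")
    minute, hour, day, month, weekday = (
        expand_field(text, low, high)
        for text, (low, high) in zip(fields, FIELD_RANGES)
    )
    # Python counts Monday as 0; cron counts Sunday as 0.
    cron_weekday = (moment.weekday() + 1) % 7
    return (
        moment.minute in minute
        and moment.hour in hour
        and moment.day in day
        and moment.month in month
        and cron_weekday in weekday
    )
```

With this sketch, `cron_matches("30 8 * * 0", datetime(2024, 6, 2, 8, 30))` is true because June 2, 2024 is a Sunday, while the same call for June 3, 2024 (a Monday) is false.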
Amperity uses cron syntax to schedule the time at which a courier group is available for transferring files from a customer data source location to Amperity. A courier group that is scheduled runs automatically. Schedules are in UTC.
Note
Scheduling a courier group is optional. When a courier group is not assigned a schedule, it may be run manually on an ad hoc basis.
When a courier group is scheduled to run less frequently than daily (e.g. weekly, monthly, quarterly, etc), the default behavior of the courier group will be to check for files on each day that has passed since the courier group last ran. This allows a courier group that runs less frequently to pick up files that are dropped on a daily basis.
For example, you may want to pick up daily files once per week. You may change this behavior with the “Only retrieve files dropped in the past day” toggle. When this toggle is enabled, the courier group only searches for files that were dropped within the 24-hour window leading up to the courier group start time. This is useful when your files are dropped weekly on a specific day and your courier group only needs to pick up files from that day.
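The difference between the default lookback and the “Only retrieve files dropped in the past day” toggle can be sketched as a time-window calculation. This is a simplified model and not Amperity's implementation: the function name `file_search_window` is hypothetical, and treating the default behavior as one continuous window from the previous run to the current start time is an assumption.

```python
from datetime import datetime, timedelta


def file_search_window(
    start: datetime, last_run: datetime, only_past_day: bool
) -> tuple[datetime, datetime]:
    """Return the (earliest, latest) drop times a run would consider."""
    if only_past_day:
        # Toggle enabled: only the 24 hours leading up to the start time.
        return start - timedelta(hours=24), start
    # Default (assumed model): everything since the courier group last ran.
    return last_run, start


# A weekly courier group that last ran one week before the current start.
start = datetime(2024, 6, 9, 8, 30)
last_run = datetime(2024, 6, 2, 8, 30)

default_window = file_search_window(start, last_run, only_past_day=False)
past_day_window = file_search_window(start, last_run, only_past_day=True)
```

In this example the default window spans seven days, so daily files dropped during the week are picked up, while the toggled window spans only the final 24 hours.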
Tip
Daylight savings time can affect a schedule. Be sure to set the schedule to be stable and not require changes over time. For example: if a schedule runs at 12:30 AM local time, it may run at 11:30 PM after clocks fall back or at 1:30 AM after clocks spring forward.
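The shift described in this tip can be checked with Python's standard `zoneinfo` module. The sketch below assumes a schedule fixed at 08:30 UTC and a team working in the America/Los_Angeles time zone. Both are illustrative choices, and `local_run_time` is a hypothetical helper.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def local_run_time(year: int, month: int, day: int, tz_name: str) -> str:
    """Return the local wall-clock time of a run fixed at 08:30 UTC."""
    run_utc = datetime(year, month, day, 8, 30, tzinfo=timezone.utc)
    return run_utc.astimezone(ZoneInfo(tz_name)).strftime("%H:%M")


# Same UTC schedule, observed in winter (standard time) and summer (daylight time).
winter = local_run_time(2024, 1, 15, "America/Los_Angeles")
summer = local_run_time(2024, 7, 15, "America/Los_Angeles")
print(winter, summer)  # 00:30 01:30
```

The UTC schedule never changes, but the local run time moves from 12:30 AM to 1:30 AM once daylight savings time begins.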
Run types¶
A run type determines the workflow behavior of the courier group and what tasks are run after data is ingested. Use any of the following run types:
Full workflow This workflow runs the courier group workflow and then initiates any orchestration group workflows configured to run once the courier group workflow process is complete. This means that data gets ingested, stitched, and published into databases, and then any configured queries and orchestrations in your tenant will run.
Partial workflow This workflow runs the courier group workflow, but does not initiate any orchestration group workflows configured to run after the courier group workflow is complete. This allows you to update the data in your databases without initiating downstream orchestrations and potentially re-sending data.
Ingest only This workflow ingests data into source domain tables, but does not run Stitch nor generate databases.
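The three run types can be summarized as the set of stages each one reaches. The sketch below is a reading of the descriptions above and not an Amperity API: the enum, the stage names, and the `runs_stage` helper are all hypothetical.

```python
from enum import Enum


class RunType(Enum):
    FULL_WORKFLOW = "Full workflow"
    PARTIAL_WORKFLOW = "Partial workflow"
    INGEST_ONLY = "Ingest only"


# Stage names are illustrative labels for the steps described in the text.
STAGES = {
    RunType.FULL_WORKFLOW: ("ingest", "stitch", "databases", "orchestrations"),
    RunType.PARTIAL_WORKFLOW: ("ingest", "stitch", "databases"),
    RunType.INGEST_ONLY: ("ingest",),
}


def runs_stage(run_type: RunType, stage: str) -> bool:
    """Return True when the given run type includes the given stage."""
    return stage in STAGES[run_type]
```

For example, `runs_stage(RunType.PARTIAL_WORKFLOW, "orchestrations")` is false, which matches the goal of refreshing databases without re-sending data downstream.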
Wait times¶
A wait time is a constraint placed on a courier group that defines an extended time window for data to be made available at the source location.
A courier group typically runs on an automated schedule that expects customer data to be available at the source location within a defined time window. However, in some cases, the customer data may be delayed and isn’t made available within that time window.
Use a wait time to extend the time window for data to be made available. This can help reduce the number of SLA alerts that may be generated for data sources that cannot be picked up by a courier group.
Note
For couriers associated with a filedrop location the default wait time is 0. A polling operation only checks for a data source before declaring success or failure. For couriers associated with REST APIs and data warehouses, the polling operation is always considered to be successful.
A downstream process begins after each load operation is completed for each data source associated with each courier in the courier group and each domain table has been updated.
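The effect of a wait time can be sketched as a comparison between the time data becomes available and the end of the extended window. This is a simplified model under stated assumptions: the window is assumed to begin at the scheduled start time, and `data_arrives_in_window` is a hypothetical name, not an Amperity function.

```python
from datetime import datetime, timedelta


def data_arrives_in_window(
    scheduled_start: datetime,
    available_at: datetime,
    wait_time: timedelta = timedelta(0),
) -> bool:
    """Return True when data is available before the extended window closes."""
    return available_at <= scheduled_start + wait_time


scheduled = datetime(2024, 6, 3, 8, 30)
late_file = datetime(2024, 6, 3, 9, 15)  # arrives 45 minutes late

missed = data_arrives_in_window(scheduled, late_file)
picked_up = data_arrives_in_window(scheduled, late_file, wait_time=timedelta(hours=1))
```

With the default wait time of 0 the late file is missed, while a one-hour wait time lets the same file be picked up.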
Offsets¶
An offset is a constraint placed on a courier group that defines a range of time that is older than the scheduled time, within which a courier group will accept customer data as valid for the current job. Offset times are in UTC.
A courier group offset is typically set to be 24 hours. For example, it’s possible for customer data to be generated with a correct file name and datestamp appended to it, but for that datestamp to represent the previous day because of the customer’s own workflow. An offset ensures that the data at the source location is recognized by the courier as the correct data source.
Warning
An offset affects couriers in a courier group whether or not they run on a schedule.
Important
The schedule defines the frequency at which the courier group will run.
The time zone determines the local time at which the courier group will run. This may be set to your local time zone.
Individual courier offsets are calculated using Coordinated Universal Time (UTC), even when a non-UTC time zone is specified for the courier group. This means that when a courier group runs, the current time in UTC is used to calculate the offset.
When a courier group is set to your local time zone, you must consider the offset for your local time zone when defining the offset for each courier in the courier group.
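The interaction between a local time zone and a UTC-based offset is easier to see with concrete numbers. The sketch below assumes a courier group that runs at 6:00 PM Pacific Standard Time with a 24-hour offset. The helper `oldest_accepted_time` is hypothetical, and a fixed UTC-8 offset stands in for the real time zone to keep the example small.

```python
from datetime import datetime, timedelta, timezone


def oldest_accepted_time(run_time: datetime, offset: timedelta) -> datetime:
    """Subtract the offset from the run time after converting it to UTC."""
    return run_time.astimezone(timezone.utc) - offset


# Fixed UTC-8 offset used as a stand-in for Pacific Standard Time.
pacific_standard = timezone(timedelta(hours=-8))
run_time = datetime(2024, 1, 15, 18, 0, tzinfo=pacific_standard)

run_date_utc = run_time.astimezone(timezone.utc).date()
cutoff = oldest_accepted_time(run_time, timedelta(hours=24))
print(run_date_utc, cutoff.date())  # 2024-01-16 2024-01-15
```

The run happens on January 15 local time, but it is already January 16 in UTC, so a 24-hour offset reaches back to January 15 in UTC and not to January 14. This is the difference to account for when choosing an offset for a courier group that uses a local time zone.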
To define courier group offsets
From the Sources tab, click Add Courier Group. This opens the Edit Courier Group dialog box.
Add the name for the courier group.
Define the schedule.
From the Run mode menu, select Full workflow, Partial workflow, or Ingest only.
Set Courier group status? to ENABLED.
Set Monitor SLA? to either Disabled or Enabled.
Set Only retrieve files dropped in the past day? to either Disabled or Enabled.
Click Add a courier group constraint, and then select a courier from the drop-down list. Do this for each courier to be added to the courier group.
Specify a time zone.
Specify the wait time.
Specify the offset.
Set Notify when missing? to either Disabled or Enabled.
Set Abort when missing? to either Disabled or Enabled.
Click Save.
Time zones¶
A courier group schedule is associated with a time zone. The time zone determines the point at which a courier group’s scheduled start time begins. A time zone should be aligned with the time zone of the system from which the data is being pulled.
The time zones that are available for selection in Amperity are modeled after the Google Calendar and are similar to:
(GMT-08:00) Pacific Time
(GMT-07:00) Mountain Time
(GMT-06:00) Central Time
(GMT-05:00) Eastern Time
The time zone that is chosen for a courier group schedule should consider every downstream business process that requires the data and also the time zone(s) in which the consumers of that data will operate.
Tip
Do not create courier group schedules that may occur during a daylight savings time transition.
For example, a courier group schedule with the cron string of 30 2 * * * and the time zone of “(GMT-08:00) Pacific Time” will run once a day at 2:30 AM on most days, except for one day in the spring when it will not run at all and one day in the fall when it will run twice.
This is because American daylight savings time transitions at 2:00 AM: the 2:00 AM hour is skipped altogether when transitioning into daylight savings time (spring), and clocks are set back one hour when transitioning out of daylight savings time (fall), so local times near the transition occur twice.
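Whether a local wall-clock time is skipped or repeated on a given day can be checked with Python's standard `zoneinfo` module. The sketch below counts how many distinct instants correspond to a local time. It reflects the IANA time zone database only, and a scheduler may resolve times near a transition differently. The function name `wall_clock_occurrences` is hypothetical.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def wall_clock_occurrences(
    year: int, month: int, day: int, hour: int, minute: int, tz_name: str
) -> int:
    """Count the distinct instants at which a local wall-clock time occurs."""
    tz = ZoneInfo(tz_name)
    naive = datetime(year, month, day, hour, minute)
    instants = set()
    for fold in (0, 1):
        # fold selects the first or second reading of an ambiguous local time.
        candidate = naive.replace(tzinfo=tz, fold=fold)
        as_utc = candidate.astimezone(timezone.utc)
        # A skipped local time does not survive a round trip through UTC.
        if as_utc.astimezone(tz).replace(tzinfo=None) == naive:
            instants.add(as_utc)
    return len(instants)


skipped = wall_clock_occurrences(2024, 3, 10, 2, 30, "America/Los_Angeles")
normal = wall_clock_occurrences(2024, 3, 11, 2, 30, "America/Los_Angeles")
repeated = wall_clock_occurrences(2024, 11, 3, 1, 30, "America/Los_Angeles")
print(skipped, normal, repeated)  # 0 1 2
```

On March 10, 2024 the local time 2:30 AM never occurs in this time zone, and on November 3, 2024 the local time 1:30 AM occurs twice. Scheduling outside the hours around the transition avoids both cases.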
How-tos¶
This section describes tasks related to managing courier groups in Amperity:
Add courier group¶
Use the Add Courier Group button to add a courier group to Amperity. A courier group should be created to consolidate individual couriers into a scheduled workflow that can be run under the Amperity SLA.
For each courier added to a courier group, define a wait time and an offset. These settings help determine how much time the courier group should wait for the files associated with a courier to be ready for processing.
In some cases, if the files are not ready, the courier (and courier group) will fail. But in other cases, if the files in the courier are not flagged as required, the courier group may continue processing the rest of the files.
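The two outcomes described above can be sketched as a small decision function. This is an assumption-heavy illustration and not Amperity's logic: the `CourierResult` class and `group_outcome` function are hypothetical, and mapping “required” files to the Abort when missing? setting is a guess made for the sake of the example.

```python
from dataclasses import dataclass


@dataclass
class CourierResult:
    name: str
    files_ready: bool
    # Hypothetical mirror of the "Abort when missing?" setting.
    abort_when_missing: bool


def group_outcome(results: list[CourierResult]) -> str:
    """Return "failed" if a required courier is missing files, else "continued"."""
    missing = [result for result in results if not result.files_ready]
    if any(result.abort_when_missing for result in missing):
        return "failed"
    return "continued"


optional_missing = group_outcome([
    CourierResult("orders", files_ready=True, abort_when_missing=True),
    CourierResult("returns", files_ready=False, abort_when_missing=False),
])
required_missing = group_outcome([
    CourierResult("orders", files_ready=False, abort_when_missing=True),
    CourierResult("returns", files_ready=True, abort_when_missing=False),
])
```

In the first case the courier group continues with the files that are ready. In the second case the missing files belong to a courier that is set to abort, so the whole courier group fails.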
To add a courier group
From the Sources tab, click Add Courier Group. This opens the Edit Courier Group dialog box.
Add the name for the courier group.
Add a cron string to the Schedule field to define a schedule for the courier group.
A schedule defines the frequency at which a courier group runs. All couriers in the same courier group run as a unit and all tasks must complete before a downstream process can be started. The schedule is defined using cron.
Cron syntax specifies the fixed time, date, or interval at which cron will run. Each line represents a job, and is defined like this:
┌───────── minute (0 - 59)
│ ┌─────────── hour (0 - 23)
│ │ ┌───────────── day of the month (1 - 31)
│ │ │ ┌────────────── month (1 - 12)
│ │ │ │ ┌─────────────── day of the week (0 - 6) (Sunday to Saturday)
│ │ │ │ │
│ │ │ │ │
│ │ │ │ │
* * * * * command to execute
For example, 30 8 * * * represents “run at 8:30 AM every day” and 30 8 * * 0 represents “run at 8:30 AM every Sunday”. Amperity validates your cron syntax and shows you the results. You may also use crontab guru to validate cron syntax.
Tip
Daylight savings time can affect a schedule. Be sure to set the schedule to be stable and not require changes over time. For example: if a schedule runs at 12:30 AM local time, it may run at 11:30 PM after clocks fall back or at 1:30 AM after clocks spring forward.
From the Run mode menu, select Full workflow, Partial workflow, or Ingest only.
Set Courier group status? to ENABLED.
Set Monitor SLA? to either Disabled or Enabled.
Set Only retrieve files dropped in the past day? to either Disabled or Enabled.
Click Add a courier group constraint, and then select a courier from the drop-down list. Do this for each courier to be added to the courier group.
Specify a time zone.
A courier group schedule is associated with a time zone. The time zone determines the point at which a courier group’s scheduled start time begins. A time zone should be aligned with the time zone of the system from which the data is being pulled.
Note
The time zone that is chosen for a courier group schedule should consider every downstream business process that requires the data and also the time zone(s) in which the consumers of that data will operate.
Specify the wait time.
A wait time is a constraint placed on a courier group that defines an extended time window for data to be made available at the source location.
A courier group typically runs on an automated schedule that expects customer data to be available at the source location within a defined time window. However, in some cases, the customer data may be delayed and isn’t made available within that time window.
Use a wait time to extend the time window for data to be made available. This can help reduce the number of SLA alerts that may be generated for data sources that cannot be picked up by a courier group.
Specify the offset.
An offset is a constraint placed on a courier group that defines a range of time that is older than the scheduled time, within which a courier group will accept customer data as valid for the current job. Offset times are in UTC.
A courier group offset is typically set to be 24 hours. For example, it’s possible for customer data to be generated with a correct file name and datestamp appended to it, but for that datestamp to represent the previous day because of the customer’s own workflow. An offset ensures that the data at the source location is recognized by the courier as the correct data source.
Warning
An offset affects couriers in a courier group whether or not they run on a schedule.
Set Notify when missing? to either Disabled or Enabled.
Set Abort when missing? to either Disabled or Enabled.
Click Save.
Delete courier group¶
Use the Delete option to remove a courier group from Amperity. This should be done carefully. Verify that both upstream and downstream processes no longer depend on this courier group prior to deleting it.
Important
This action will not delete couriers that are associated with the courier group.
To delete a courier group
From the Sources tab, open the menu for a courier group, and then select Delete.
Click Delete to confirm.
Run courier groups¶
A courier group may be run in the following ways:
Automatically¶
A courier group with a schedule (including time zones, wait times, and offsets) will run automatically.
To run a courier group automatically
From the Sources tab, click Add Courier Group. This opens the Edit Courier Group dialog box.
Add the name for the courier group.
Add a cron string to the Schedule field to define a schedule for the courier group.
A schedule defines the frequency at which a courier group runs. All couriers in the same courier group run as a unit and all tasks must complete before a downstream process can be started. The schedule is defined using cron.
Cron syntax specifies the fixed time, date, or interval at which cron will run. Each line represents a job, and is defined like this:
┌───────── minute (0 - 59)
│ ┌─────────── hour (0 - 23)
│ │ ┌───────────── day of the month (1 - 31)
│ │ │ ┌────────────── month (1 - 12)
│ │ │ │ ┌─────────────── day of the week (0 - 6) (Sunday to Saturday)
│ │ │ │ │
│ │ │ │ │
│ │ │ │ │
* * * * * command to execute
For example, 30 8 * * * represents “run at 8:30 AM every day” and 30 8 * * 0 represents “run at 8:30 AM every Sunday”. Amperity validates your cron syntax and shows you the results. You may also use crontab guru to validate cron syntax.
From the Run mode menu, select Full workflow, Partial workflow, or Ingest only.
Set Courier group status? to ENABLED.
Set Monitor SLA? to either Disabled or Enabled.
Set Only retrieve files dropped in the past day? to either Disabled or Enabled.
Click Add a courier group constraint, and then select a courier from the drop-down list. Do this for each courier to be added to the courier group.
Specify a time zone.
Specify the wait time and offset for each courier in the courier group.
Set Notify when missing? to either Disabled or Enabled.
Set Abort when missing? to either Disabled or Enabled.
Click Save.
For a date range¶
A courier group can be configured to load all data for a specific date range.
To run a courier group for a date range
From the Sources tab, open the menu for a courier group, and then select Run. The Run Courier Group page opens.
Select Load data from a specific time period.
Select a start date and an end date.
To prevent downstream processing, select Load Only.
To run as an SLA courier group, select Monitor for SLA?.
Click Run.
For a specific day¶
A courier group can be configured to load all data for a single day.
To run a courier group for a specific day
From the Sources tab, open the menu for a courier group, and then select Run. The Run Courier Group page opens.
Select Load data from a specific day, and then select a day.
To prevent downstream processing, select Load Only.
To run as an SLA courier group, select Monitor for SLA?.
Click Run.
For all data¶
A courier group can be configured to load all data that is available. This can be a large amount of data if the courier group is running for the first time.
To run a courier group for all data
From the Sources tab, open the menu for a courier group, and then select Run. The Run Courier Group page opens.
Select Load all data.
To prevent downstream processing, select Load Only.
To run as an SLA courier group, select Monitor for SLA?.
Click Run.
Manually¶
Use the Run option to run the courier group manually.
To run a courier group manually
From the Sources tab, open the menu for a courier group, and then select Run. The Run Courier Group page opens.
Select the time period for which data is loaded and indicate if downstream processes should be started automatically.
To run as SLA, select SLA Run?.
Click Run.