About Stitch benchmarks (BETA)

Stitch benchmarks are heuristic scores that define the expectations for the quality of customer profiles that are output by Stitch. Each benchmark evaluates your brand’s data and compares it to a baseline score.

Use benchmarks to explore data quality, provide direct feedback on the quality of Stitch results, and explore configuration changes that can help improve the quality of customer profiles in your tenant.

Benchmark status page

Stitch benchmarks are available from the Stitch page in your Amperity tenant. Open the Benchmarks tab to review the overall status for Stitch benchmark checks in your tenant.

The Stitch benchmark status page.

The outcomes of Stitch benchmark checks are grouped by color on the Benchmarks tab.

Click any benchmark check on the Benchmarks tab to open it and explore its details, the previous five benchmark scores, interpretations, and a link that opens a dialog box from which you can review and grade a representative sample of 10 examples.

Benchmark checks

Each Stitch run collects the data that your brand has provided to Amperity and outputs a series of tables that contain the results. There is no “ground truth” dataset for your brand against which Amperity can compare Amperity IDs to validate identity, which prevents the use of standard error metrics to evaluate the quality of Stitch output.

A benchmark check is a heuristic that defines how often Amperity IDs are expected to meet a certain condition. For example, Amperity expects no more than 0.011% of your Amperity IDs to be associated with more than three given names.

Each benchmark check measures the percentage of Amperity IDs meeting its respective condition and compares the result against the optimal range. A benchmark check result can fall into the optimal range, above the optimal range, or far above the optimal range (high).

For example, it is expected that most, but not all, Amperity IDs should not have more than three given names. It is possible for an Amperity ID to be correctly associated with more than three given names for valid reasons, such as differences in data capture, the presence of typos, use of nicknames, name changes, and so on. A higher-than-expected rate of Amperity IDs associated with more than three given names may be an indicator that Stitch is clustering records together too aggressively.
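As a rough illustration, the following sketch (in Python) shows how a benchmark check can be read as a simple heuristic: a measured percentage compared against an optimal range. The `classify_benchmark` helper and the “high” boundary are assumptions made for this sketch; only the 0.011% expectation for given names comes from the example above.

```python
def classify_benchmark(score_pct: float,
                       optimal_max_pct: float,
                       high_min_pct: float) -> str:
    """Place a benchmark score into one of the three outcome bands.

    This is a hypothetical sketch; Amperity defines the actual ranges
    for each benchmark check.
    """
    if score_pct <= optimal_max_pct:
        return "Optimal"
    if score_pct < high_min_pct:
        return "Above optimal range"
    return "High"


# Amperity IDs with many given names: no more than 0.011% of Amperity IDs
# are expected to have more than three given names. The 0.06% "high"
# boundary below is an assumption for illustration only.
print(classify_benchmark(score_pct=0.031,
                         optimal_max_pct=0.011,
                         high_min_pct=0.06))  # "Above optimal range"
```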

Important

Stitch is complex, and perfection of Stitch results should not be the goal. A benchmark score that falls outside of an optimal range is not necessarily a bad score. A high benchmark score does not always need to be addressed, at least not right away.

The purpose of benchmark scores is to provide a visible and direct way of inspecting the quality of customer profiles that currently exist in your tenant.

Use benchmark scores to:

  1. Quickly assess the overall quality of customer profiles in your tenant.

  2. Explore example Amperity IDs, especially for those benchmark checks that are high, to identify ways of changing the configuration of your tenant that can lead to overall improvements in benchmark scores.

    Use a sandbox to test configuration changes. Compare the scores in the sandbox to the scores in production.

  3. Improve your understanding of how Stitch builds customer profiles based on the data sources that your brand has provided to Amperity.

  4. Identify specific areas of improvement, such as updating semantic tags in feeds or custom domain tables, changing the set of domain tables that are made available to Stitch, or identifying a foreign key or separation key that is causing issues with cluster quality.

    Look for themes and address them. For example, if a benchmark check shows 7 out of 10 examples all being wrong in the same way, that is a strong indicator that a configuration change should improve cluster quality. If all 10 examples are different, you can mark them as edge cases and move on.

    Think about the big picture: the overwhelming percentage of customer profiles are accurate. Benchmark checks look at the edges of that accuracy and give you ways to extend that accuracy to a small percentage of profiles.

    For example, if you find obvious mistakes with the Amperity IDs with many given names benchmark check, but the results are optimal, then any changes to that benchmark are likely to have a very small effect on overall cluster quality, even if some individual profiles are incorrect.

Benchmark results

Amperity uses benchmark checks to provide insight into the quality of your Stitch results. Results fall into one of the following categories:

Optimal

Optimal results represent benchmark check results that fall within the expected range. These results can be “more optimal” and they can be “less optimal”.

For most tenants, most of the time, nothing needs to be done when benchmark checks are optimal. In some cases, it might be worth exploring if scores that fall on the edge of optimal scoring (and are close to falling outside the optimal range) can be improved.

Optimal score results.

Above optimal range

Results that fall above the optimal range may be investigated, but doing so is often not necessary. Compare the history of the scores and determine if anything should be done to try to improve the benchmark results.

Was new data made available to your tenant? Were any changes made to the Stitch configuration? Either of these may cause scores to fall above the optimal range.

In many cases nothing needs to be done with benchmark checks that fall above the optimal range beyond monitoring the result to see if it continues to increase or if it stabilizes.

Outside optimal range score results.

High

High results do not need to be fixed, but they should be researched and investigated. In many cases, high results indicate that improvements to the quality of Stitch results can be made.

Review and grade benchmark checks with high results by assigning a thumbs up or a thumbs down to the sample set of records. Then click Next steps and review the list of options that are available to help improve that benchmark result.

Abnormal score results.

Important

Use a sandbox to make configuration changes to Stitch, and then compare the benchmark results in the sandbox to the high benchmark results in production. Also compare other benchmark results to determine if changes affected the overall quality of benchmark results.

About benchmark cards

Each benchmark card contains a condition summary, such as Amperity IDs with many given names, a result (for example, “0.125%”), an outcome (Optimal, Above optimal, or High), and a visualization that shows how the benchmark result compares to the optimal range.

Benchmark details

Benchmark details show specific information about the condition, such as The percentage of Amperity ID clusters with more than 3 given names, a visualization that shows the result in the context of the optimal range, a toggle to show or hide historical results, tips about how to interpret the results, and any recommended next steps.

Each benchmark shows score results.

History

Benchmark results are refreshed after every Stitch run. You can view the five previous benchmark results by enabling the Show history option in the benchmark details dialog.

Each benchmark tracks a history of scores.

Interpretations

Interpretations are provided by each benchmark check. They describe the result and provide an explanation of how to interpret it. For example:

“This score is above the typical range for most brands. A large percentage indicates that different postal codes are appearing in the same cluster, which indicates overclustering.”

or:

“This score is far above the typical range for most brands. A large ratio indicates that a unique name and physical address combination appears in multiple clusters, which indicates underclustering.”

When a benchmark score is above the optimal range or high, it is recommended that you review and grade a set of 10 example clusters, after which the benchmark check will make a series of recommendations that can lead to improved benchmark results.

Grade and calibrate

All benchmark checks include example Amperity IDs that can be reviewed and graded. You should periodically review and grade examples for benchmark checks with high results. This helps ensure that Stitch is always building the highest quality customer profiles and can lead to incremental improvements over time.

Depending on the outcome of reviewing and grading benchmark check examples, a series of recommendations may be shown. Stitch configuration settings can be updated directly in the benchmark check.

How does Amperity choose which records are available for grading?

Amperity uses stratified random sampling to select the examples; a sketch of this approach follows the steps below. A fresh set of examples is generated during each Stitch run.

  1. All clusters (or groups of clusters) that are flagged by the check are collected.

    For example, with the Amperity IDs with many surnames check, all clusters with more than 3 surnames are collected.

  2. A rules-based approach is used to determine which of these clusters are likely to be “good” identity decisions and which are likely to be “poor” identity decisions.

    The percentage of clusters that are likely to be “good” identity decisions and the percentage likely to be “poor” identity decisions are identified.

    For example: 70% good, 30% poor.

  3. 10 examples are selected at random using the same rate of “good” and “poor” clusters.

    For example, 7 records will represent “good” identity decisions and 3 records will represent “poor”.

    Amperity will make a recommendation for when to rate each example as “good”, but cannot identify, without your input, which examples in the random sample represent “good” or “poor” identity decisions.
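A minimal sketch of this sampling approach is shown below, assuming the flagged clusters have already been labeled “good” or “poor” by the rules-based step. The function name and data shapes are illustrative only, not Amperity’s implementation.

```python
import random

def sample_examples(good_clusters, poor_clusters, sample_size=10, seed=None):
    """Stratified random sample that preserves the observed good/poor ratio.

    good_clusters and poor_clusters are lists of cluster identifiers that
    were flagged by the benchmark check and labeled by the rules-based step.
    """
    rng = random.Random(seed)
    total = len(good_clusters) + len(poor_clusters)
    if total == 0:
        return []

    # Keep the sample at the same rate as the flagged population,
    # for example 70% "good" / 30% "poor" -> 7 good and 3 poor examples.
    n_good = min(round(sample_size * len(good_clusters) / total), len(good_clusters))
    n_poor = min(sample_size - n_good, len(poor_clusters))

    sample = rng.sample(good_clusters, n_good) + rng.sample(poor_clusters, n_poor)
    rng.shuffle(sample)  # present examples in random order for unbiased grading
    return sample
```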

Update Stitch configuration

Depending on the outcome of reviewing benchmark check results and examples, a series of recommendations may be shown. Each recommendation represents a change that you can make to Stitch configuration that should lead to improvements in benchmark results.

Make changes incrementally. You can review benchmark checks on a daily basis: review the results after each configuration change, monitor them for signs of improvement, and then make additional incremental changes if necessary.

Benchmark categories

The following sections list benchmark checks by category:

Overclustering

An overcluster, or a false positive, occurs when distinct records are incorrectly added to a cluster of records. Each overcluster affects the precision of identity resolution and should be investigated to understand why it occurred.

Stitch benchmark checks for overclustering evaluate situations where records that likely belong to two or more individuals end up being assigned the same Amperity ID. This can occur when records with mostly different personally identifiable information (PII) are connected by a foreign key or by a small set of matching PII.

Many given names

The Amperity IDs with many given names benchmark computes the percentage of Amperity IDs with more than three given names.

A larger percentage implies that too many given names are being associated with the same Amperity ID at a higher-than-expected frequency.

Many postal codes

The Amperity IDs with many postal codes benchmark computes the percentage of Amperity IDs with more than five postal codes.

A larger percentage implies that too many postal codes are being associated with the same Amperity ID at a higher-than-expected frequency.

Many surnames

The Amperity IDs with many surnames benchmark computes the percentage of Amperity IDs with more than three surnames.

A larger percentage implies that too many surnames are being associated with the same Amperity ID at a higher-than-expected frequency.
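The three overclustering checks above follow the same pattern: count the distinct values of a PII field for each Amperity ID and report the percentage of IDs that exceed a threshold. The sketch below shows one way to compute such a percentage with pandas; the column names and data source are assumptions, not Amperity’s actual Stitch output schema.

```python
import pandas as pd

def pct_ids_over_threshold(pii: pd.DataFrame, field: str, threshold: int) -> float:
    """Percentage of Amperity IDs with more than `threshold` distinct values of `field`.

    pii is assumed to hold one row per stitched record, keyed by amperity_id.
    """
    distinct_counts = pii.groupby("amperity_id")[field].nunique()
    flagged = (distinct_counts > threshold).sum()
    return 100.0 * flagged / len(distinct_counts)


# Hypothetical usage; "stitched_pii.parquet" is a placeholder export.
# pii = pd.read_parquet("stitched_pii.parquet")
# many_given_names = pct_ids_over_threshold(pii, "given_name", 3)
# many_postal_codes = pct_ids_over_threshold(pii, "postal", 5)
# many_surnames = pct_ids_over_threshold(pii, "surname", 3)
```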

Underclustering

An undercluster, or a false negative, occurs when records that belong to the same individual are incorrectly split across multiple clusters. Each undercluster affects the recall of identity resolution and should be investigated to understand why it occurred.

Stitch benchmark checks for underclustering evaluate situations where records that likely belong to the same individual end up being assigned different Amperity IDs. This can occur when records with similar personally identifiable information (PII) are separated by a separation key or by a small set of mis-matching PII.

Shared names and emails

The Shared names and emails across Amperity IDs benchmark computes the ratio of unique name and email address combinations that appear in more than one Amperity ID cluster to those that appear in just one Amperity ID.

A large ratio implies that a name and email address combination is associated with multiple Amperity IDs more often than expected.

Shared names and phones

The Shared names and phones across Amperity IDs benchmark computes the ratio of unique name and phone number combinations that appear in more than one Amperity ID cluster to those that appear in just one Amperity ID.

A large ratio implies that a name and phone number combination is associated with multiple Amperity IDs more often than expected.

Shared names and addresses

The Shared names and addresses across Amperity IDs benchmark computes the ratio of unique name and address (including street, city, state, and postal code) combinations that appear in more than one Amperity ID cluster to those that appear in just one Amperity ID.

A large ratio implies that a name and address combination is associated with multiple Amperity IDs more often than expected.
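Each of the underclustering checks above is a ratio: unique PII combinations that span more than one Amperity ID, divided by combinations that stay within a single Amperity ID. The following sketch shows that calculation with pandas; the column names are assumptions, not Amperity’s actual schema.

```python
import pandas as pd

def shared_combination_ratio(pii: pd.DataFrame, combo_fields: list) -> float:
    """Ratio of unique PII combinations found in more than one Amperity ID
    cluster to combinations found in exactly one.

    pii is assumed to hold one row per stitched record, keyed by amperity_id.
    """
    # Count the distinct Amperity IDs that each combination appears in.
    ids_per_combo = (
        pii.dropna(subset=combo_fields)
           .groupby(combo_fields)["amperity_id"]
           .nunique()
    )
    shared = (ids_per_combo > 1).sum()
    single = (ids_per_combo == 1).sum()
    return shared / single if single else float("inf")


# Hypothetical usage for the three underclustering checks.
# shared_names_emails = shared_combination_ratio(pii, ["given_name", "surname", "email"])
# shared_names_phones = shared_combination_ratio(pii, ["given_name", "surname", "phone"])
# shared_names_addresses = shared_combination_ratio(
#     pii, ["given_name", "surname", "address", "city", "state", "postal"])
```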