Cluster scores

A pairwise connection is a pair of matching records within a block that have an initial score above threshold. Each pairwise connection within a block is scored, after which all pairwise connections that scored above threshold represent a single, unique individual.

A score is assigned to every pairwise connection. The score is measured in two parts, separated by a period.

The first part–the record pair score–correlates to the match category, which is a machine learning classifier that is applied by Amperity to individual record pairs. The record pair score corresponds to the classification: 5 for exact matches, 4 for excellent matches, 3 for high matches, 2 for moderate matches, 1 for weak matches, and 0 for no conflicts.

The second part–the record pair strength–is used by Stitch to help determine the quality of the record pair score. This value appears in the Stitch report as a two decimal number. A record pair strength by itself is not a direct indicator of the quality of a pairwise connection score.

Use this query to investigate pairwise comparison scores for an individual cluster or for a set of clusters.

When to use

  1. Run this query to investigate pairwise scores.

  2. Identify the Amperity IDs each cluster to be investigated and update the query with those IDs.

Configure query

  1. From the Queries page, open the Stitch QA folder, and then select this query.

    Tip

    Add the Stitch QA queries template folder if it does not already exist.

  2. The query selects a subset of the fields in the Unified Coalesced table by default. You may adjust this list up to (and including) selecting all fields.

  3. The query is configured for a single Amperity ID. Replace amp_id with an Amperity ID:

    FROM Unified_Coalesced uc1
    JOIN unified_coalesced uc2
    ON uc1.amperity_id = uc2.amperity_id
    AND uc1.amperity_id = amp_id
    JOIN Unified_Scores us
    ON uc1.amperity_id = us.amperity_id
    AND uc1.pk = us.pk1
    AND uc2.pk = us.pk2
    
  4. To run the query for multiple Amperity IDs, update the query to input a list of IDs, and then replace each amp_id with an Amperity ID:

    FROM Unified_Coalesced uc1
    JOIN unified_coalesced uc2
    ON uc1.amperity_id = uc2.amperity_id
    AND uc1.amperity_id IN (amp_id, amp_id, amp_id, ...)
    JOIN Unified_Scores us
    ON uc1.amperity_id = us.amperity_id
    AND uc1.pk = us.pk1
    AND uc2.pk = us.pk2
    
  5. Click Run Query and debug any issues that may arise.

  6. Click Activate.