Address-based householding

A household is a group of people who share a physical address and possibly other attributes. For example: a shared device or network, a shared last name, a shared phone number. Householding is a process that identifies a unique household in a dataset.

Address-based householding is a process that standardizes addresses, pairs them with a last name, and then assigns that pair a Household ID.

A Household ID is a universally unique identifier (UUID) that uniquely identifies the combination of a standardized address and a last name.

Note

Address-based householding is a starting point. An implementation of address-based householding can vary across brand use cases, depending on the

  • Type and number of data sources

  • Number of addresses

  • Types of direct mail campaigns that will be based on the results

How it works

Address-based householding is built upon the results of the Merged Customers table. This enables address-based householding the ability to use the results of individual identity resolution that is available from the Merged Customers table.

Apply the results of address-based householding to campaigns that rely on physical addresses, such as direct mail campaigns, to ensure that a single household gets a single piece of direct mail, even when multiple unique individuals reside within the same household.

Tip

Address-based householding can be improved when used alongside standardization provided by national change of address (NCOA) and coding accuracy support system (CASS).

National Change of Address (NCOA) is a secure dataset of about 160 million permanent change-of-address (COA) records. NCOA records consist of the names and addresses of individuals, families, and businesses who have filed a change-of-address with the United States Postal Service (USPS).

Coding accuracy support system (CASS) is an address standardization concept that helps clean address to make them more effective for direct mail campaigns.

The process for enabling address-based householding includes:

  1. Reviewing the Merged Customers table to identify any tenant-specific behaviors within the table that may need to be considered when extending Amperity for address-based householding.

  2. Reviewing the bad-values blocklist to identify any tenant-specific behaviors within the blocklist workflow and to identify the name of the domain table associated with the bad-values blocklist feed.

  3. Adding a feed for the address-standardization.csv file

  4. Adding a Merged Households table to the customer 360 database

  5. Building segments that use the Household ID, which is a UUID that is available to segments from the Merged Households table.

  6. Sending segment results to any downstream process.

Tip

Additional configuration options for address-based householding include:

  1. Joining the Merged Households table to the Customer 360 table to add the household_id and household_size fields. This makes them available as profile attributes.

  2. Using a common table expression (CTE) to flag a single individual that is associated with an address as the primary contact.

  3. Adding a Household 360 table. This is similar to the Customer 360 table and can merge values down to a single row per Household ID. This enables the use of summary attributes, such as identifying household lifetime value.

Add the data asset

Address standardization starts as a feed that loads a CSV file that contains a list of address variations for state and street names.

Note

The address standardization data asset is available from an Amazon S3 bucket named Amperity Data Assets or from Amperity Bridge.

  1. Configure Amperity Bridge. Choose the Reference data option, and then select the address_standardization_conversion table.

  2. To use the Amazon S3 bucket as the source for address standardization, make a request to Amperity Support to enable file-based data assets, after which you can use the Amazon S3 data source to load the “address_standardization_conversion.csv” file from the “/householding” directory in that bucket.

To add the address standardization data asset

After adding the address standardization data asset, use all three fields–before, convert, and type–as the primary key.

Add a passthrough table to your customer 360 database named LookupTables AddressStandardization, and then run your customer 360 database to build the LookupTables AddressStandardization table.

Important

The LookupTables AddressStandardization table is used within the Merged Households SQL template in a series of LEFT JOIN operations that are used to standardize addresses. For example:

1LEFT JOIN (
2  SELECT
3    UPPER(before) AS before
4    ,UPPER(convert) AS converted
5  FROM LookupTables_AddressStandardization
6  WHERE type = 'STREET'
7) AS a7clean ON (a7clean.before = core.a7)

You can name this table anything else, such as Address Standardization USA. If you use the Merged Households SQL template, you need to update the LEFT JOIN sections within that template to contain the updated table name.

Add Merged Households table

The Merged Households table applies address-based householding and address standardization to the output of the Merged Customers table and adds a column for household ID. Use the Merged Households table to improve campaigns that send offers to shared physical addresses, such as direct mail campaigns.

The Merged Households table starts as a SQL template that defines how address-based householding works in Amperity.

  1. From the Customer 360 tab, under All Databases, select the menu for the customer 360 database, and then click Edit.

  2. Click Add Table. Name the table “Merged_Households”.

  3. Set Build Mode to “SQL”, and then define a SQL query.

    Tip

    You may download a copy of Merged Households as a template or you may refer to an example.

    Important

    Amperity uses a single table in the customer 360 database to collect rows from the Unified Coalesced table, and then collapses them into a single row per Amperity ID.

    This is referred to as the Merged Customers table. Prior to August 1, 2020 the name of this table was Unified Merged. Verify the name of this table as it is used for your tenant, and then update the template to match the name of the table in your tenant.

  4. The section titled “Basic address standardization” is a common table expression (CTE) that performs address standardization.

    What is address standardization?

    Address standardization is a process that

    • Removes non-alphanumeric characters

    • Trims for leading, trailing, and repeating whitespace

    • Converts characters to uppercase

    • Converts all valid names of states in the United States to their two-character representation

    • Converts all postal codes to five digits

    • Converts common representations of street addresses into standardized variants

    Physical street addresses, as identified by the address field, are standardized by splitting on spaces. Each of the second, third, fourth, etc. elements of an address are compared to a lookup table. When matches are found, they are replaced with standardized values.

  5. The section titled “Build the Household ID” builds a universally unique identifier (UUID) from unique combinations of the address and surname fields.

  6. The section titled “Get count of Amperity ID per Household ID” associates a count of Amperity IDs to each Household ID.

    Tip

    Additional SQL may be added to handle custom statistics on a per-household basis and to support other tenant-specific use cases. The default behavior only associates the Amperity ID to the Household ID, but can be tailored to support most use cases.

    For example, you could add support for checking the number of Amperity IDs associated with a household, and if that exceeds a threshold, that address could be flagged as a business address or some other non-household entity.

  7. The section titled “Create flag for addresses in bad-values blocklist” identifies if addresses have been added to the bad-values blocklist.

    If you are using the bad-values blocklist, uncomment the following sections.

    The blv_address block (lines 209 - 220):

    blv_addresses AS (
      SELECT
        amperity_id,
        CASE
          WHEN (blv.value IS NOT NULL)
          THEN TRUE
          ELSE FALSE
        END AS blv_address
      FROM clean_addresses AS AD
      LEFT JOIN Blocklist_Values AS BLV
        ON AD.address = UPPER(BLV.value)
    )
    

    where Blocklist_Values must be updated to be identical to the name of the domain table that is created by the bad-values blocklist feed.

    Tip

    The blv_addresses block may cause duplicate rows. Use SELECT DISTINCT if you run into issues with duplicate rows.

    Caution

    When uncommenting the blv_addresses block, be sure to add a comma after the closing parentheses (“)”) in the household_stats block or you will get a validation error.

    The blv_address line in the last SELECT statement (line 230):

    ,BL.blv_address
    

    The LEFT JOIN for blv_addresses (lines 248-249):

    LEFT JOIN blv_addresses AS BL
      ON CONCAT_WS(' ',AD.address,AD.address2) = UPPER(BLV.value)
    
  8. The section titled “Build Merged_Households table” combines everything into the Merged Households table.

    Tip

    Extend to support additional use cases, such as for specific household-level statistics or to add filter criteria that checks for BL.amperity_id IS NULL or for ST.amperity_id IS NULL.

  9. Click Validate to verify that the SQL query runs correctly.

  10. Make the table available to the visual Segment Editor by checking the box in the Show in VSE? column.

    Note

    The Merged Households table contains the Amperity ID and should be made available to the Visual Segment Editor.

  11. Click Activate to update the customer 360 database with your changes, and then run the customer 360 database to update the Merged Households table.

Build queries and segments

The default Merged Households table makes available two new columns for segmentation: household_id (the address-based Household ID) and household_size (the number of unique individuals who share the same physical address).

As a SELECT statement, the Merged Household table is similar to:

 1SELECT
 2  amperity_id AS "amperity_id"
 3  ,household_id AS "household_id"
 4  ,household_size AS "household_size"
 5  ,full_address AS "full_address"
 6  ,given_name AS "given_name"
 7  ,surname AS "surname"
 8  ,address AS "address"
 9  ,address2 AS "address2"
10  ,city AS "city"
11  ,state AS "state"
12  ,postal AS "postal"
13FROM Merged_Households

Merged Households template

  1-------------------------------------------------------------------------
  2-- Merged_Households query                                             --
  3--                                                                     --
  4-- This query generates a UUID household_id for records with an exact  --
  5-- match on full_address and surname. It uses Merged_Customers as a    --
  6-- foundation, which means that householding occurs AFTER the best     --
  7-- address has been selected, after which business rules are applied   --
  8-- to define what qualifies as a household.                            --
  9-------------------------------------------------------------------------
 10
 11WITH
 12
 13-------------------------------------------------------------------------
 14-- Basic address standardization                                      --
 15-------------------------------------------------------------------------
 16
 17clean_addresses AS (
 18  SELECT
 19    core.*
 20    ,REGEXP_REPLACE(
 21      REPLACE(
 22        REPLACE(
 23          REPLACE(
 24            REPLACE(
 25              REPLACE(
 26                REPLACE(
 27                  REPLACE(
 28                    REPLACE(
 29                      CONCAT_WS(' ', core.a1,
 30                        COALESCE(a2clean.converted, core.a2),
 31                        COALESCE(a3clean.converted, core.a3),
 32                        COALESCE(a4clean.converted, core.a4),
 33                        COALESCE(a5clean.converted, core.a5),
 34                        core.a6,
 35                        core.a7,
 36                        address2,
 37                        city,
 38                        COALESCE(stateclean.converted, core.state),
 39                        postal)
 40                      ,' APT')
 41                    ,' STE')
 42                  ,' UNIT')
 43                ,' RM')
 44              ,' SPACE')
 45            ,' APARTMENT')
 46          ,' SUITE')
 47        ,' ROOM')
 48      ,' +'
 49    ,''
 50  )
 51  AS full_address
 52  FROM (
 53    SELECT
 54      amperity_id
 55      ,UPPER(given_name) AS given_name
 56      ,UPPER(COALESCE(surname, REVERSE(SPLIT(full_name,' '))[0])) AS surname
 57      ,UPPER(address) AS address
 58      ,REGEXP_REPLACE(UPPER(address2),'[.,\\/#!$%\\^&\\*;:{}=\\-_~()\\.]', '') AS address2
 59      ,UPPER(city) AS city
 60      ,TRIM(UPPER(state)) AS state
 61      ,UPPER(SUBSTR(postal,1,5)) AS postal
 62      ,CASE
 63        WHEN NOT SIZE(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')) >= 1 THEN ''
 64        ELSE REGEXP_REPLACE(UPPER(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')[0]), '[.,\\/#!$%\\^&\\*;:{}=\\-_~()\\. ]', '')
 65      END AS a1
 66      ,CASE
 67        WHEN NOT SIZE(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')) >= 2 THEN ''
 68        ELSE REGEXP_REPLACE(UPPER(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')[1]), '[.,\\/#!$%\\^&\\*;:{}=\\-_~()\\. ]', '')
 69      END AS a2
 70      ,CASE
 71        WHEN NOT SIZE(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')) >= 3 THEN ''
 72        ELSE REGEXP_REPLACE(UPPER(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')[2]), '[.,\\/#!$%\\^&\\*;:{}=\\-_~()\\. ]', '')
 73      END AS a3
 74      ,CASE
 75        WHEN NOT SIZE(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')) >= 4 THEN ''
 76        ELSE REGEXP_REPLACE(UPPER(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')[3]), '[.,\\/#!$%\\^&\\*;:{}=\\-_~()\\. ]', '')
 77      END AS a4
 78      ,CASE
 79        WHEN NOT SIZE(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')) >= 5 THEN ''
 80        ELSE REGEXP_REPLACE(UPPER(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')[4]), '[.,\\/#!$%\\^&\\*;:{}=\\-_~()\\. ]', '')
 81      END AS a5
 82      ,CASE
 83        WHEN NOT SIZE(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')) >= 6 THEN ''
 84        ELSE REGEXP_REPLACE(UPPER(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')[5]), '[.,\\/#!$%\\^&\\*;:{}=\\-_~()\\. ]', '')
 85      END AS a6
 86      ,CASE
 87        WHEN NOT SIZE(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')) >= 7 THEN ''
 88        ELSE REGEXP_REPLACE(UPPER(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')[6]), '[.,\\/#!$%\\^&\\*;:{}=\\-_~()\\. ]', '')
 89      END AS a7
 90
 91    FROM Merged_Customers
 92    WHERE address IS NOT NULL
 93    AND city IS NOT NULL
 94    AND state IS NOT NULL
 95    AND postal IS NOT NULL
 96    AND COALESCE(surname, REVERSE(SPLIT(full_name,' '))[0]) IS NOT NULL
 97  ) AS core
 98
 99  LEFT JOIN (
100    SELECT
101      UPPER(before) AS before
102      ,UPPER(convert) AS converted
103    FROM LookupTables_AddressStandardization
104    WHERE type = 'STREET'
105  ) AS a1clean ON (a1clean.before = core.a1)
106
107  LEFT JOIN (
108    SELECT
109      UPPER(before) AS before
110      ,UPPER(convert) AS converted
111    FROM LookupTables_AddressStandardization
112    WHERE type = 'STREET'
113  ) AS a2clean ON (a2clean.before = core.a2)
114
115  LEFT JOIN (
116    SELECT
117      UPPER(before) AS before
118      ,UPPER(convert) AS converted
119    FROM LookupTables_AddressStandardization
120    WHERE type = 'STREET'
121  ) AS a3clean ON (a3clean.before = core.a3)
122
123  LEFT JOIN (
124    SELECT
125      UPPER(before) AS before
126      ,UPPER(convert) AS converted
127    FROM LookupTables_AddressStandardization
128    WHERE type = 'STREET'
129  ) AS a4clean ON (a4clean.before = core.a4)
130
131  LEFT JOIN (
132    SELECT
133      UPPER(before) AS before
134      ,UPPER(convert) AS converted
135    FROM LookupTables_AddressStandardization
136    WHERE type = 'STREET'
137  ) AS a5clean ON (a5clean.before = core.a5)
138
139  LEFT JOIN (
140    SELECT
141      UPPER(before) AS before
142      ,UPPER(convert) AS converted
143    FROM LookupTables_AddressStandardization
144    WHERE type = 'STREET'
145  ) AS a6clean ON (a6clean.before = core.a6)
146
147  LEFT JOIN (
148    SELECT
149      UPPER(before) AS before
150      ,UPPER(convert) AS converted
151    FROM LookupTables_AddressStandardization
152    WHERE type = 'STREET'
153  ) AS a7clean ON (a7clean.before = core.a7)
154
155  LEFT JOIN (
156    SELECT
157      UPPER(before) AS before
158      ,UPPER(convert) AS converted
159    FROM LookupTables_AddressStandardization
160    WHERE type = 'STATE'
161  ) AS stateclean ON (stateclean.before = core.state)
162
163),
164
165-------------------------------------------------------------------------
166-- Build the Household ID from full_address + surname as a UUID        --
167-------------------------------------------------------------------------
168
169uuid_ids AS (
170  SELECT
171    amperity_id
172    ,CONCAT_WS(
173      '-'
174      ,substr(household_id, 1,  8)
175      ,substr(household_id, 9,  4)
176      ,substr(household_id, 13, 4)
177      ,substr(household_id, 17, 4)
178      ,substr(household_id, 21, 12)) AS household_id
179  FROM (
180    SELECT
181      amperity_id
182      ,SHA(CONCAT(full_address, surname)) AS household_id
183    FROM clean_addresses
184  )
185),
186
187-------------------------------------------------------------------------
188-- Get count of Amperity ID per Household ID                           --
189-- For use with downstream filters                                     --
190-------------------------------------------------------------------------
191
192household_stats AS (
193  SELECT
194    household_id
195    ,COUNT(DISTINCT amperity_id) AS amperity_id_count
196  FROM uuid_ids
197  WHERE household_id IS NOT NULL
198  GROUP BY 1
199),
200
201-------------------------------------------------------------------------
202-- Create flag for addresses in bad-values blocklist                   --
203-- Blocklist_Values is the name of a domain table. Verify then update  --
204-- For use with downstream filters                                     --
205-------------------------------------------------------------------------
206
207blv_addresses AS (
208  SELECT
209    amperity_id,
210    CASE
211      WHEN (blv.value IS NOT NULL)
212      THEN TRUE
213      ELSE FALSE
214    END AS blv_address
215  FROM clean_addresses AS AD
216  LEFT JOIN Blocklist_Values AS BLV
217    ON CONCAT_WS(' ',AD.address,AD.address2) = UPPER(BLV.value)
218)
219
220-------------------------------------------------------------------------
221-- Build Merged_Households table                                       --
222-------------------------------------------------------------------------
223
224SELECT distinct
225  AD.amperity_id
226  ,ID.household_id
227  ,ST.amperity_id_count AS household_size
228  ,BL.blv_address
229  ,AD.full_address
230  ,AD.given_name
231  ,AD.surname
232  ,AD.address
233  ,AD.address2
234  ,AD.city
235  ,AD.state
236  ,AD.postal
237
238FROM clean_addresses AS AD
239
240LEFT JOIN uuid_ids AS ID
241  ON AD.amperity_id = ID.amperity_id
242
243LEFT JOIN household_stats AS ST
244  ON ID.household_id = ST.household_id
245
246LEFT JOIN blv_addresses AS BL
247  ON AD.amperity_id = BL.amperity_id