Address-based Householding¶
A household is a group of people who share a physical address and possibly other attributes. For example: a shared device or network, a shared last name, a shared phone number. Householding is a process that identifies a unique household in a data set.
Address-based householding is a process that standardizes addresses, pairs them with a last name, and then assigns that pair a Household ID.
A Household ID is a universally unique identifier (UUID) that uniquely identifies the combination of a standardized address and a last name.
Note
This topic describes address-based householding as a starting point. An implementation of address-based householding can vary significantly from tenant to tenant, depending on the
Type and number of data sources
Number of addresses
Types of direct mail campaigns that will be based on the results
along with other dependencies that may be tenant-specific.
Please discuss address-based householding with your Amperity representative prior to adding it so they can help the scope and level of detail that may be required for your tenant.
How it works¶
Address-based householding is built upon the results of the Merged_Customers table. This enables address-based householding the ability to use the results of individual identity resolution that was performed by Amperity (and built into the Merged_Customers table). Apply the results of address-based householding to campaigns that rely on physical addresses, such as direct mail campaigns, to ensure that a single household gets a single piece of direct mail, even when multiple unique individuals reside within the same household.
Tip
Address-based householding can be improved when used alongside standardization provided by national change of address (NCOA) and coding accuracy support system (CASS).
National Change of Address (NCOA) is a secure dataset of approximately 160 million permanent change-of-address (COA) records consisting of the names and addresses of individuals, families, and businesses who have filed a change-of-address with the United States Postal Service (USPS).
Coding accuracy support system (CASS) is an address standardization concept that helps clean address to make them more effective for direct mail campaigns.
The process for enabling address-based householding includes:
Reviewing the Merged_Customers table to identify any tenant-specific behaviors within the table that may need to be considered when extending Amperity for address-based householding.
Reviewing the bad-values blocklist to identify any tenant-specific behaviors within the blocklist workflow and to identify the name of the domain table associated with the bad-values blocklist feed.
Adding a Merged_Households table to the customer 360 database
Building segments that use the Household ID, which is a UUID that is available to segments from the Merged_Households table.
Sending segment results to any downstream process.
Tip
Additional use cases for address-based householding include:
Joining the Merged_Households table to the Customer_360 table to add the
household_id
andhousehold_size
fields. This makes them available as profile attributes.Using a common table expression (CTE) to flag a single individual that is associated with an address as the primary contact.
Adding a Household_360 table. This is similar to the Customer_360 table and can merge values down to a single row per Household ID. This enables the use of summary attributes, such as identifying household lifetime value.
Add feed for address standardization¶
Address standardization starts as a feed that loads a CSV file that contains a list of address variations for state and street names. Download a copy of the address-standardization.csv file to your local computer, and then upload that file to Amperity when adding the feed.
To add a feed for address standardization
From the Sources tab, click Add Feed. This opens the Add Feed dialog box.
Under Data Source select Create new source, and then enter the name for the source: LookupTables.
Use AddressStandardization as the feed name.
Under Sample File select Upload new file, and then choose the
address-standardization.csv
file.Click Continue. This opens the Feed Editor page.
Note
The
address-standardization.csv
file does not contain any PII and should not be made available to Stitch. Set thetype
andbefore
fields as the primary key.Click Activate.
Note
The
address-standardization.csv
file is static and is loaded to Amperity one time using this feed. This will create a domain table that is used to support all downstream address-based householding processes. If, for any reason, theaddress-standardization.csv
file does require an update, use the Load new data option for the feed to reload the updated CSV file to the same feed.
Add Merged_Households table¶
The Merged_Households table is a data table that must be added to a customer 360 database to support address-based householding. The purpose of the Merged_Households table is to apply address-based householding and address standardization to the output of the Merged_Customers table. The contents of this table contain the Household ID and can be used to improve the results of certain campaign types, such as direct mail.
This section walks through the default SQL template that is used to define how address-based householding works in Amperity.
From the Customer 360 tab, under All Databases, select the menu for the customer 360 database, and then click Edit.
Click Add Table. Name the table
Merged_Households
.Set Build Mode to “SQL”, and then define a SQL query.
Tip
You may download a copy of Merged_Households as a template or you may refer to the example at the end of this topic.
Important
Amperity uses a single table in the customer 360 database to collect rows from the Unified_Coalesced table, and then collapses them into a single row per Amperity ID. This is referred to as the Merged_Customers table. Prior to August 1, 2020 the name of this table was Unified_Merged. Verify the name of this table as it is used for your tenant, and then update the template described in this topic so that it matches the name of the table in your tenant.
The section titled “Basic address standardization” is a common table expression (CTE) that performs address standardization. This process removes non-alphanumeric characters, trims for leading, trailing, and repeating whitespace, converts characters to uppercase, converts all valid names of states in the United States to their two-character representation, converts all postal codes to five digits, and converts common representations of street addresses into standardized variants.
Physical street addresses (as identified by the
address
field) are standardized by splitting on spaces. Each of the second, third, fourth, etc. elements of an address are compared to a lookup table. When matches are found, they are replaced with standardized values.The section titled “Build the Household ID …” builds a universally unique identifier (UUID) from unique combinations of the address and surname fields.
The section titled “Get count of Amperity ID per Household ID” associates a count of Amperity IDs to each Household ID.
Tip
This section is where additional SQL is added to handle custom statistics on a per-household basis and to support other tenant-specific use cases. The default behavior only associates the Amperity ID to the Household ID, but can be tailored to support most use cases.
For example, you could add support for checking the number of Amperity IDs associated with a household, and if that exceeds a threshold, that address could be flagged as a business address (or some other non-household entity).
The section titled “Create flag for addresses in bad-values blocklist” identifies if addresses have been added to the bad-values blocklist.
If you are using the bad-values blocklist, uncomment the following sections.
The
blv_address
block (lines 209 - 220):blv_addresses AS ( SELECT amperity_id, CASE WHEN (blv.value IS NOT NULL) THEN TRUE ELSE FALSE END AS blv_address FROM clean_addresses AS AD LEFT JOIN Blocklist_Values AS BLV ON AD.address = UPPER(BLV.value) )
where
Blocklist_Values
must be updated to be identical to the name of the domain table that is created by the bad-values blocklist feed.Tip
The
blv_addresses
block may cause duplicate rows. UseSELECT DISTINCT
if you run into issues with duplicate rows.Caution
When uncommenting the
blv_addresses
block, be sure to add a comma after the closing parentheses ()
) in thehousehold_stats
block or you will get a validation error.The
blv_address
line in the lastSELECT
statement (line 230):,BL.blv_address
The
LEFT JOIN
forblv_addresses
(lines 248-249):LEFT JOIN blv_addresses AS BL ON CONCAT_WS(' ',AD.address,AD.address2) = UPPER(BLV.value)
The section titled “Build Merged_Households table” combines everything into the Merged_Households table.
Tip
Extend this section to support additional use cases, such as for specific household-level statistics or to add filter criteria that checks for
BL.amperity_id IS NULL
or forST.amperity_id IS NULL
.Click Validate to verify that the SQL query runs correctly.
Make the table available to the Visual Segment Editor by checking the box in the Show in VSE? column.
Note
The Merged_Households table contains the Amperity ID and should be made available to the Visual Segment Editor.
Click Activate to update the customer 360 database with your changes, and then run the customer 360 database to update the Merged_Households table.
Build Segments¶
The default Merged_Households table (as described in this topic) makes available two new columns for segmentation: household_id
(the address-based Household ID) and household_size
(the number of unique individuals who share the same physical address).
As a SELECT statement, the Merged_Household table is similar to:
SELECT
amperity_id AS "amperity_id"
,household_id AS "household_id"
,household_size AS "household_size"
,full_address AS "full_address"
,given_name AS "given_name"
,surname AS "surname"
,address AS "address"
,address2 AS "address2"
,city AS "city"
,state AS "state"
,postal AS "postal"
FROM
Merged_Households
Merged_Households template¶
-------------------------------------------------------------------------
-- Merged_Households query --
-- --
-- This query generates a UUID household_id for records with an exact --
-- match on full_address and surname. It uses Merged_Customers as a --
-- foundation, which means that householding occurs AFTER the best --
-- address has been selected, after which business rules are applied --
-- to define what qualifies as a household. --
-------------------------------------------------------------------------
WITH
-------------------------------------------------------------------------
-- Basic address standardization --
-------------------------------------------------------------------------
clean_addresses AS (
SELECT
core.*
,REGEXP_REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
REPLACE(
CONCAT_WS(' ', core.a1,
COALESCE(a2clean.converted, core.a2),
COALESCE(a3clean.converted, core.a3),
COALESCE(a4clean.converted, core.a4),
COALESCE(a5clean.converted, core.a5),
core.a6,
core.a7,
address2,
city,
COALESCE(stateclean.converted, core.state),
postal)
,' APT')
,' STE')
,' UNIT')
,' RM')
,' SPACE')
,' APARTMENT')
,' SUITE')
,' ROOM')
,' +'
,''
)
AS full_address
FROM (
SELECT
amperity_id
,UPPER(given_name) AS given_name
,UPPER(COALESCE(surname, REVERSE(SPLIT(full_name,' '))[0])) AS surname
,UPPER(address) AS address
,REGEXP_REPLACE(UPPER(address2),'[.,\\/#!$%\\^&\\*;:{}=\\-_`~()\\.]', '') AS address2
,UPPER(city) AS city
,TRIM(UPPER(state)) AS state
,UPPER(SUBSTR(postal,1,5)) AS postal
,CASE
WHEN NOT SIZE(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')) >= 1 THEN ''
ELSE REGEXP_REPLACE(UPPER(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')[0]), '[.,\\/#!$%\\^&\\*;:{}=\\-_`~()\\. ]', '')
END AS a1
,CASE
WHEN NOT SIZE(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')) >= 2 THEN ''
ELSE REGEXP_REPLACE(UPPER(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')[1]), '[.,\\/#!$%\\^&\\*;:{}=\\-_`~()\\. ]', '')
END AS a2
,CASE
WHEN NOT SIZE(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')) >= 3 THEN ''
ELSE REGEXP_REPLACE(UPPER(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')[2]), '[.,\\/#!$%\\^&\\*;:{}=\\-_`~()\\. ]', '')
END AS a3
,CASE
WHEN NOT SIZE(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')) >= 4 THEN ''
ELSE REGEXP_REPLACE(UPPER(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')[3]), '[.,\\/#!$%\\^&\\*;:{}=\\-_`~()\\. ]', '')
END AS a4
,CASE
WHEN NOT SIZE(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')) >= 5 THEN ''
ELSE REGEXP_REPLACE(UPPER(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')[4]), '[.,\\/#!$%\\^&\\*;:{}=\\-_`~()\\. ]', '')
END AS a5
,CASE
WHEN NOT SIZE(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')) >= 6 THEN ''
ELSE REGEXP_REPLACE(UPPER(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')[5]), '[.,\\/#!$%\\^&\\*;:{}=\\-_`~()\\. ]', '')
END AS a6
,CASE
WHEN NOT SIZE(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')) >= 7 THEN ''
ELSE REGEXP_REPLACE(UPPER(SPLIT(REGEXP_REPLACE(address,' {2,}',' '), ' ')[6]), '[.,\\/#!$%\\^&\\*;:{}=\\-_`~()\\. ]', '')
END AS a7
FROM
Merged_Customers
WHERE address IS NOT NULL
AND city IS NOT NULL
AND state IS NOT NULL
AND postal IS NOT NULL
AND COALESCE(surname, REVERSE(SPLIT(full_name,' '))[0]) IS NOT NULL
) AS core
LEFT JOIN (
SELECT
UPPER(before) AS before
,UPPER(convert) AS converted
FROM LookupTables_AddressStandardization
WHERE type = 'STREET'
) AS a1clean ON (a1clean.before = core.a1)
LEFT JOIN (
SELECT
UPPER(before) AS before
,UPPER(convert) AS converted
FROM LookupTables_AddressStandardization
WHERE type = 'STREET'
) AS a2clean ON (a2clean.before = core.a2)
LEFT JOIN (
SELECT
UPPER(before) AS before
,UPPER(convert) AS converted
FROM LookupTables_AddressStandardization
WHERE type = 'STREET'
) AS a3clean ON (a3clean.before = core.a3)
LEFT JOIN (
SELECT
UPPER(before) AS before
,UPPER(convert) AS converted
FROM LookupTables_AddressStandardization
WHERE type = 'STREET'
) AS a4clean ON (a4clean.before = core.a4)
LEFT JOIN (
SELECT
UPPER(before) AS before
,UPPER(convert) AS converted
FROM LookupTables_AddressStandardization
WHERE type = 'STREET'
) AS a5clean ON (a5clean.before = core.a5)
LEFT JOIN (
SELECT
UPPER(before) AS before
,UPPER(convert) AS converted
FROM LookupTables_AddressStandardization
WHERE type = 'STREET'
) AS a6clean ON (a6clean.before = core.a6)
LEFT JOIN (
SELECT
UPPER(before) AS before
,UPPER(convert) AS converted
FROM LookupTables_AddressStandardization
WHERE type = 'STREET'
) AS a7clean ON (a7clean.before = core.a7)
LEFT JOIN (
SELECT
UPPER(before) AS before
,UPPER(convert) AS converted
FROM LookupTables_AddressStandardization
WHERE type = 'STATE'
) AS stateclean ON (stateclean.before = core.state)
),
-------------------------------------------------------------------------
-- Build the Household ID from full_address + surname as a UUID --
-------------------------------------------------------------------------
uuid_ids AS (
SELECT
amperity_id
,CONCAT_WS(
'-'
,substr(household_id, 1, 8)
,substr(household_id, 9, 4)
,substr(household_id, 13, 4)
,substr(household_id, 17, 4)
,substr(household_id, 21, 12)) AS household_id
FROM (
SELECT
amperity_id
,SHA(CONCAT(full_address, surname)) AS household_id
FROM
clean_addresses
)
),
-------------------------------------------------------------------------
-- Get count of Amperity ID per Household ID --
-- For use with downstream filters --
-------------------------------------------------------------------------
household_stats AS (
SELECT
household_id
,COUNT(DISTINCT amperity_id) AS amperity_id_count
FROM uuid_ids
WHERE household_id IS NOT NULL
GROUP BY 1
),
-------------------------------------------------------------------------
-- Create flag for addresses in bad-values blocklist --
-- Blocklist_Values is the name of a domain table; verify then update --
-- For use with downstream filters --
-------------------------------------------------------------------------
blv_addresses AS (
SELECT
amperity_id,
CASE
WHEN (blv.value IS NOT NULL)
THEN TRUE
ELSE FALSE
END AS blv_address
FROM clean_addresses AS AD
LEFT JOIN Blocklist_Values AS BLV
ON CONCAT_WS(' ',AD.address,AD.address2) = UPPER(BLV.value)
)
-------------------------------------------------------------------------
-- Build Merged_Households table --
-------------------------------------------------------------------------
SELECT distinct
AD.amperity_id
,ID.household_id
,ST.amperity_id_count AS household_size
,BL.blv_address
,AD.full_address
,AD.given_name
,AD.surname
,AD.address
,AD.address2
,AD.city
,AD.state
,AD.postal
FROM clean_addresses AS AD
LEFT JOIN uuid_ids AS ID
ON AD.amperity_id = ID.amperity_id
LEFT JOIN household_stats AS ST
ON ID.household_id = ST.household_id
LEFT JOIN blv_addresses AS BL
ON AD.amperity_id = BL.amperity_id