Gender prediction
Gender prediction can be a helpful step when applying personalization to marketing campaigns, within email messages, and during visitor interactions on websites. When gender is known, it can be used as a signal for tailoring communications, recommendations, and product lists based on observed preferences that are common for that gender.
When used carefully, gender prediction carries a low downside risk from false positives. That said, gender prediction should not be used for 1:1 personalization or to predict pronouns (he, him, she, her, they, them). In most cases, the benefits of correctly predicting gender are outweighed by the high downside risks of being wrong.
It’s important that your brand understands how gender prediction will be used prior to enabling it within your tenant. Work closely with partners to ensure this generated data is used responsibly.
Add the data asset
Gender prediction must be enabled for your tenant. Amperity provides the gender_name_ratios.csv file as a data asset from an Amazon S3 bucket named “Amperity Data Assets”.
Create a support ticket to request that this Amazon S3 bucket be enabled for your tenant. After it is enabled, use a courier to pull the gender_name_ratios.csv file to your tenant with an Amazon S3 data source.
About gender_name_ratios.csv
Gender prediction based on the gender_name_ratios.csv file (an Amperity data asset) should only be used for audiences that exist within the United States.
The gender_name_ratios.csv file contains a list of baby names from the past ~130 years, along with their associated gender.
The data in the gender_name_ratios.csv file comes from United States Social Security Administration records of the popularity and frequency of baby names. These records were used to generate the gender_name_ratios.csv file, which looks similar to:
given_name,predicted_gender,gender_name_ratio,male_count,female_count
EMILIA,F,7178.6,5,35893
THERESE,F,7025.0,5,35125
AILEEN,F,6969.8,5,34849
...
LINDSEY,F,20.2,7710,156111
MORRISON,M,20.2,1496,74
ROLLA,M,20.1,1306,65
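The most important column is gender_name_ratio: the number of examples of given_name for the predicted gender divided by the number of examples for the other gender. The higher the ratio, the more strongly the name is associated with one gender.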
Note
Only names with a ratio greater than 20 (odds of 20:1) are included. This means any prediction has roughly a 95% chance of being accurate based on the given name alone.
Only names with at least 1000 male or female examples are included. This filters out very uncommon names.
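The ratio and the ~95% figure can be checked against the sample rows. The short derivation below assumes that gender_name_ratio is the larger of male_count (m) and female_count (f) divided by the smaller (true for every row shown above), and reads accuracy (p) as the share of people with that name who have the predicted gender:

```latex
\begin{aligned}
r &= \frac{\max(m, f)}{\min(m, f)}, \qquad
p = \frac{\max(m, f)}{m + f} = \frac{r}{r + 1} \\
\text{EMILIA: } r &= \frac{35893}{5} = 7178.6 \\
\text{MORRISON: } r &= \frac{1496}{74} \approx 20.2 \\
r = 20 \;&\Longrightarrow\; p = \frac{20}{21} \approx 0.952
\end{aligned}
```

The same calculation gives 20.2 for LINDSEY (156111 / 7710) and 20.1 for ROLLA (1306 / 65).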
To add the gender_name_ratios.csv data asset
Add the gender name ratios data asset to your tenant by pulling the file that is available from Amperity Data Assets, which is the name of an Amazon S3 bucket that can be made available to your tenant. Follow the steps for adding a data source and feed. Click Browse and select the “gender_name_ratios.csv” file from the Amperity Data Assets Amazon S3 bucket.
Use given_name as the primary key.
Note
If Amperity data assets credentials are not available on your tenant, make a request to Amperity Support to enable Amperity data assets for your tenant.
You can add predicted gender to your customer 360 database in two ways, depending on how your brand wants to use predicted gender to build segments:
Extend the Customer 360 and/or Merged Customers tables to include predicted gender (recommended).
Add predicted gender values to your customer 360 database as a standalone table.
Extend the Merged_Customers table (recommended)
Note
The steps are the same for both the Customer 360 and Merged Customers tables.
Edit the Merged Customers table and extend the table for predicted gender.
Use a common table expression (CTE) to pull data from the domain table that contains the gender name ratios (“Predictions_Gender” in this example):
predict_gender AS (
  SELECT
    mc.amperity_id
    ,CASE
      WHEN UPPER(ratios.predicted_gender) = 'M' THEN 'Male'
      WHEN UPPER(ratios.predicted_gender) = 'F' THEN 'Female'
      ELSE ratios.predicted_gender
    END AS predicted_gender
  FROM Merged_Customers AS mc
  LEFT JOIN Predictions_Gender AS ratios
    ON UPPER(
      COALESCE(mc.given_name, SPLIT(mc.full_name,' ')[0])
    ) = ratios.given_name
),
Update the list of columns in the Merged Customers table to include predicted gender and combined gender:
,pg.predicted_gender
,COALESCE(mc.gender,pg.predicted_gender) AS combined_gender
Note
The combined_gender column uses the value from the gender column in the Merged Customers table when that value exists, and falls back to the value from the predicted_gender column when gender is NULL. COALESCE treats only NULL as missing, so an empty string in the gender column is kept as is rather than replaced by the prediction.
Use a LEFT JOIN to join the values from the common table expression to the Merged Customers table:
LEFT JOIN predict_gender pg ON pg.amperity_id = mc.amperity_id
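The three fragments above can be sketched end to end as one self-contained query. This is a minimal sketch, not the full Merged Customers table definition: it assumes a Spark SQL engine (which the SPLIT(...)[0] syntax suggests), and the merged_customers and predictions_gender CTEs hold a few hypothetical inline rows standing in for the real Merged_Customers table and gender name ratios domain table. The amperity_id values and names are made up. The sketch shows the full_name fallback and how combined_gender prefers a known gender over a prediction.

```sql
-- Hypothetical sample rows standing in for the Merged_Customers table.
WITH merged_customers AS (
  SELECT * FROM VALUES
    ('id-1', 'Emilia', NULL, NULL),
    ('id-2', NULL, 'rolla smith', 'Female'),
    ('id-3', NULL, 'Alex Kim', NULL)
  AS t(amperity_id, given_name, full_name, gender)
),
-- Hypothetical sample rows standing in for the gender name ratios domain table.
predictions_gender AS (
  SELECT * FROM VALUES
    ('EMILIA', 'F', 7178.6),
    ('ROLLA', 'M', 20.1)
  AS t(given_name, predicted_gender, gender_name_ratio)
),
predict_gender AS (
  SELECT
    mc.amperity_id
    ,CASE
      WHEN UPPER(ratios.predicted_gender) = 'M' THEN 'Male'
      WHEN UPPER(ratios.predicted_gender) = 'F' THEN 'Female'
      ELSE ratios.predicted_gender
    END AS predicted_gender
  FROM merged_customers AS mc
  LEFT JOIN predictions_gender AS ratios
    -- Use given_name when present; otherwise use the first token of full_name.
    ON UPPER(COALESCE(mc.given_name, SPLIT(mc.full_name, ' ')[0])) = ratios.given_name
)
SELECT
  mc.amperity_id
  ,pg.predicted_gender
  -- Keep the known gender; fall back to the prediction only when gender is NULL.
  ,COALESCE(mc.gender, pg.predicted_gender) AS combined_gender
FROM merged_customers AS mc
LEFT JOIN predict_gender AS pg ON pg.amperity_id = mc.amperity_id
ORDER BY mc.amperity_id
```

With these rows, the query should return Female and Female for id-1; Male and Female for id-2, because the known gender wins over the prediction; and NULL and NULL for id-3, because ALEX is not in the sample ratios.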
Add a table for predicted gender (optional)
Some of your brand’s use cases for predicted gender may be better served by a standalone table.
Add a passthrough table to your customer 360 database named Gender Name Ratios.
Add a SQL table to your customer 360 database named Predicted Gender.
Choose SQL as the build mode, and then use SQL similar to:
WITH ratios AS (
  SELECT *
  FROM Gender_Name_Ratios
  WHERE gender_name_ratio >= 100
)
SELECT
  mc.amperity_id
  ,ratios.predicted_gender
FROM Merged_Customers AS mc
LEFT JOIN ratios
  ON UPPER(
    COALESCE(mc.given_name, SPLIT(mc.full_name,' ')[0])
  ) = ratios.given_name
where “100” sets a minimum gender_name_ratio of 100:1, which corresponds to roughly 99% accuracy. Increase or decrease this value as necessary.
Tip
This table will be unique by Amperity ID and may be made available to the Segment Editor for use with campaigns.
Adjust accuracy threshold
The default accuracy threshold for gender prediction is ~95%. Every name in the gender_name_ratios.csv file is associated with its predicted gender at least 20 times as often as with the other gender (odds of 20:1 or better). If a use case requires greater accuracy, add a custom gender_name_ratio threshold to the query:
WITH ratios AS (
  SELECT *
  FROM Gender_Name_Ratios
  WHERE gender_name_ratio >= 100
)
SELECT
  mc.amperity_id
  ,ratios.predicted_gender
FROM Merged_Customers AS mc
LEFT JOIN ratios
  ON UPPER(
    COALESCE(mc.given_name, SPLIT(mc.full_name,' ')[0])
  ) = ratios.given_name
where “100” sets a minimum gender_name_ratio of 100:1, which corresponds to roughly 99% accuracy for gender prediction.
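To choose a threshold for a different target accuracy, the relation p = r / (r + 1) between a name's ratio r and its estimated accuracy p can be solved for r. This is a short derivation, assuming accuracy is read as the share of people with that name who have the predicted gender:

```latex
\begin{aligned}
p = \frac{r}{r + 1} \;&\Longrightarrow\; p\,(r + 1) = r \;\Longrightarrow\; r = \frac{p}{1 - p} \\
p = 0.95 \;&\Longrightarrow\; r = 19 \\
p = 0.99 \;&\Longrightarrow\; r = 99 \\
p = 0.999 \;&\Longrightarrow\; r = 999
\end{aligned}
```

A threshold of 100 therefore sits just above the 99% point. A threshold of 1000 would target roughly 99.9% accuracy, at the cost of leaving more customers without a prediction.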