Gender Prediction


In many cases, product affinity or other behaviors can provide greater insight into personalization than simply relying on names. However, improperly using gender prediction can create a negative customer experience for any brand. It’s important to understand how this attribute will be used prior to adding it to your tenant and to work closely with partners to ensure this generated data is used responsibly.

Gender prediction can be a helpful part of the effort to apply personalization to marketing campaigns, email lists, and websites. When gender is known, it can be used as a signal for tailoring communications, recommendations, and product lists based on observed preferences that are common to people within that gender.


Gender prediction uses 130 years of United States census data and is only available for adding personalization to marketing campaigns run in the United States.

Gender prediction must be configured for use in Amperity. Ask your DataGrid Operator or your Amperity representative (via the Amperity Support Portal) for help with configuring the gender prediction capabilities in your tenant.


When used carefully, gender prediction can have a low downside risk due to false positives. However, gender prediction should not be used for 1:1 personalization, especially for the purpose of predicting pronouns (he, him, she, her, they, them), because the benefits of correctly predicting gender are, in most cases, outweighed by the high downside risk of being wrong.

Configure gender prediction

Gender prediction is not automatically output by Stitch, but this functionality can be added by leveraging existing data tagged with the given-name semantic. First, add a feed that contains data usable for predicting gender. Then update the customer 360 database to use SQL to associate predictions with an Amperity ID.


Use a sandbox to validate the results of gender prediction before promoting the changes to your tenant.

To configure Amperity for gender prediction

  1. Download the gender_name_ratios.csv file.

  2. In a sandbox, add a feed named “Gender” with a new source named “Predictions”. Upload the gender_name_ratios.csv file.

  3. Assign the primary key to the given_name column. (Do not make this table available to Stitch or apply any semantic tags to fields.)

  4. Activate the feed.

  5. Run Stitch.

  6. From the Customer 360 tab, edit the customer 360 database.

  7. Click Add Table, and then name the table Predictions_Gender.

  8. Choose SQL as the build mode. Add the following SELECT statement:

    SELECT
      mc.amperity_id
      ,ratios.gender  -- assumed column name; match the gender column in your CSV
    FROM Merged_Customers AS mc
    LEFT JOIN Predictions_Gender AS ratios
    ON UPPER(COALESCE(mc.given_name, SPLIT(mc.full_name,' ')[0])) = ratios.given_name
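The join condition above builds its lookup key from the given name, falling back to the first space-separated token of the full name, and then uppercases the result. The following Python sketch mirrors that expression outside of Amperity to show which key each record would produce. The function name and sample values are illustrative, and the sketch assumes that names in the ratios table are stored in uppercase, which the use of UPPER() in the join implies but the source does not state.

```python
from typing import Optional


def join_key(given_name: Optional[str], full_name: Optional[str]) -> Optional[str]:
    """Mirror UPPER(COALESCE(given_name, SPLIT(full_name, ' ')[0]))."""
    if given_name is not None:
        return given_name.upper()
    if full_name is None:
        return None  # NULL in SQL: the LEFT JOIN finds no match
    # First space-separated token of the full name, uppercased
    return full_name.split(" ")[0].upper()


print(join_key("Maria", "Maria Lopez"))  # MARIA
print(join_key(None, "dana scully"))     # DANA
print(join_key(None, None))              # None
```

Records that produce no key, or a key that is absent from the ratios table, keep their row in the LEFT JOIN but receive no prediction.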

Accuracy threshold

The default accuracy threshold for gender prediction is ~95%. This means that any given name in the data set is at least 20 times more likely to be associated with one gender than with the other (a 20:1 ratio). If greater accuracy is required for a use case, add a custom threshold to the query:

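WITH ratios AS (SELECT * FROM Predictions_Gender WHERE gender_name_ratio >= 100)
SELECT mc.amperity_id, ratios.gender  -- assumed column name; match the gender column in your CSV
FROM Merged_Customers AS mc
LEFT JOIN ratios  -- join the filtered CTE, not the unfiltered table
ON UPPER(COALESCE(mc.given_name, SPLIT(mc.full_name,' ')[0])) = ratios.given_name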

where 100 represents a 99% accuracy threshold for gender prediction.
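The link between a ratio threshold and the quoted accuracy can be checked with a little arithmetic. The sketch below assumes that a ratio of r means r:1 odds in favor of the predicted gender, so the implied accuracy is r / (r + 1). The function name is illustrative and is not part of Amperity.

```python
def ratio_to_accuracy(ratio: float) -> float:
    """Convert r:1 odds into a probability (assumes the ratio means r:1 odds)."""
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    return ratio / (ratio + 1)


for threshold in (20, 100):
    print(f"{threshold}:1 -> {ratio_to_accuracy(threshold):.1%}")
# 20:1 -> 95.2%
# 100:1 -> 99.0%
```

Under this reading, raising the threshold from 20 to 100 trades coverage (fewer names qualify) for roughly four more percentage points of expected accuracy.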

Gender name ratios

The data in the gender_name_ratios.csv file comes from United States Social Security Administration records for the popularity and frequency of baby names between 1880 and 2018.

These records describe more than 351 million baby names, along with their associated gender. These records were used to generate the gender_name_ratios.csv file.


The most important column is gender_name_ratio, which describes what proportion of given_name is associated with one gender versus the other.

The following filters were applied to this data set prior to generation:

  1. Only names with a gender name ratio greater than 20 were included. This ensures that any prediction has a ~95% chance of being correct based on the given name.

  2. Only names with at least 1000 male or female examples were included, which filters out very uncommon names.
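The two filters above can be sketched in Python as follows. This is not the script used to generate the file. It assumes that gender_name_ratio is the count for the more common gender divided by the count for the less common one, and that "at least 1000 male or female examples" means the larger of the two counts reaches 1000. The names and counts are made up for illustration.

```python
import math

MIN_RATIO = 20       # keep names with a ratio greater than this
MIN_EXAMPLES = 1000  # keep names with at least this many male or female examples


def gender_name_ratio(male: int, female: int) -> float:
    """Majority count divided by minority count (assumed definition)."""
    minority = min(male, female)
    if minority == 0:
        return math.inf  # name was only ever recorded for one gender
    return max(male, female) / minority


def keep_name(male: int, female: int) -> bool:
    """Apply both filters to the counts for one name."""
    common_enough = max(male, female) >= MIN_EXAMPLES
    return common_enough and gender_name_ratio(male, female) > MIN_RATIO


# Made-up (male, female) counts, for illustration only.
counts = {"ALPHA": (50_000, 400), "BETA": (30_000, 25_000), "GAMMA": (900, 3)}
kept = [name for name, (m, f) in counts.items() if keep_name(m, f)]
print(kept)  # ['ALPHA']
```

In this example BETA is dropped because it is too evenly split between genders, and GAMMA is dropped because it is too rare, even though its ratio is high.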