Gender prediction

Gender prediction can be a helpful step when personalizing marketing campaigns, email messages, and website visitor interactions. When gender is known, it can be used as a signal for tailoring communications, recommendations, and product lists to preferences commonly observed for that gender.

When used carefully, gender prediction carries a low downside risk from false positives. That said, gender prediction should not be used for 1:1 personalization or to predict pronouns (he, him, she, her, they, them) because, in most cases, the benefits of predicting gender correctly are outweighed by the high downside risk of being wrong.

It’s important that your brand understands how gender prediction will be used prior to enabling it within your tenant. Work closely with partners to ensure this generated data is used responsibly.

Add the data asset

Gender prediction must be enabled for your tenant. Amperity provides the gender_name_ratios.csv file as a data asset from an Amazon S3 bucket named “Amperity Data Assets”.

Create a support ticket to request that this Amazon S3 bucket be enabled for your tenant. After it is enabled, use a courier to pull the gender_name_ratios.csv file into your tenant through an Amazon S3 data source.

About gender_name_ratios.csv

Gender prediction based on the gender_name_ratios.csv file (an Amperity data asset) should only be used for audiences in the United States.

The gender_name_ratios.csv file contains a list of baby names from the past ~130 years, along with their associated gender.

The data in the gender_name_ratios.csv file comes from United States Social Security Administration records of baby name popularity and frequency. These records were used to generate the gender_name_ratios.csv file, which is similar to:

given_name,predicted_gender,gender_name_ratio,male_count,female_count
EMILIA,F,7178.6,5,35893
THERESE,F,7025.0,5,35125
AILEEN,F,6969.8,5,34849
...
LINDSEY,F,20.2,7710,156111
MORRISON,M,20.2,1496,74
ROLLA,M,20.1,1306,65

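The most important column is gender_name_ratio, which describes how strongly given_name is associated with one gender over the other. In the sample rows above, it is the count for the predicted gender divided by the count for the other gender; for example, EMILIA has 35893 female and 5 male records, and 35893 / 5 = 7178.6.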
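Every sample row above is consistent with gender_name_ratio being the larger of male_count and female_count divided by the smaller, rounded to one decimal place. The following Python sketch assumes that formula (it is inferred from the sample rows, not from a published specification) and recomputes the ratio for three of the rows shown above.

```python
import csv
import io

# Three rows copied from the gender_name_ratios.csv sample above.
SAMPLE = """given_name,predicted_gender,gender_name_ratio,male_count,female_count
EMILIA,F,7178.6,5,35893
LINDSEY,F,20.2,7710,156111
MORRISON,M,20.2,1496,74
"""


def name_ratio(male_count: int, female_count: int) -> float:
    """Assumed formula: count for the majority gender / count for the other gender."""
    majority = max(male_count, female_count)
    minority = min(male_count, female_count)
    if minority == 0:
        return float("inf")  # the name was only recorded for one gender
    return majority / minority


def check_sample(text: str) -> list:
    """Return (given_name, published ratio, recomputed ratio) for each CSV row."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        recomputed = name_ratio(int(row["male_count"]), int(row["female_count"]))
        rows.append((row["given_name"], float(row["gender_name_ratio"]), round(recomputed, 1)))
    return rows


for given_name, published, recomputed in check_sample(SAMPLE):
    print(given_name, published, recomputed)
```

For each row, the published and recomputed values match.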

Note

Only names with a ratio greater than 20 are included. This ensures that any prediction has a ~95% chance of being accurate based on the given name.

Only names with at least 1000 male or female examples are included. This filters out very uncommon names.
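The two inclusion rules can be expressed as a simple filter. This Python sketch is illustrative only: the names and counts are hypothetical (not real Social Security Administration figures), and it assumes that the 1000-example minimum applies to the larger of the two counts.

```python
# Hypothetical (given_name, male_count, female_count) rows for illustration only.
CANDIDATES = [
    ("NAME_A", 12, 5000),    # ratio ~416.7 with 5000 female examples
    ("NAME_B", 3000, 2000),  # ratio 1.5: too ambiguous to predict
    ("NAME_C", 40, 900),     # ratio 22.5, but fewer than 1000 examples
]

MIN_RATIO = 20    # "only names with a ratio greater than 20"
MIN_COUNT = 1000  # assumed to apply to the larger of the two counts


def is_included(male_count: int, female_count: int) -> bool:
    """Apply both inclusion rules to one name's counts."""
    majority = max(male_count, female_count)
    minority = min(male_count, female_count)
    ratio = majority / minority if minority else float("inf")
    return ratio > MIN_RATIO and majority >= MIN_COUNT


included = [name for name, male, female in CANDIDATES if is_included(male, female)]
print(included)  # ['NAME_A']
```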

To add the gender_name_ratios.csv data asset

Add the gender name ratios data asset to your tenant by pulling the file from Amperity Data Assets, an Amazon S3 bucket that can be made available to your tenant. Follow the steps for adding a data source and feed, click Browse, and then select the “gender_name_ratios.csv” file from the Amperity Data Assets bucket.

Use given_name as the primary key.
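A primary key must be unique, so it can be worth confirming that no given_name value repeats before saving the feed. This Python sketch is a local check against a downloaded copy of the file, not an Amperity feature; the function name duplicate_keys is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable, List


def duplicate_keys(lines: Iterable[str], key: str = "given_name") -> List[str]:
    """Return the values of `key` that appear on more than one CSV row, sorted."""
    counts = Counter(row[key] for row in csv.DictReader(lines))
    return sorted(value for value, n in counts.items() if n > 1)
```

For example, pass a file handle for gender_name_ratios.csv opened with `newline=""`; an empty list means given_name can serve as the primary key.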

Note

If Amperity Data Assets credentials are not available in your tenant, ask Amperity Support to enable Amperity Data Assets for your tenant.

You can add predicted gender to your customer 360 database in two ways, depending on how your brand wants to use predicted gender to build segments:

  1. Extend the Customer 360 and/or Merged Customers tables to include predicted gender (recommended).

  2. Add predicted gender values to your customer 360 database as a standalone table.

Extend the Merged_Customers table (recommended)

Note

The steps are the same for both the Customer 360 and Merged Customers tables.

Edit the Merged Customers table and extend it to include predicted gender.

Use a common table expression (CTE) to pull data from the domain table that was created from the gender_name_ratios.csv feed (named “Predictions_Gender” in this example):

predict_gender AS (
  SELECT
    mc.amperity_id
    -- Expand the single-letter codes in the data asset to full labels
    ,CASE
      WHEN UPPER(ratios.predicted_gender) = 'M' THEN 'Male'
      WHEN UPPER(ratios.predicted_gender) = 'F' THEN 'Female'
      ELSE ratios.predicted_gender
    END AS predicted_gender
  FROM Merged_Customers AS mc
  LEFT JOIN Predictions_Gender AS ratios
  -- Match on given_name, falling back to the first word of full_name
  ON UPPER(
    COALESCE(mc.given_name, SPLIT(mc.full_name,' ')[0])
  ) = ratios.given_name
),
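To make the matching logic in this CTE concrete, the following Python sketch mirrors it for a single customer: it takes given_name, falls back to the first space-separated word of full_name, uppercases the result, looks it up, and expands M and F to Male and Female. The RATIOS dictionary is a hypothetical in-memory stand-in for the Predictions_Gender table.

```python
from typing import Dict, Optional

# Hypothetical stand-in for the Predictions_Gender domain table (given_name -> code).
RATIOS: Dict[str, str] = {"EMILIA": "F", "MORRISON": "M"}


def lookup_name(given_name: Optional[str], full_name: Optional[str]) -> Optional[str]:
    """Mirror COALESCE(given_name, SPLIT(full_name, ' ')[0])."""
    if given_name is not None:
        return given_name
    if full_name is not None:
        return full_name.split(" ")[0]
    return None


def predict_gender(given_name: Optional[str], full_name: Optional[str],
                   ratios: Dict[str, str] = RATIOS) -> Optional[str]:
    """Return 'Male', 'Female', any other code as-is, or None when nothing matches."""
    name = lookup_name(given_name, full_name)
    code = ratios.get(name.upper()) if name is not None else None
    if code is None:
        return None  # no matching row: the LEFT JOIN leaves predicted_gender NULL
    labels = {"M": "Male", "F": "Female"}
    return labels.get(code.upper(), code)  # like the ELSE branch of the CASE


print(predict_gender("Emilia", None))        # Female
print(predict_gender(None, "Morrison Lee"))  # Male
print(predict_gender("Zed", "Emilia Zed"))   # None (given_name wins, no match)
```

The last call shows that full_name is only consulted when given_name is NULL, exactly as with COALESCE.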

Update the list of columns in the Merged Customers table to include predicted gender and combined gender:

,pg.predicted_gender
,COALESCE(mc.gender,pg.predicted_gender) AS combined_gender

Note

The combined_gender column uses the value from the gender column in the Merged Customers table when that value is not NULL, and otherwise uses the value from the predicted_gender column.

Use a LEFT JOIN to join the values from the common table expression to the Merged Customers table:

LEFT JOIN predict_gender pg ON pg.amperity_id = mc.amperity_id
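The following Python sketch shows what the LEFT JOIN and the COALESCE produce together. The rows and amperity_id values are hypothetical. Because COALESCE only falls through on NULL (None here), an empty-string gender is kept as-is rather than replaced by the prediction.

```python
from typing import Dict, List, Optional


def coalesce(*values: Optional[str]) -> Optional[str]:
    """Return the first value that is not None, like SQL COALESCE."""
    return next((v for v in values if v is not None), None)


def extend_merged_customers(merged: List[dict], predictions: List[dict]) -> List[dict]:
    """LEFT JOIN predictions on amperity_id, then add predicted and combined gender."""
    by_id: Dict[str, Optional[str]] = {
        p["amperity_id"]: p["predicted_gender"] for p in predictions
    }
    rows = []
    for mc in merged:
        pg = by_id.get(mc["amperity_id"])  # None when there is no matching row
        rows.append({**mc, "predicted_gender": pg,
                     "combined_gender": coalesce(mc["gender"], pg)})
    return rows


# Hypothetical rows for illustration.
merged = [
    {"amperity_id": "a1", "gender": "Female"},  # known gender wins
    {"amperity_id": "a2", "gender": None},      # falls back to the prediction
    {"amperity_id": "a3", "gender": ""},        # empty string is not NULL
]
predictions = [
    {"amperity_id": "a1", "predicted_gender": "Male"},
    {"amperity_id": "a2", "predicted_gender": "Male"},
    {"amperity_id": "a3", "predicted_gender": "Female"},
]

for row in extend_merged_customers(merged, predictions):
    print(row["amperity_id"], repr(row["combined_gender"]))
# a1 'Female'
# a2 'Male'
# a3 ''
```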

Add a table for predicted gender (optional)

Some of your brand’s use cases for predicted gender may be better served by a standalone table.

  1. Add a passthrough table to your customer 360 database named Gender Name Ratios.

  2. Add a SQL table to your customer 360 database named Predicted Gender.

  3. Choose SQL as the build mode, and then use SQL similar to:

    WITH ratios AS (
      -- Keep only names at or above the chosen gender_name_ratio threshold
      SELECT *
      FROM Gender_Name_Ratios
      WHERE gender_name_ratio >= 100
    )
    
    SELECT
      mc.amperity_id
      ,ratios.predicted_gender
    FROM Merged_Customers AS mc
    -- Join the filtered CTE, not the full table, so the threshold takes effect
    LEFT JOIN ratios
    ON UPPER(
      COALESCE(mc.given_name, SPLIT(mc.full_name,' ')[0])
    ) = ratios.given_name
    

    where “100” represents a 99% accuracy threshold. Increase or decrease this value as necessary.

    Tip

    This table will be unique by Amperity ID and may be made available to the Segment Editor for use with campaigns.
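To see a ratio threshold at work, the following Python sketch runs a version of the standalone table query against an in-memory SQLite database with hypothetical customer rows. SQLite has no SPLIT function, so the sketch substitutes SUBSTR and INSTR to take the first word of full_name. It filters names in the ratios CTE and joins that CTE, so names below the threshold get no prediction.

```python
import sqlite3

# In-memory stand-ins for the two tables, with hypothetical customer rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Gender_Name_Ratios (
  given_name TEXT PRIMARY KEY, predicted_gender TEXT, gender_name_ratio REAL);
INSERT INTO Gender_Name_Ratios VALUES
  ('EMILIA', 'F', 7178.6), ('LINDSEY', 'F', 20.2), ('MORRISON', 'M', 20.2);
CREATE TABLE Merged_Customers (
  amperity_id TEXT PRIMARY KEY, given_name TEXT, full_name TEXT);
INSERT INTO Merged_Customers VALUES
  ('a1', 'Emilia', NULL), ('a2', NULL, 'Lindsey Smith'), ('a3', 'Zed', NULL);
""")

# SUBSTR/INSTR stand in for Spark's SPLIT(full_name, ' ')[0].
QUERY = """
WITH ratios AS (
  SELECT * FROM Gender_Name_Ratios WHERE gender_name_ratio >= ?
)
SELECT mc.amperity_id, ratios.predicted_gender
FROM Merged_Customers AS mc
LEFT JOIN ratios
  ON UPPER(COALESCE(mc.given_name,
       SUBSTR(mc.full_name, 1, INSTR(mc.full_name || ' ', ' ') - 1)))
     = ratios.given_name
ORDER BY mc.amperity_id
"""


def predicted_genders(threshold: float) -> list:
    """Return (amperity_id, predicted_gender) rows for a ratio threshold."""
    return conn.execute(QUERY, (threshold,)).fetchall()


print(predicted_genders(20))   # [('a1', 'F'), ('a2', 'F'), ('a3', None)]
print(predicted_genders(100))  # [('a1', 'F'), ('a2', None), ('a3', None)]
```

If the query joined Gender_Name_Ratios directly instead of ratios, a2 would receive 'F' at both thresholds, because the WHERE filter in the CTE would never be applied.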

Adjust accuracy threshold

The default accuracy threshold for gender prediction is ~95%: every name in gender_name_ratios.csv is at least 20 times more likely to be associated with its predicted gender than with the other gender (20 / 21 ≈ 95%). If a use case requires greater accuracy, add a custom gender_name_ratio threshold to the query:

WITH ratios AS (
  -- Keep only names at or above the chosen gender_name_ratio threshold
  SELECT *
  FROM Gender_Name_Ratios
  WHERE gender_name_ratio >= 100
)

SELECT
  mc.amperity_id
  ,ratios.predicted_gender
FROM Merged_Customers AS mc
-- Join the filtered CTE, not the full table, so the threshold takes effect
LEFT JOIN ratios
ON UPPER(
  COALESCE(mc.given_name, SPLIT(mc.full_name,' ')[0])
) = ratios.given_name

where “100” represents a 99% accuracy threshold for gender prediction.
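The relationship between a ratio threshold and its accuracy is simple arithmetic: if a name appears r times as often for one gender as for the other, predicting the majority gender is correct for r / (r + 1) of the records with that name. The following Python sketch converts in both directions; it assumes that the Social Security Administration counts are representative of the people in your audience who share each name.

```python
def accuracy_from_ratio(ratio: float) -> float:
    """Fraction of people with the name who have the predicted gender."""
    return ratio / (ratio + 1)


def ratio_for_accuracy(accuracy: float) -> float:
    """Minimum gender_name_ratio threshold for a target accuracy (0 < accuracy < 1)."""
    if not 0 < accuracy < 1:
        raise ValueError("accuracy must be strictly between 0 and 1")
    return accuracy / (1 - accuracy)


for threshold in (20, 100):
    print(threshold, round(accuracy_from_ratio(threshold), 4))
# 20 0.9524
# 100 0.9901
```

By this arithmetic, a threshold of 99 corresponds to exactly 99% accuracy, and 100 gives slightly more (about 99.01%).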