Gender prediction

Gender prediction can be a helpful step when applying personalization to marketing campaigns, within email messages, and during visitor interactions on websites. When gender is known, it can be used as a signal for tailoring communications, recommendations, and product lists based on observed preferences that are common for that gender.

When used carefully, gender prediction can have low downside risk due to false positives. That said, gender prediction should not be used for 1:1 personalization or to predict pronouns (he, him, she, her, they, them) because the benefits of correctly predicting gender are, in most cases, outweighed by the high downside risk of being wrong.

It’s important that your brand understands how gender prediction will be used prior to enabling it within your tenant. Work closely with partners to ensure this generated data is used responsibly.

Add the data asset

Gender prediction must be enabled for your tenant. Amperity provides the gender_name_ratios.csv file as a data asset from an Amazon S3 bucket named “Amperity Data Assets”.

Create a support ticket and request to enable this Amazon S3 bucket for your tenant, after which you may use a courier to pull the gender_name_ratios.csv file to your tenant using an Amazon S3 data source.

About gender_name_ratios.csv

Gender prediction based on the gender_name_ratios.csv file (an Amperity data asset) should only be used for audiences that exist within the United States.

The gender_name_ratios.csv file contains a list of baby names from the past ~130 years, along with their associated gender.

The data in the gender_name_ratios.csv file comes from United States Social Security Administration records for the popularity and frequency of baby names. These records were used to generate the gender_name_ratios.csv file, which is similar to:

given_name,predicted_gender,gender_name_ratio,male_count,female_count
EMILIA,F,7178.6,5,35893
THERESE,F,7025.0,5,35125
AILEEN,F,6969.8,5,34849
...
LINDSEY,F,20.2,7710,156111
MORRISON,M,20.2,1496,74
ROLLA,M,20.1,1306,65

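The most important column is gender_name_ratio, which describes how strongly given_name is associated with one gender versus the other. In the rows shown above, the value equals the larger of male_count and female_count divided by the smaller.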

Note

Only names with a ratio greater than 20 are included. This ensures that any prediction has a ~95% chance of being accurate based on the given name.

Only names with at least 1000 male or female examples were included. This filters out very uncommon names.
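The two inclusion rules in this note can be sketched in Python. This is an illustration only: it assumes that gender_name_ratio is the larger count divided by the smaller count (which is consistent with the sample rows above), and the function names gender_name_ratio() and is_included() are hypothetical, not part of Amperity.

```python
# Sketch only. Assumes gender_name_ratio = larger count / smaller count,
# which matches the sample rows shown in this topic.

def gender_name_ratio(male_count: int, female_count: int) -> float:
    """Return the majority-to-minority ratio, rounded to one decimal place."""
    majority = max(male_count, female_count)
    minority = min(male_count, female_count)
    if minority == 0:
        # Assumption: a name never recorded for the other gender has no finite ratio.
        return float("inf")
    return round(majority / minority, 1)


def is_included(male_count: int, female_count: int) -> bool:
    """Apply both rules: at least 1000 examples and a ratio greater than 20."""
    majority = max(male_count, female_count)
    minority = min(male_count, female_count)
    enough_examples = majority >= 1000
    strong_ratio = minority == 0 or majority / minority > 20
    return enough_examples and strong_ratio


# Sample rows from the file: (given_name, male_count, female_count)
rows = [("EMILIA", 5, 35893), ("LINDSEY", 7710, 156111), ("MORRISON", 1496, 74)]

for name, male, female in rows:
    print(name, gender_name_ratio(male, female), is_included(male, female))
```

A name with 3000 male and 2000 female examples would be excluded by this sketch because its ratio (1.5) is far below 20, even though it has more than 1000 examples.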

To add the gender_name_ratios.csv data asset

Step 1.

Add a courier for an Amazon S3 data source using the credentials for Amperity data assets. This courier should be run manually.

Note

If Amperity data assets credentials are not available on your tenant, make a request to Amperity Support to enable Amperity data assets for your tenant.

Object

The object should define the name of the file as “gender_name_ratios.csv” and the file tag as “gnr”:

[
  {
    "object/type": "file",
    "object/file-pattern": "'gender_name_ratios.csv'",
    "object/land-as": {
      "file/header-rows": 1,
      "file/tag": "gnr",
      "file/content-type": "text/csv"
    }
  }
]

Load Operations

Configure an empty load operation that uses “df-xxxxxx” as a placeholder for the feed ID. The file tag must be the same as the file tag in the object (“gnr”):

{
  "df-xxxxxx": [
    {
      "type": "load",
      "file": "gnr"
    }
  ]
}

Step 2.

Run the courier. Set the date to the previous day (i.e. “yesterday”).
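Before the run, it can be worth confirming that the file tag in the load operation matches the file tag in the object. A minimal Python sketch, assuming both settings have been copied out of the courier as JSON text; the function name unmatched_file_tags() is hypothetical and is not an Amperity API:

```python
import json

# Courier settings copied from Step 1.
objects = json.loads("""
[
  {
    "object/type": "file",
    "object/file-pattern": "'gender_name_ratios.csv'",
    "object/land-as": {
      "file/header-rows": 1,
      "file/tag": "gnr",
      "file/content-type": "text/csv"
    }
  }
]
""")

load_operations = json.loads("""
{
  "df-xxxxxx": [
    {"type": "load", "file": "gnr"}
  ]
}
""")


def unmatched_file_tags(objects: list, load_operations: dict) -> set:
    """Return file tags used by load operations that no object defines."""
    defined = {obj["object/land-as"]["file/tag"] for obj in objects}
    used = {
        op["file"]
        for operations in load_operations.values()
        for op in operations
    }
    return used - defined


missing = unmatched_file_tags(objects, load_operations)
print("unmatched file tags:", sorted(missing))
```

An empty result means that every load operation points at a file tag that the object defines.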

Step 3.

Add a feed using the gender_name_ratios.csv file that was pulled to your tenant.

Use given_name as the primary key.

Important

Do not make this table available to Stitch or apply any semantic tags.

Activate the feed.

Step 4.

Edit the courier and update the empty feed ID to match the feed ID that was generated for the feed. For example, if the feed ID was “aBcdEf” update the load operation to:

{
  "df-aBcdEf": [
    {
      "type": "load",
      "file": "gnr"
    }
  ]
}

Step 5.

Re-run the courier, and then review the domain table that was built for this data asset.

After reviewing the data in the domain table, run Stitch.
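The review of the domain table can be partly scripted against an export of its rows. The following Python sketch is one possible set of checks, not an Amperity feature: it assumes the rows are available as CSV text, and the function name review_rows() and the specific checks (unique primary key, uppercase names, M or F values, ratio above 20) are assumptions drawn from the descriptions in this topic.

```python
import csv
import io

# A few rows in the same layout as gender_name_ratios.csv (sample data only).
sample = """given_name,predicted_gender,gender_name_ratio,male_count,female_count
EMILIA,F,7178.6,5,35893
LINDSEY,F,20.2,7710,156111
MORRISON,M,20.2,1496,74
"""


def review_rows(csv_text: str) -> list:
    """Return a list of human-readable problems found in the rows."""
    problems = []
    seen = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        name = row["given_name"]
        # given_name is the primary key, so it must be unique.
        if name in seen:
            problems.append(f"{name}: duplicate given_name (primary key)")
        seen.add(name)
        # The join compares against UPPER(...) of the customer name.
        if name != name.upper():
            problems.append(f"{name}: given_name is not uppercase")
        if row["predicted_gender"] not in {"M", "F"}:
            problems.append(f"{name}: unexpected predicted_gender")
        # Only names with a ratio greater than 20 are expected in the file.
        if float(row["gender_name_ratio"]) <= 20:
            problems.append(f"{name}: gender_name_ratio is not greater than 20")
    return problems


print(review_rows(sample) or "no problems found")
```

The uppercase check matters because the SQL in the later steps uppercases the customer's name before comparing it to given_name, so a mixed-case value in this table would never match.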

Step 6.

You can add predicted gender to your customer 360 database in two ways, depending on how your brand wants to use predicted gender to build segments:

  1. Extend the Customer 360 and/or Merged Customers tables to include predicted gender (recommended).

  2. Add predicted gender values to your customer 360 database as a standalone table.

Extend the Merged_Customers table (recommended)

Note

The steps are the same for both the Customer 360 and Merged Customers tables.

Edit the Merged Customers table and extend the table for predicted gender.

Use a common table expression (CTE) to pull data from the domain table that contains predicted gender data (“Predictions_Gender”):

predict_gender AS (
  SELECT
    mc.amperity_id
    ,CASE
      WHEN UPPER(ratios.predicted_gender) = 'M' THEN 'Male'
      WHEN UPPER(ratios.predicted_gender) = 'F' THEN 'Female'
      ELSE ratios.predicted_gender
    END AS predicted_gender
  FROM Merged_Customers AS mc
  LEFT JOIN Predictions_Gender AS ratios
  ON UPPER(
    COALESCE(mc.given_name, SPLIT(mc.full_name,' ')[0])
  ) = ratios.given_name
),

Update the list of columns in the Merged Customers table to include predicted gender and combined gender:

,pg.predicted_gender
,COALESCE(mc.gender,pg.predicted_gender) AS combined_gender

Note

The combined gender column uses the value from the gender column in the Merged Customers table when a value exists, and then uses the value from the predicted_gender column when the gender column in the Merged Customers table is NULL.

Use a LEFT JOIN to join the values from the common table expression to the Merged Customers table:

LEFT JOIN predict_gender pg ON pg.amperity_id = mc.amperity_id
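The matching logic in the SQL above can be traced row by row in Python. This is a sketch for reasoning about the behavior, not a replacement for the SQL: the RATIOS dictionary is a tiny stand-in for the Predictions_Gender domain table, and the function names predict_gender() and combined_gender() are hypothetical.

```python
from typing import Optional

# Tiny stand-in for the Predictions_Gender domain table (given_name -> M/F).
RATIOS = {"EMILIA": "F", "MORRISON": "M"}
LABELS = {"M": "Male", "F": "Female"}


def predict_gender(given_name: Optional[str], full_name: Optional[str]) -> Optional[str]:
    """Mirror COALESCE(given_name, SPLIT(full_name, ' ')[0]) plus the lookup."""
    if given_name is not None:
        first = given_name
    elif full_name is not None:
        # First token of the full name, as in SPLIT(full_name, ' ')[0].
        first = full_name.split(" ")[0]
    else:
        return None
    code = RATIOS.get(first.upper())
    if code is None:
        # LEFT JOIN found no match, so predicted_gender stays NULL.
        return None
    # CASE expression: map M/F to labels, pass any other value through.
    return LABELS.get(code.upper(), code)


def combined_gender(gender: Optional[str], predicted: Optional[str]) -> Optional[str]:
    """Mirror COALESCE(gender, predicted_gender): a known gender always wins."""
    return gender if gender is not None else predicted


print(predict_gender("Emilia", None))           # name found through given_name
print(predict_gender(None, "Morrison Smith"))   # name found through full_name
print(combined_gender("Male", predict_gender("Emilia", None)))
```

As with COALESCE, only a missing value (None here, NULL in SQL) triggers the fallback. An empty string in given_name would not fall back to full_name.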

Add a table for predicted gender (optional)

A standalone table may be a better fit for some of your brand’s use cases for predicted gender.

  1. Add a passthrough table to your customer 360 database named Gender Name Ratios.

  2. Add a SQL table to your customer 360 database named Predicted Gender.

  3. Choose SQL as the build mode, and then use SQL similar to:

    WITH ratios AS (
      SELECT *
      FROM Gender_Name_Ratios
      WHERE gender_name_ratio >= 100
    )
    
    SELECT
      mc.amperity_id
      ,ratios.predicted_gender
    FROM Merged_Customers AS mc
    LEFT JOIN ratios
    ON UPPER(
      COALESCE(mc.given_name, SPLIT(mc.full_name,' ')[0])
    ) = ratios.given_name
    

    where “100” represents a 99% accuracy threshold. Increase or decrease this value as necessary.

    Tip

    This table will be unique by Amperity ID and may be made available to the Segment Editor for use with campaigns.

Step 7.

Run the customer 360 database to rebuild the table (or tables) that contain predicted gender.

Adjust accuracy threshold

The default accuracy threshold for gender prediction is ~95%. This means that every name in the gender_name_ratios.csv file is associated with one gender at least 20 times as often as with the other. If greater accuracy is required for a use case, add a custom gender_name_ratio threshold to the query:

WITH ratios AS (
  SELECT *
  FROM Gender_Name_Ratios
  WHERE gender_name_ratio >= 100
)

SELECT
  mc.amperity_id
  ,ratios.predicted_gender
FROM Merged_Customers AS mc
LEFT JOIN ratios
ON UPPER(
  COALESCE(mc.given_name, SPLIT(mc.full_name,' ')[0])
) = ratios.given_name

where “100” represents a 99% accuracy threshold for gender prediction.
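To choose a different threshold, it can help to convert between a ratio and the accuracy it implies. The sketch below assumes that accuracy is read as ratio / (ratio + 1), which is consistent with the two pairs stated in this topic (20 with ~95% and 100 with 99%). Both function names are hypothetical.

```python
def accuracy_for_ratio(ratio: float) -> float:
    """Lowest accuracy implied by a ratio threshold: ratio / (ratio + 1)."""
    return ratio / (ratio + 1)


def ratio_for_accuracy(accuracy: float) -> float:
    """Ratio threshold that reaches a target accuracy (0 < accuracy < 1)."""
    return accuracy / (1 - accuracy)


# Accuracy implied by a few candidate thresholds.
for threshold in (20, 100, 1000):
    print(threshold, f"{accuracy_for_ratio(threshold):.1%}")

# Threshold needed for a target accuracy of 99.5%.
print(round(ratio_for_accuracy(0.995)))
```

Raising the threshold improves accuracy but leaves fewer names in the ratios CTE, so fewer customers receive a predicted gender.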