First-party data sources¶
Third-party data, such as from cookies that track a product or brand on someone else’s website, are no longer supported by the Firefox and Safari web browsers, and support for third-party cookies will be removed from Chrome. This means that more than 75% of a user’s browsing activity cannot be tracked via third-party data. This percentage does not include user activity that occurs on mobile applications, which are typically tracked in ways that do not use cookies.
This shift is driven by emerging trends around data transparency that give users more control over their data, along with government policies that mandate this transparency, such as the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).
With the value of using third-party data rapidly degrading, it’s time to shift focus to first-party data.
First-party data, such as data from cookies that directly track a product or brand on a customer’s own website and clickstream events that capture a user’s activity from within a mobile application, has become an essential part of any strategy that markets to customers based on their shopping preferences.
Capturing reliable first-party data is a very important part of understanding how your customers interact with your products and your brand. First-party data can be sent to Amperity as part of a broad strategy to build a complete data foundation that can then be associated with important profile attributes.
Raw clickstream data¶
Raw clickstream data should be provided to Amperity as a targeted subset of operational data points. This filtering can be done in two ways:
(Recommended) Configure the application that is sending clickstream data to send files that contain only a useful subset of fields. Configure these files as feeds and run the files that contain useful clickstream data on a daily basis.
Use a saved query to filter the fields into a useful subset prior to loading the data to Amperity, and then use two domain tables: one for all of the raw data and the other for the subset of useful data.
The domain table with all of the raw data should never be made available to Stitch and never added to the customer 360 database.
The domain table with the useful subset of data should be processed on a regular basis and may be made available to Stitch if it contains meaningful profile (PII) data, and then added to the customer 360 database as a passthrough table.
This dual-domain table approach ensures that Amperity has direct access to a filtered subset of operational clickstream data and that the superset of data does not require daily processing. Updates to that filtered subset can be done quickly by updating the saved query instead of creating a new feed.
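The field filtering described above can be illustrated outside of Amperity. The following is a minimal sketch in Python, assuming the clickstream data arrives as CSV text. The field names (`user_id`, `event_type`, `event_time`, `debug_flag`, `server_node`) and the `filter_fields` helper are hypothetical and are not part of any Amperity API.

```python
import csv
import io

# Hypothetical subset of useful fields; real field names depend on the
# application that produces the clickstream data.
USEFUL_FIELDS = ["user_id", "event_type", "event_time"]


def filter_fields(raw_csv, fields=USEFUL_FIELDS):
    """Return CSV text that keeps only the named fields from raw_csv."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Fields missing from the input are written as empty strings.
        writer.writerow({name: row.get(name, "") for name in fields})
    return out.getvalue()


# Illustrative raw data: three useful fields plus two operational fields.
raw = (
    "user_id,event_type,event_time,debug_flag,server_node\n"
    "u1,page_view,2024-01-01T00:00:00Z,0,node-7\n"
    "u2,add_to_cart,2024-01-01T00:05:00Z,1,node-3\n"
)
subset = filter_fields(raw)
```

Keeping the list of useful fields in one place mirrors the benefit of the saved query approach: changing the subset means editing one list rather than redefining the whole pipeline.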
Adobe Analytics¶
Adobe Analytics provides useful intelligence about customer activity on websites and mobile devices. Marketers can analyze clickstream data to understand what their customers are doing in real time, and then optimize customer experiences across brands.
Clickstream data from Adobe Analytics contains standard fields, and then up to 250 conversion variables (evar1-evar250) <https://docs.adobe.com/help/en/analytics/admin/admin-tools/conversion-variables/conversion-var-admin.html>.
Conversion variables are customer-specific and represent events that identify:
The customer, such as IDs or PII
Customer interactions
Purchases, transactions, and prices
Marketing campaign IDs that tie the customer to marketing efforts
Behaviors that may be useful to better understand the customer
Adobe Analytics can send clickstream data to the SFTP site built into Amperity as a data feed.
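Because conversion variables are customer-specific, a field such as evar1 has no fixed meaning until it is mapped to a descriptive name. The following is a minimal sketch in Python of that mapping step. The mapping itself, the sample record, and the `rename_evars` helper are hypothetical; the real meaning of each conversion variable comes from the Adobe Analytics configuration for your brand.

```python
# Hypothetical mapping; the meaning of each evar is defined per Adobe
# Analytics implementation and must come from that configuration.
EVAR_NAMES = {
    "evar1": "customer_id",
    "evar2": "campaign_id",
    "evar3": "order_total",
}


def rename_evars(record, names=EVAR_NAMES):
    """Rename mapped evar keys and leave every other key unchanged."""
    return {names.get(key, key): value for key, value in record.items()}


# Illustrative record; evar9 is unmapped, so it keeps its original name.
hit = {
    "evar1": "C-1001",
    "evar2": "spring-sale",
    "evar3": "59.99",
    "evar9": "x",
    "page": "checkout",
}
renamed = rename_evars(hit)
```

Unmapped variables pass through unchanged, which makes it easy to spot conversion variables that still need a descriptive name.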
Google Analytics¶
Google Analytics is an events- and session-based analytics service that collects data from websites and apps. Google Analytics 4 properties support privacy controls, such as cookieless measurement, and can be integrated directly on websites and apps to help your brand better understand the customer journey.
Clickstream data from Google Analytics contains a predefined series of fields, with sets of fields available for:
Identifiers, such as client ID, user ID, visit ID, along with dates and times
Totals, such as hits, page views, bounces, time on screen
Traffic sources, such as campaign IDs, customer IDs, Google AdWords
Devices, such as device type, operating system, screen resolution
Geographic data, such as state, city, latitude, longitude
Hits, including web and app, and then actions based on hits
User-defined custom dimensions
Use a Spark SQL query similar to:
-- Aliases that contain dots are wrapped in backticks so that Spark SQL
-- treats each one as a single column name.
SELECT
  _c1 AS clientId,
  _c2 AS visitorId,
  _c3 AS userId,
  _c4 AS visitNumber,
  _c5 AS visitId,
  _c6 AS visitStartTime,
  _c7 AS date,
  _c8 AS `totals.hits`,
  _c9 AS `totals.pageviews`,
  _c10 AS `totals.timeOnScreen`,
  _c11 AS `totals.timeOnSite`,
  _c12 AS `totals.transactions`,
  _c13 AS `trafficSource.adwordsClickInfo.campaignId`,
  _c14 AS `trafficSource.adwordsClickInfo.customerId`,
  _c15 AS `trafficSource.keyword`,
  _c16 AS `device.browser`,
  _c17 AS `device.operatingSystem`,
  _c18 AS `geoNetwork.city`,
  _c19 AS `hits.dataSource`,
  _c20 AS `hits.eCommerceAction`,
  _c21 AS customDimensions,
  _c22 AS `customDimensions.index`,
  _c23 AS `customDimensions.value`
FROM appdata
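A query of this shape can be generated from an ordered list of field names, which avoids hand-numbering the positional columns. The following is a minimal sketch in Python. The `build_select` helper is hypothetical, and the sketch assumes that positional columns are named `_c1`, `_c2`, and so on, as in the query above. Each alias is wrapped in backticks because Spark SQL requires quoting for an alias that contains a dot.

```python
def build_select(field_names, table="appdata", first_index=1):
    """Build a SELECT that aliases positional columns (_c1, _c2, ...) to names."""
    columns = []
    for offset, name in enumerate(field_names):
        # Backticks let the alias contain dots, such as totals.hits.
        columns.append("  _c{} AS `{}`".format(first_index + offset, name))
    # Joining with commas avoids a trailing comma before FROM.
    return "SELECT\n" + ",\n".join(columns) + "\nFROM " + table


query = build_select(["clientId", "visitorId", "totals.hits"])
print(query)
```

Adding or reordering fields then only requires editing the list of names, and the column positions follow from the list order.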