Deterministic Data vs Probabilistic Data: Key Concepts Explained

Pathlabs Marketing
July 9th, 2024

Many blogs discuss deterministic and probabilistic data, but at the end of the day, what exactly should we know about these types of data? More importantly, how do these data types play a role when executing digital media campaigns? 

This blog aims to answer these questions and explain precisely what you, as a marketer, should know about deterministic and probabilistic data.

What Is Deterministic Data?

Deterministic data is any data point that a user provides to a publisher or website and that directly identifies that user.

Deterministic Data Examples

Deterministic data can include a user’s: 

  • Email address

  • Phone number

  • First name 

  • Last name

  • Physical address 

  • Date of Birth

How Is Deterministic Data Collected?

Many large publishers hosting websites, apps, or other platforms require or highly encourage users to provide deterministic data when they initiate a session. 

For instance, to use a Meta product like Facebook or Instagram, users must create an account and log in with an email or phone number for every session. 

In other words, Meta requires users to provide a piece of personally identifiable data, which it saves in its internal database to serve as a deterministic identifier. When users return to the publisher’s web location, as long as they provide this identifier, the publisher can automatically authenticate and recognize the user, even if they are using a different device.

When Is Deterministic Data Used in Advertising?

User Profile Creation and Ad Targeting

In advertising, deterministic data comes up most often when executing digital media campaigns within "walled garden" environments like Google, Amazon, Meta, OTT/CTV providers, and prominent publisher locations.

Users create profiles on these platforms and locations, providing a deterministic data point as a one-to-one identifier. As users engage further, the platform records these behaviors, actions, and interests, associating them with their profiles and deterministic identifiers.

For example, Pinterest allows users to create an account with an email. When a user engages with content on a topic such as gardening, that activity is associated with their profile, and their identifier may be placed in specific audience segments.

This is advantageous for advertisers because, when executing digital media campaigns on platforms like Pinterest, they can set targeting parameters for specific users, such as gardening enthusiasts. When the campaign launches, Pinterest can efficiently find users to target, as it has consistently tracked and associated all this data with the deterministic profiles of users on the platform. 

First-Party Data Activation

Deterministic data is also a backbone for first-party data activation. Media teams can upload lists of first-party data collected from their CRM or other systems into many walled garden platforms. 

The platform then matches deterministic identifiers from the first-party list (e.g., emails) with the deterministic profiles it has on file. Once activated, media teams can specifically target actual users in this first-party audience with their campaigns.
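The matching step described above can be sketched in a few lines of Python. This is a minimal illustration, not any platform's actual implementation: the function names, the sample emails, the profile IDs, and the choice to normalize and SHA-256 hash addresses before comparing them are all assumptions made for the example.

```python
import hashlib


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different spellings match."""
    return email.strip().lower()


def hash_email(email: str) -> str:
    """Return the SHA-256 hex digest of a normalized email (assumed scheme)."""
    return hashlib.sha256(normalize_email(email).encode("utf-8")).hexdigest()


def match_audience(crm_emails, platform_profiles):
    """Return platform profile IDs whose hashed email appears in the CRM list.

    platform_profiles is a hypothetical mapping of hashed email -> profile ID.
    """
    crm_hashes = {hash_email(e) for e in crm_emails}
    return sorted(pid for h, pid in platform_profiles.items() if h in crm_hashes)


# Hypothetical platform-side data: hashed email -> profile ID.
platform_profiles = {
    hash_email("ana@example.com"): "profile_1",
    hash_email("ben@example.com"): "profile_2",
}

# Hypothetical CRM export with inconsistent formatting.
crm_emails = [" Ana@Example.com ", "carl@example.com"]

matched = match_audience(crm_emails, platform_profiles)
print(matched)  # ['profile_1']
```

Only the address that exists on both sides produces a match. The CRM contact with no platform profile is simply not reachable through this audience, which is why uploaded lists rarely match at 100%.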

What Is Probabilistic Data?

Data that we can consider “probabilistic” tends to be fragmented data points that do not serve as direct identifiers of a user’s identity. 

Probabilistic Data Examples

This data can include: 

  • IP Address

  • Wi-Fi network 

  • Operating System 

  • Location 

  • Time of Day 

  • ZIP code

How Is Probabilistic Data Being Used in Advertising?

The concept of probabilistic data in advertising rubs shoulders with other topics like identity resolution, audience data targeting, and even cross-device methodologies and attribution. In short, it falls into the heaping mess of confusing topics in AdTech.

Fortunately, as marketers, advertisers, and media experts, when it comes to probabilistic data, we mainly need to understand the following three key points:

  1. Publishers, AdTech vendors, data brokers, and identity resolution providers often collect probabilistic data. These entities, like LiveRamp, Tapad, and Lotame, gather this data from users across different sites. Even when users don’t provide a deterministic identifier, these entities aim to record the other disparate user data points, such as IP addresses and Wi-Fi networks.

  2. Nowadays, these entities are trying to find ways to better leverage this fragmented data they collect to help them pinpoint user attributes and deliver relevant ads to those users. The idea is that when users don’t provide a deterministic identifier, probabilistic data can be leveraged to create anonymous user identities and audience lists for future ad serving, heavily relying on machine learning and AI algorithms, statistical analysis, user profile matching, and so on. 

  3. Sounds confusing and complex, right? It is. In reality, we know these entities are collecting data that we can consider “probabilistic” to make guesses, create user identities across datasets, and use them for ad targeting. They are even trying to use it to link users' identities across devices. However, the exact methodologies for this type of data crunching are often proprietary, and the resulting audience lists are often “black box” in nature.
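To make the idea of "making guesses" more concrete, here is a toy Python sketch of how fragmented signals could be scored to decide whether two sessions likely belong to the same user. Everything in it is invented for illustration: the signal names, the weights, the threshold, and the sample sessions are assumptions, and real vendors' proprietary methods are far more sophisticated.

```python
# Hypothetical weights (in points): how strongly a shared signal
# suggests that two sessions belong to the same user.
SIGNAL_WEIGHTS = {
    "ip_address": 40,
    "wifi_network": 30,
    "zip_code": 20,
    "operating_system": 10,
}


def match_score(session_a: dict, session_b: dict) -> int:
    """Sum the weights of signals that are present and equal in both sessions."""
    score = 0
    for signal, weight in SIGNAL_WEIGHTS.items():
        value_a = session_a.get(signal)
        if value_a is not None and value_a == session_b.get(signal):
            score += weight
    return score


def likely_same_user(session_a: dict, session_b: dict, threshold: int = 60) -> bool:
    """Guess that two sessions share a user when the score reaches the threshold."""
    return match_score(session_a, session_b) >= threshold


# Hypothetical sessions with no deterministic identifier attached.
phone = {"ip_address": "203.0.113.7", "wifi_network": "home-net",
         "zip_code": "55401", "operating_system": "iOS"}
laptop = {"ip_address": "203.0.113.7", "wifi_network": "home-net",
          "zip_code": "55401", "operating_system": "macOS"}
stranger = {"ip_address": "198.51.100.9", "wifi_network": "cafe-net",
            "zip_code": "55401", "operating_system": "iOS"}

print(match_score(phone, laptop), likely_same_user(phone, laptop))      # 90 True
print(match_score(phone, stranger), likely_same_user(phone, stranger))  # 30 False
```

The result is a guess, not a confirmation: two housemates on the same Wi-Fi network would also score highly here, which is exactly the kind of uncertainty that separates probabilistic from deterministic identification.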

At the end of the day, when launching digital media campaigns, it can be difficult to know exactly whether deterministic or probabilistic methods are used to identify and serve ads to users.

"In our position of executing digital media campaigns, whether deterministic or probabilistic data is used is often not the most important factor for ad-serving decisions. Our focus should be more on planning suitable campaigns for our objectives, selecting the proper channels, and choosing the right audiences to target. As with any digital advertising campaign, proper testing will tell you if an audience is valuable, more so than blindly trusting a particular data source."
— Kyle Kienitz, Director of Education, Pathlabs

Leveraging Deterministic Data in Your Marketing Campaigns

Those wanting to stay up-to-date and take advantage of the promise of deterministic data and authenticated users can leverage the following approaches: 

Explore Advertising on Platforms Like Walled Gardens

Utilize platforms that operate within walled gardens and offer deterministic data models. As mentioned, these environments, such as Facebook, Google, and Amazon, collect deterministic identifiers to provide more accurate and reliable campaign targeting.

Leverage First-Party Data

Utilize the first-party data collected from CRM and data systems. Then, upload these lists into walled gardens or ad platforms offering deterministic ID solutions. This increases the likelihood of reaching these exact users on the first-party audience list.

Understand New Deterministic Data Innovations

Advertisers should be aware that many entities, like The Trade Desk, are finding ways to better identify users deterministically without being a walled garden platform. The Trade Desk is working on UID 2.0, which allows publishers on the open web to have users opt into the UID system and provide a deterministic data point to serve as their identifier. Once the user provides this identifier, they can be tracked and targeted with more certainty of their identity as they visit locations owned by other publishers affiliated with the UID 2.0 system.

Navigate Deterministic and Probabilistic Data Methodologies with a Media Execution Partner (MEP)

Deterministic and probabilistic data are buzzwords in the digital media industry. Questions about what this data is and how it impacts campaigns continue to arise, especially for those executing digital media for brand clients, like independent agencies.

For independent agencies still familiarizing themselves with these data types and models and needing guidance on how they fit into their media strategies, partnering with a Media Execution Partner (MEP) like Pathlabs is beneficial. We provide the expertise, workflows, and technology necessary for effective digital media execution, focusing on staying updated with advancements in digital media, such as deterministic and probabilistic data.

As the data-targeting and identity landscape evolves, Pathlabs remains vigilant and prepared as your MEP, ready to adapt and implement the latest digital media strategies.

Interested in learning more about how an MEP can help your agency drive business outcomes? Contact us today.
