FLoC’ing to the Topics? An Analyst’s take on Google’s new attempt at a ‘cookieless world’
February 11, 2022
As you may have noticed in my last post, some of the predictions we were making about how ‘21 would roll out were wrong, and a big part of that was due to changes in timing. Specifically, Google’s timing changes on FLoC, a key component of their move towards a more privacy-focused, cookieless ad landscape.
FLoC was about to go live last year, but in June it was delayed until the end of ‘23. This seemed to be in response to significant backlash from both privacy advocates and antitrust regulators in Europe. I wondered at the time if the delay wasn’t to buy time for a new approach, and it looks like that was correct.
Enter Topics. While it’s still going to be controversial, it seems to be a better approach towards helping set up advertisers and publishers to keep up with privacy. We’ve had several clients reach out to ask us to help them understand what these changes are, and it seemed like a great chance to share some content. The always amazing Colin Temple has written a great breakdown below on FLoC/Topics and their differences, and I think it will be illuminating.
Before that, here are a few high-level points to consider about all the changes that are happening (and will happen) around adtech and privacy over the next few years.
- The internet is paid for by advertising. This is to some extent self-evident, but needs to be said. The only thing people hate more than advertising is a paywall – and you can’t have it both ways.
- Privacy needs to be a first-class consideration. As the Internet was growing up, tech was built to ‘solve problems’ not ‘respect internet users’. We DO have a lot of sketchy practices around data and advertising, both on and offline. We most certainly need to be making changes around consent, collection and use of individual user data, and it’s a great thing that they are being taken into consideration as the technology that drives advertising evolves.
- Google legit wants to be part of the solution. As the primary seller of advertising on the open web, Google is under a lot more constraints than walled gardens when it comes to data collection and use. They also legitimately look like they care about privacy. That said, they still make a whole lot of money off advertising, and these iterative approaches to adtech changes show their desire to respect both people and their bottom line.
- The only surefire way to prepare your business is to take your first-party data seriously. For all we know, Topics may evolve into something totally different in the coming months. How advertising is bought and used will continue to change, including inside the walled gardens like Facebook. Every minute you spend creating, connecting and controlling your own first-party data will pay off in multiples as the adtech landscape moves towards a ‘new normal’.
What was FLoC?
FLoC (Federated Learning of Cohorts) was first described in detail back in January 2021 as an answer to how interest-based advertising, also known as targeted advertising, can survive changes to privacy on the web.
Between market forces responding to customer demand and government regulation, it is clear that privacy is a first-class concern online, and the nature of third-party cookies, especially as they existed a few years ago, led to some real issues in preserving one’s individual privacy while using the Web. As the scope of cookies continues to narrow, using third-party cookie data to group users and show them relevant advertising continues to be a challenge.
FLoC was one of the first parts of Google’s Privacy Sandbox initiative, which began to emerge in 2020. The goal of Privacy Sandbox is to move control of a user’s privacy data onto the user’s devices, so that what is actually communicated out from that in-browser sandbox is more generalized, anonymized data.
In this spirit, FLoC would run in the browser and analyze the user’s browsing to essentially group them into a larger cohort of users on browsers using FLoC who had similar browsing interests.
FLoC didn’t go well. Google tested it for a few months in 2021, but Chrome was the only browser to implement it (while other Chromium-based browsers chose not to), and it was met with a variety of criticisms. Among them was the fact that the definition of cohorts could ultimately tell the owners of websites quite a bit about their users without having to do any work tracking them online. Another was that it could be used as a vector in device fingerprinting, which is a technique that attempts to uniquely identify devices without storing a cookie or any other temporary data on them. Fingerprinting is especially problematic for privacy, since it removes virtually all privacy control from the user.
In response, Google shut down FLoC last August.
What is the Topics API?
The Topics API is similar to FLoC but attempts to address a number of concerns with it. While FLoC could create an unbounded array of cohorts that might ultimately tell you several seemingly unrelated details about a user, the Topics API scales that down significantly to specific topics of interest. Initially, that is being set to exactly 300 topics, though that number is likely to change as this evolves.
The Topics API works in the browser’s Privacy Sandbox, just as FLoC did. However, where FLoC produced a more complex grouping of users, the new Topics method will assign topics to users based on their browsing behavior according to one category: the topic of the sites they are looking at. Google has predefined topics associated with websites in its index, and will use in-browser machine learning to try to determine the topic of new websites based on their domain alone.
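To make the site-to-topic step concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the hostnames, the two lookup tables and the function name are hypothetical, and the keyword match is only a simple stand-in for the in-browser machine learning model, not a description of how Chrome classifies sites.

```python
# Hypothetical curated mapping, standing in for Google's predefined site topics.
KNOWN_SITE_TOPICS = {
    "banana-blog.example": "bananas",
    "toaster-fans.example": "vintage toasters",
}

# Hypothetical keyword hints, standing in for the in-browser classifier.
KEYWORD_TOPICS = {
    "banana": "bananas",
    "toaster": "vintage toasters",
}


def topic_for_hostname(hostname):
    """Return a topic for a hostname, or None if nothing can be inferred."""
    hostname = hostname.lower()
    # Sites with a predefined topic are looked up directly.
    if hostname in KNOWN_SITE_TOPICS:
        return KNOWN_SITE_TOPICS[hostname]
    # Unknown sites are guessed from the domain name alone.
    for keyword, topic in KEYWORD_TOPICS.items():
        if keyword in hostname:
            return topic
    return None
```

Note that only the hostname goes in: no page content, URL path or query string is used, which matches the "domain alone" constraint described above.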
What does the browser track?
In general, the browser maintains a list of five topics that the user is most interested in based on their activity in the present week. To that list, a sixth topic is added at random. It then stores the six-topic list for the user for three weeks.
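As a rough illustration of that bookkeeping, the sketch below builds one week’s list and keeps three weeks of history. The tiny taxonomy, the function names and the use of simple visit counts to rank interests are all assumptions made for the example; the real ranking is up to the browser.

```python
import random
from collections import Counter

# Hypothetical miniature taxonomy; the real one is described as about 300 topics.
TAXONOMY = ["bananas", "vintage toasters", "cycling", "jazz", "gardening",
            "chess", "hiking", "baking", "astronomy", "knitting"]


def weekly_topic_list(visited_topics, rng):
    """Return the week's top five topics (fewer if the user browsed less) plus one random topic."""
    counts = Counter(visited_topics)
    top_five = [topic for topic, _ in counts.most_common(5)]
    # The random pick may coincide with a real interest; that is part of the noise.
    random_topic = rng.choice(TAXONOMY)
    return top_five + [random_topic]


def remember(history, week_list, weeks_kept=3):
    """Append this week's list and keep only the most recent three weeks."""
    return (history + [week_list])[-weeks_kept:]
```

A caller would pass the topics of the sites visited during the week, for example `weekly_topic_list(["bananas", "jazz", "bananas"], random.Random())`, and feed the result to `remember` at the end of each week.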
The presence of a random topic means a couple of important things. For one, every topic can be added at random, so given the scope any topic will have a considerable number of users in it. Even if a topic attracts little real interest, a user’s membership in it is therefore much harder to use to identify them, which in turn makes a topics list less useful as a factor in fingerprinting a user’s device.
This also means that any user can deny being interested in a topic. The fact that the Topics API listed “bananas” as one of the user’s topics does not actually prove that the user is interested in bananas. Statistically, they are more likely to be interested in bananas than other users, so the benefit of the topic presentation is felt in aggregate, not in individual sessions.
Besides adding some noise to the list of topics, another privacy feature here is that the definition of the week will vary from user to user. Different users will have different starting days for their weeks. Further, those definitions might change over time, and may occasionally have “weeks” longer or shorter than 7 days. The goal here is again to add some noise to the data, which makes it hard to specifically identify a user based on their browser’s responses to Topics API requests, but doesn’t diminish the aggregate value of the API.
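One way to picture the shifting weeks is the sketch below. The per-user seed, the random starting offset and the jitter of up to one day either side of seven are assumptions chosen for illustration, not values from Google’s proposal.

```python
import random
from datetime import datetime, timedelta


def week_boundaries(user_seed, first_day, count):
    """Return `count` consecutive (start, end) pairs for one user's "weeks"."""
    rng = random.Random(user_seed)
    # Assumption: each user starts at a random hour within the first seven days.
    start = first_day + timedelta(hours=rng.randrange(7 * 24))
    weeks = []
    for _ in range(count):
        # Assumption: a week's length varies by up to 24 hours either way.
        length = timedelta(days=7, hours=rng.randint(-24, 24))
        weeks.append((start, start + length))
        start += length
    return weeks
```

Two users seeded differently will generally roll over to a new topic list at different moments, so a change in API responses says little about which user a site is looking at.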
What do sites and advertisers get from the Topics API?
When a site or ad uses the Topics API to inquire about what the user is interested in, there will be up to three topics disclosed to the website: one from each of the last three weeks’ list of interests.
So, from the list of topics associated with a user’s browsing in the current week, only one topic is disclosed to a site.
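In code, that disclosure step might look something like the following sketch. The function name is hypothetical, and collapsing duplicate topics into a single entry is an assumption made to reflect the "up to three" wording.

```python
import random


def topics_for_site(last_three_weeks, rng):
    """Pick one topic from each stored weekly list; return up to three distinct topics."""
    disclosed = []
    for week_list in last_three_weeks[-3:]:
        if not week_list:
            continue  # nothing recorded for that week
        topic = rng.choice(week_list)
        # Assumption: the same topic picked for two weeks is reported once.
        if topic not in disclosed:
            disclosed.append(topic)
    return disclosed
```

Given three stored lists of six topics each, a site sees at most three of the up to eighteen entries, and never more than one from any single week.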
There are several features of the Topics API that are meant to help protect the user’s privacy:
- Aside from there being a random topic in the user’s list, there is an additional 5% chance that a completely new topic will be chosen at random.
- The topic provided to a website is bound to that site. Every script that calls the Topics API on that website gets the same result as every other caller on the same site.
- The list of topics is reset when the user clears their browsing history.
- Topics are not provided for users browsing in Incognito Mode, or for users who have opted out of Topics in their browser settings.
- A party that calls the API can only learn about topics associated with sites on which it has itself called the API.
That last one narrows the scope of what any site or advertiser can learn about. For instance, if my browser’s list of topics includes “bananas” and “vintage toasters”, then another website about bananas is eligible to learn about my interest in bananas, but will know nothing of my interest in toasters.
An advertiser who activates the Topics API from within their ads on an ad platform can only learn about my interest in “vintage toasters” if they are also calling the API from their ads on other sites about “vintage toasters”, or if their own site is about that topic.
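Two of these protections, the site-bound answer with its 5% noise and the per-caller filter, can be sketched as follows. The hashing scheme, the `user_secret` parameter, the small taxonomy and the example hostnames are all assumptions for illustration; the source does not say how the browser derives a stable per-site answer.

```python
import hashlib
import random

# Hypothetical miniature taxonomy used for the random replacement.
TAXONOMY = ["bananas", "vintage toasters", "cycling", "jazz", "gardening"]


def site_bound_topic(week_list, week_id, site, user_secret, noise_rate=0.05):
    """Choose the topic one site sees for one week, identical on every call."""
    # Seeding from user, week and site makes the answer stable for all callers on the site.
    seed_material = f"{user_secret}|{week_id}|{site}".encode()
    seed = int.from_bytes(hashlib.sha256(seed_material).digest(), "big")
    rng = random.Random(seed)
    # With a 5% chance, return a completely random topic instead.
    if rng.random() < noise_rate:
        return rng.choice(TAXONOMY)
    return rng.choice(week_list)


def filter_for_caller(candidate_topics, caller_site_topics):
    """Keep only topics tied to sites where this caller has itself called the API."""
    allowed = set(caller_site_topics.values())
    return [topic for topic in candidate_topics if topic in allowed]
```

With my topics being "bananas" and "vintage toasters", a caller present only on a banana site, `filter_for_caller(["bananas", "vintage toasters"], {"banana-blog.example": "bananas"})`, gets back just `["bananas"]`. An ad platform that also runs on a toaster site would see both.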
This is not an exhaustive list of the privacy features that the Topics API is meant to have, nor is the specification of the API final. Google intends to evolve this as it is being tested and used.
The main improvements that the Topics API brings over FLoC are in its transparency. FLoC left the definitions of cohorts and who was in them ambiguous, leaving the potential for that data to identify smaller groups of users, or even individual ones, depending on how it was implemented. Topics, on the other hand, are much more constrained.
The Topics API is not meant to replace third-party cookies, in part because nothing should. Third-party cookies were a problem for privacy, and so replicating everything they could do is not a solution to this problem. The Topics API is meant to add a factor that can contribute towards decisions on serving interest-based advertising, but it need not be the only one.
It also does not provide perfect privacy, and it is prudent that users understand this. The responses that a browser can provide are limited, but there is nothing stopping websites themselves from using identifiers of their own to record these responses associated with their users over time, in cases where they can persistently identify their users by other means. An ecommerce site, for instance, can maintain a record of topics that have shown up for users who were logged in, and over time compile a list of frequently-accessed topics and likely filter out the random ones. This won’t help them retarget the user on other sites, but it can have other use cases.
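The ecommerce example can be sketched in a few lines. The function name and the 20% threshold are assumptions; a real site would tune the cutoff to however many observations it has collected.

```python
from collections import Counter


def likely_interests(observed_topics, min_share=0.2):
    """Return topics seen often enough to be unlikely to be random noise."""
    counts = Counter(observed_topics)
    total = sum(counts.values())
    if total == 0:
        return []
    # Random topics should each appear rarely, so a share threshold drops most of them.
    return sorted(topic for topic, n in counts.items() if n / total >= min_share)
```

For a logged-in user whose sessions returned "bananas" six times, "vintage toasters" five times, and "jazz" and "chess" once each, this keeps the first two and discards the rest as probable noise.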
Regulations in their jurisdictions will apply, of course, and these sites will usually have to disclose this in their privacy policies. The important thing to remember, though, is that this is not meant to be a perfect solution, but an improvement over what existed before.