Description
Submission date: 1 January 1900
Title: Bringing transparency to personalized services through statistical inference
Thesis supervisor:
Davide BALZAROTTI (Eurecom)
Thesis supervisor:
Patrick LOISEAU (LIG)
Scientific field: Information and communication sciences and technologies
CNRS theme: Not defined
Abstract:
Personalized services are online services that use information about their users to offer each user a service better adapted to her. With the proliferation of personal data on the Internet, personalized services have become omnipresent in our daily lives; they include, for instance, all services that offer recommendations. Although this data-based personalization has increased the utility of services for both users and service providers, it has also raised privacy concerns that have become increasingly serious in recent years. One personalized service for which this issue is particularly acute is targeted advertising. Advertising is the main source of revenue for many free web services such as Facebook and Google. The ad ecosystem is complex and can involve many actors; here we abstract away this complexity and refer to the whole chain of organizations responsible for sending an ad (e.g., companies that want to advertise, data brokers, advertising platforms) as the ad engine. The dominant advertising model today is pay-per-click, which has led to a growing amount of targeted advertising, since targeting increases the likelihood that a user clicks on an ad. Targeted advertising has significantly increased advertising revenues. However, it has also raised growing concerns among users, who often feel that it constitutes an invasion of their private sphere. In particular, users often wonder “what data do advertisers have about me?” or “why am I being shown this ad?”. In a nutshell, users’ concerns are mainly fueled by the lack of transparency of current targeted advertising systems.
The main objective of this thesis is to increase the transparency of targeted advertising by providing users with tools and methods to understand why they are targeted with a particular ad, to infer what information the ad engines may have about them, and ultimately to control it. Concretely, we propose to build a browser plugin that collects the ads shown to a user and provides her with analytics about these ads and tools to control them. The plugin can either give information about a particular ad, such as “you are being shown this ad because the ad engine likely thinks that you are a student”, or give longer-term analytics, such as “given the ads you have been shown in the last 3 months, the ad engine likely thinks that you earn less than $50k per year”.
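A longer-term statement of this kind amounts to aggregating per-ad inferences over a time window. The sketch below is a minimal illustration under stated assumptions, not the plugin's actual design: it assumes a hypothetical `ad_targets` mapping from each ad to the attributes it is likely targeted at (the output of some per-ad inference step), reads “the last 3 months” as a 90-day window, and reports the share of recent ads consistent with each attribute. The function name `long_term_summary`, the attribute labels, and the dates are all illustrative.

```python
from collections import Counter
from datetime import datetime, timedelta


def long_term_summary(ad_log, ad_targets, now, window_days=90):
    """Share of recent ads consistent with each inferred attribute.

    ad_log: list of (timestamp, ad_id) pairs for the ads shown to one user.
    ad_targets: hypothetical mapping ad_id -> set of attributes the ad is
        likely targeted at (output of a per-ad inference step).
    Returns (attribute, share) pairs sorted by decreasing share.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [ad_id for ts, ad_id in ad_log if ts >= cutoff]
    if not recent:
        return []
    counts = Counter()
    for ad_id in recent:
        counts.update(ad_targets.get(ad_id, ()))
    shares = [(attr, n / len(recent)) for attr, n in counts.items()]
    # Decreasing share, ties broken alphabetically for a stable output.
    return sorted(shares, key=lambda pair: (-pair[1], pair[0]))


# Illustrative data only: attribute labels and dates are made up.
ad_targets = {
    "ad_a": {"income<50k", "student"},
    "ad_b": {"income<50k"},
    "ad_c": {"income>100k"},
    "ad_d": {"income<50k"},
}
ad_log = [
    (datetime(2017, 5, 20), "ad_a"),
    (datetime(2017, 4, 10), "ad_b"),
    (datetime(2017, 1, 5), "ad_c"),  # older than 90 days: ignored
    (datetime(2017, 5, 30), "ad_d"),
]
summary = long_term_summary(ad_log, ad_targets, now=datetime(2017, 6, 1))
print(summary)
```

With these made-up inputs, all three ads inside the window are consistent with `income<50k`, so that attribute comes first with share 1.0, while `ad_c` falls outside the window and does not contribute.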
Thesis description: Athanasios Andreou (EURECOM), supervisor: Patrick Loiseau (EURECOM)
One of the main challenges in building such a tool is to infer what the ad engine knows about a user from the ads she receives. To explain our approach, we abstract the system into three components: the information the ad engine collects about a user, either online through tracking or offline from data brokers (the inputs); the ad engine, which processes the inputs to place users in certain marketing categories (the black box); and the ads sent to the user (the outputs). In this thesis, we propose to observe only the outputs and to infer the categories in which the ad engine placed the user, regardless of whether this was due to a particular input. To do so, we will collect the ads that users receive, group together all the users who received the same ad, and look at the most common demographics and interests of the users in each group. Section B.1.b details the methods that we propose to develop for this statistical inference task. The main novelty of our technique is that it relies only on the output, i.e., the ads observed by users, and not on any input data that users may have explicitly given. This makes our approach much more realistic than approaches that require monitoring the inputs. We also propose ways to control the information that services have about a user by adding noise rather than by trying to block information leakage directly, which is also a much more realistic approach.
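The grouping step described above can be illustrated with a minimal sketch. It is not the method of Section B.1.b; it assumes that the attributes (demographics and interests) of each participating user are available to the tool, and it simply ranks, for each ad, the attributes by the fraction of the ad's recipients who have them. The function `infer_ad_targets`, the `min_share` threshold, and the sample users and ads are hypothetical.

```python
from collections import Counter, defaultdict


def infer_ad_targets(observations, profiles, min_share=0.5):
    """For each ad, rank attributes by their frequency among its recipients.

    observations: iterable of (user_id, ad_id) pairs collected by the plugin.
    profiles: mapping user_id -> set of attributes (assumed known to the tool).
    Returns ad_id -> list of (attribute, share), keeping shares >= min_share.
    """
    # Group users by the ad they received; a set ignores repeated impressions.
    recipients = defaultdict(set)
    for user_id, ad_id in observations:
        recipients[ad_id].add(user_id)

    targets = {}
    for ad_id, users in recipients.items():
        counts = Counter()
        for user_id in users:
            counts.update(profiles.get(user_id, ()))
        shares = [(attr, n / len(users)) for attr, n in counts.items()]
        # Keep frequent attributes, highest share first, ties alphabetical.
        targets[ad_id] = sorted(
            (pair for pair in shares if pair[1] >= min_share),
            key=lambda pair: (-pair[1], pair[0]),
        )
    return targets


# Illustrative data only.
profiles = {
    "u1": {"student", "age18-24"},
    "u2": {"student", "female"},
    "u3": {"student", "age18-24"},
    "u4": {"retired"},
}
observations = [
    ("u1", "ad_laptop"), ("u2", "ad_laptop"), ("u3", "ad_laptop"),
    ("u1", "ad_laptop"),  # repeated impression, counted once
    ("u4", "ad_cruise"),
]
targets = infer_ad_targets(observations, profiles)
print(targets["ad_laptop"])
```

Here all three recipients of `ad_laptop` are students, so the sketch would explain that ad as likely targeted at students, with `age18-24` second (share 2/3). Ranking by raw share follows the description above; one limitation is that an attribute common in the whole population also scores high, so comparing each share with the attribute's overall frequency is a natural refinement.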
2. History and related work
Previous work has contributed either by uncovering problems [2] or by proposing methods to bring more transparency to the ad ecosystem [1, 2, 3, 4]. We focus on the studies that are closest to our proposal and refer the reader to [5] for an overview.
Two studies [1, 4] proposed techniques to detect whether an ad is contextual, retargeted, or behavioral. While this is an important first step towards transparency, these studies did not take the next step of determining why the ads are targeted. In this direction, two studies [2, 3] proposed techniques to infer how the activities of a user influence the ads she receives. At a high level, these approaches monitor the users' inputs (e.g., the emails users receive and send, the videos they watch on YouTube, the sites they visit) and propose methods to estimate the likelihood that a given ad was shown because of a given input. Thus, these studies look at the inputs and outpu
PhD student: Andreou Athanasios