Many studies and independent contributions show that a large amount of paedophile and harmful content is distributed through p2p file exchange systems, and that the volume of such exchanges is increasing; see for instance [1], [2], [3], [4], [5], [6]. A 2003 report from the United States General Accounting Office [1] concludes that "child pornography is easily accessed and downloaded from peer-to-peer networks".

A French working group (composed of representatives of administrations, user associations and relevant economic actors) published in 2005 a study of paedophile content available on the internet [2]. It reveals that French law-enforcement authorities typically observe 10 to 20 persons per day engaged in significant paedophile p2p exchanges in France. According to another report on child protection [3], written at the request of the French minister for family, the number of files with paedophile content available via p2p systems is estimated at between 200 000 and one million.

This can easily be checked by any user, since a simple query on the keywords porn or pedo with a standard p2p client returns hundreds, and up to several thousand, results.

The presence of such content, and the ease with which it can be accessed, make the current situation particularly worrying for p2p users, in particular children. Indeed, a significant number of children, in particular teenagers, now use p2p systems [7], [3], [8], [1], [9], [5]. According to the 2005 Eurobarometer Survey on Safer Internet [7], 50% of the children of the European Union have access to the internet.

A study conducted in 2003 in France [3] established that 31% of children with access to the internet were using p2p systems. The presence of harmful content in these systems, in particular paedophile content, therefore constitutes a serious danger for a significant proportion of European children [8], [1], [9], [5]. Parents are partly aware of this situation: 69% of European parents believe their child has been exposed to harmful or illegal content on the internet [7].

This is even more alarming given that many fakes, i.e. files whose content differs significantly from what their names suggest, are present in these systems. Because of this, all users, including children, face a high risk of downloading and viewing unwanted content [8], [1], [9].

It is clear that viewing paedophile content can be harmful for adults as well. Apart from the shock experienced by most users at the sight of such pictures, it is suspected that easy and/or unwanted access to paedophile content may increase or even create the user's interest in such content. Also, a non-negligible percentage of viewers of paedophile content are paedophiles who have already had sexual intercourse with children. The wide presence of paedophile content in p2p systems makes these people feel safe and out of reach in these systems, and leads to a trivialisation of such content.

Although this situation is now widely acknowledged, there is still no available filtering technique or content rating system to protect p2p users, in particular children, from harmful and paedophile content. Similarly, only a few tools exist to help law enforcement authorities and other child protection organisations fight p2p paedophile exchanges. Indeed, although some progress has been made thanks to the studies cited above, precise knowledge on this topic is still seriously lacking. It has been observed on many occasions that this has a deep impact on our ability to fight these exchanges [2], [10], [3]. For instance, the report written in 2005 at the request of the French minister for family [3] established the urgent need for studies of this phenomenon, in order to better understand what is going on, help parents protect their children from unwanted content, and design appropriate tools for protecting children on the internet. This report emphasised the need for a watch, coordinated at the European level, to monitor not only the evolution of children's uses of the internet, but also the evolution of the risks they incur.


The objective of this project is to tackle these issues by implementing key software, setting up reference databases and conducting leading studies, both to protect p2p users, in particular children, and to help law enforcement authorities and other child protection organisations in their task. More precisely, we will focus on the following three areas, each with its own objectives.

Content rating and fake detection system

Our core objective is the design and implementation of a service able to give, for any file encountered in our measurements, a rating of its content as paedophile and/or pornographic, as well as an indication of whether it may be a fake. A confidence ratio will be associated with each of these indications. This service will be available on demand to end-users through a web page form, but its use will be limited to avoid abuse (typically, we will limit the number of queries per user and per time unit in order to prevent users from searching for paedophile content with it). A full unrestricted version will be provided to relevant institutions, with additional information such as the date of first appearance of the content, the number of peers providing or downloading it over time, etc.
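
The limit on the number of queries per user and per time unit could be enforced in several ways; the following is only a minimal sketch of one of them, a sliding-window counter. The class name `QueryRateLimiter`, its parameters and the numeric limits are illustrative assumptions and are not specified by the project. The caller passes the current time explicitly (for instance from `time.monotonic()`), which keeps the sketch deterministic.

```python
from collections import defaultdict, deque


class QueryRateLimiter:
    """Hypothetical sliding-window limiter.

    Accepts at most `max_queries` queries per user within any period of
    `window_seconds`. Both values are assumptions chosen by the operator.
    """

    def __init__(self, max_queries: int, window_seconds: float) -> None:
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        # Maps a user identifier to the timestamps of its accepted queries.
        self._history = defaultdict(deque)

    def allow(self, user_id: str, now: float) -> bool:
        """Return True and record the query if the user is under the limit."""
        timestamps = self._history[user_id]
        # Forget accepted queries that have left the time window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.max_queries:
            return False  # limit reached: the query is refused, not recorded
        timestamps.append(now)
        return True


# Illustrative use: two queries per user per ten seconds.
limiter = QueryRateLimiter(max_queries=2, window_seconds=10.0)
decisions = [limiter.allow("user-a", t) for t in (0.0, 1.0, 2.0, 10.0)]
```

In this sketch refused queries are not recorded, so a user who keeps retrying is not penalised further; a real deployment might prefer a stricter policy, and would also need a way to identify users reliably, which is outside the scope of this example.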

Such a tool would be a first step towards allowing ISPs to filter p2p content, and allowing end-users to obtain indications on the content of a file they are interested in before downloading it. It may also be included in parental control systems and in p2p clients, which may send automatic queries to our system when needed. This would significantly reduce the exposure of p2p users, in particular children, to harmful content.

Paedophile keywords

One may identify three different kinds of paedophile keywords: the basic ones that anyone would think of to find paedophile content, more specific ones known mainly by people with experience in handling paedophile content (such as paedophiles themselves and law enforcement personnel), and hidden, short-term keywords known only by small groups of people (who exchange these keywords in chat systems or other interpersonal communications). Identifying paedophile keywords is therefore a key issue for filtering, as well as for law enforcement. It is also necessary in order to send appropriate queries to p2p systems for the measurement of paedophile activity. An objective of the project is therefore to use large volumes of recorded queries and file names to uncover such keywords, including hidden ones that are used only for short periods of time.

This will result in a dynamic list of paedophile keywords, evolving over time, which we plan to send to law enforcement authorities and a restricted set of other relevant institutions. This list will contain detailed information on the keywords, such as their frequency over time, the other keywords with which they appear, their date of first appearance, etc.

Improved knowledge of paedophile activity

Our objective here is to give an accurate and detailed view of paedophile activity in currently running p2p systems. This includes the evaluation of the number of files and users involved, the identification of various kinds of files and users, and several other basic statistics, together with their evolution over time. We also seek more subtle information, such as studies of how users develop an interest in paedophile content, global maps of paedophile content, including their nested community structures, and methods to distinguish between people who probably download paedophile content accidentally and people who focus on such content.

The objective here is therefore to obtain a rigorous and in-depth understanding of p2p paedophile activity, which will lead to the publication of detailed reports on each aspect, as well as both technical and general public synthesis reports at the end of the project. We want to move from the current situation to one in which we have precise knowledge of paedophile activity in p2p systems.


