Sending data to America?

A common theme in the debate recently has been the idea that Article 13 somehow actually empowers the big tech firms. An open letter organised by German tech firm NextCloud claims that “Article 13 requires filtering of massive amounts of data, requiring technology only the Internet giants have the resources to build. European companies will be thus forced to hand over their data to them, jeopardizing the independence of the European tech industry as well as the privacy of our users.” This is wrong or misleading on several levels, which we’ll examine in turn.

Let’s start with the claim that content recognition technology is “technology only the Internet giants have the resources to build”. This is deliberately ‘handwavey’, but we can examine the two major components here: the complexity of the software and the scale of the data.

The software is not necessarily trivial, but much of it is available for free. For example, this image matching software is available completely without charge, and returns similarity scores (rather than binary “match/no match” values) which help sites avoid false positives. (And it was developed in the EU, too.)
The scale of the data is not a significant problem: content matching is performed against ‘signatures’, which are smaller representations of the real data. Better still, since the directive only requires platforms to check against works “for which the rightholders have provided the service providers with the relevant and necessary information”, there’s a strong incentive for publishers and record labels not only to provide this data but potentially to help host it and provide access to it.
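
To make the two points above concrete, here is a minimal sketch of signature-based matching that returns a similarity score rather than a yes/no answer. It is not the algorithm used by the image matching software mentioned above, and it is not what any particular platform runs; it is a simple ‘average hash’ chosen purely for illustration. The function names, the 4x4 sample images and the idea of passing images as plain lists of grayscale values are all assumptions made for this example, and a real system would first downscale each image to a small fixed grid.

```python
from typing import List

Image = List[List[int]]  # grayscale pixel rows, values 0-255 (assumed already downscaled)


def average_hash(pixels: Image) -> int:
    """Build a signature with one bit per pixel: 1 if brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def similarity(sig_a: int, sig_b: int, n_bits: int) -> float:
    """Return a score from 0.0 to 1.0: the fraction of signature bits that agree."""
    differing = bin(sig_a ^ sig_b).count("1")  # Hamming distance
    return 1.0 - differing / n_bits


# Hypothetical sample data: a 4x4 gradient, a brightened copy, and its negative.
original = [[(r * 4 + c) * 16 for c in range(4)] for r in range(4)]
brighter = [[p + 10 for p in row] for row in original]
inverted = [[255 - p for p in row] for row in original]

n_bits = 16  # 4x4 pixels, so each signature is only 16 bits
score_same = similarity(average_hash(original), average_hash(brighter), n_bits)
score_diff = similarity(average_hash(original), average_hash(inverted), n_bits)
print(score_same, score_diff)  # prints 1.0 0.0
```

Note that the signature here is a single small integer, however large the source image was, and that the platform chooses the score threshold at which an upload is blocked, queued for human review or let through.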

Next, the idea that “European companies will be thus forced to hand over their data” is clearly absurd on several levels. European companies can only transfer data outside of the EU in compliance with the General Data Protection Regulation (GDPR). They are not at liberty to ignore the GDPR in order to comply with the new Copyright Directive. So this leaves two potential outcomes:

If they can use external services without GDPR breaches, then the job can be done without infringing European standards of privacy.
If they can’t use external services without GDPR breaches, then those methods can’t legally qualify as the “best efforts” required under the Copyright Directive and they aren’t expected to use them.

(It’s also not at all clear why there’s the assumption that any personal or identifying data needs to be handed over in order to identify what’s in the content. If the implication is that it is the content itself that is private, then that is missing the point - the Directive covers works made available to the public. If American companies really want to mine this data, they will be able to, regardless of whether it is sent to them or not.)

Finally, this whole argument rests on the supposed harm of expecting European content sharing platforms to yield up their advantage by giving their data to American companies. But in addition to the points already made above, we can go further and see that the claim that “Article 13 requires filtering of massive amounts of data” is only true in certain cases. Most smaller platforms will have a range of measures available to reduce the amount of potentially-infringing content on their sites without needing to employ content recognition technology. Paid and community moderators, effective flagging systems, systems that discourage users from uploading infringing content rather than encouraging them to - these can all play a part and will be sufficient for many platforms. Platforms will also have the option of licensing large amounts of content, so that they only have to concern themselves with the smaller amount that remains.

There are indeed a few sharing platforms where the uploads they currently accept from end users are so plentiful and diverse that they can’t possibly moderate the content in any way other than with the help of some sort of content recognition technology. This list consists of: Facebook, Instagram, YouTube, Snapchat, Twitter, Pinterest, TikTok, VK, Tumblr, Reddit, maybe a couple of others? What they all have in common is that not a single one of them is an EU company.

So the premise we started with is an objection to a hypothetical situation, where maybe one day someone in the EU could develop an online content sharing platform that is large enough to provide a service that requires automated content recognition, yet simultaneously not large enough to be able to support that technology itself. This situation does not exist now and it seems far-fetched to assume it would be a big problem in the future.

Meanwhile, in the real world, the handful of large EU platforms, like DailyMotion, are already integrated with Google and Facebook, letting these ‘Internet Giants’ track you as soon as you hit their site. These companies already get our data - and in a world where copyright is weak and advertising revenue rules, platform owners have to submit to the people who control the adverts, i.e. Google and Facebook. Yet by enforcing copyright we can restore the financial link between fans and creators, cut out the need to rely upon advertising revenue, and thereby wean Europeans off the adtech industry and the very type of privacy violations under discussion.

If anything, this strengthens the EU tech industry, as its companies gain the ability to provide valuable paid services to creative workers, and don’t have to compete with those who would simply give away the creative work for free. Nor need they rely on the ad revenue that Google and Facebook have a near monopoly over.

In summary, the idea that Article 13 somehow compels the EU to hand over data supremacy to the USA is an exaggerated threat of a future that would be no worse than what we have now anyway. It is not a compelling argument against the Copyright Directive’s benefits for artists, musicians, authors, filmmakers, photographers, and other creative workers, as well as the opportunity for it to reduce advertising-related privacy infringement.