3. Non-Algorithmic Measures

Posted by Ben Sizer on Sun 10 March 2019

There is a range of measures that can be used to decrease the amount of infringing content on a platform without requiring algorithmic matching systems. I’ll go through several of them and list some practical suggestions for platforms in each case.

Changing user incentives

Sites like YouTube often already have a concept of an account being ‘in good standing’, but this alone doesn’t stop people from creating throwaway accounts. An alternative would be to start the account with limited privileges and functionality and ramp them up over time, as the user demonstrates they are not abusing the system. This will make the loss of an account more of an inconvenience and thereby deter infringing behaviour - and on certain sites it can act as a positive incentive, as on Stack Overflow and the other Stack Exchange sites, where contributing positively to the community unlocks new features.

So, the first things that a site can do to reduce infringement are:

  • Make account termination more punitive, to deter malicious acts on the platform.
  • Consider giving users extra privileges over time which will make them more reluctant to lose their account and more interested in following the rules.
  • Make account termination more likely. There’s no reason why someone should get three chances with copyright infringement if you made it clear to them from the start what their obligations are - provided your process is accurate enough, of course, which we’ll come to later. Set an example to other users.
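
To make the idea of ramping up privileges concrete, here is a minimal sketch of how a site might map an account's history to what it is allowed to do. Everything in it is a hypothetical illustration rather than any real platform's policy: the tier thresholds, the privilege names, the `Account` fields and the one-strike limit are all assumptions.

```python
from dataclasses import dataclass

# Hypothetical tiers: (minimum account age in days, minimum accepted uploads, privileges).
# The thresholds and privilege names are illustrative assumptions only.
TIERS = [
    (0, 0, {"upload_short_video"}),
    (30, 3, {"upload_short_video", "upload_long_video"}),
    (180, 20, {"upload_short_video", "upload_long_video", "live_stream", "monetise"}),
]

# Assumption: a single upheld infringement report terminates the account.
MAX_UPHELD_STRIKES = 1


@dataclass
class Account:
    age_days: int
    accepted_uploads: int
    upheld_strikes: int = 0


def privileges_for(account: Account) -> set[str]:
    """Return the privileges of the highest tier the account has earned."""
    if account.upheld_strikes >= MAX_UPHELD_STRIKES:
        # A terminated account loses everything it has built up.
        return set()
    earned = set(TIERS[0][2])
    for min_age, min_uploads, privileges in TIERS:
        if account.age_days >= min_age and account.accepted_uploads >= min_uploads:
            earned = set(privileges)
    return earned
```

The point of the design is that the longer an account has behaved well, the more it stands to lose, so a throwaway account is worth very little to someone who wants to upload infringing material.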

Educating users

Here’s what YouTube shows uploaders to deter them from infringement - a polite request, somewhat out of the way of the main upload form and probably rarely seen, never mind read. The actual warning is highlighted in yellow so you can find it:

[Screenshot of YouTube's upload form]

Google’s own search engine can show you literally millions of videos on YouTube where someone has uploaded a video they had no rights to, and typed “no copyright infringement intended” or some paraphrasing of it, because nobody explained that what they were doing was simply illegal from the start. There’s a tiny bit of text in the mostly-ignored “Help and Suggestions” pane, which you’d have to click through to learn more about anyway.

Let’s compare that to what YouTube shows rightsholders to deter them from enforcing their rights:

[Screenshot of YouTube's copyright takedown form]

Note the clear warning that your privacy is not protected (unlike the original uploader of your work, who is anonymous throughout), and that you are threatened not just with account suspension but also with potential legal consequences if you misuse the process - a warning not given to the original uploader.

It is hard to read this as anything other than a deliberate attempt on YouTube’s part to streamline the upload process and slam the brakes on the takedown process, and as long as they are in the business of selling ads they are sadly motivated to keep content up as long as possible rather than to encourage users to upload responsibly.

Compare this with the approach taken by Wikipedia:

[Screenshot of Wikipedia's content editing form]

Literally the first thing you read, when you edit a page, is a stern warning that copyright matters. It’s not hard to implement, it reminds users of their responsibility, and it demonstrates to the world that you are reminding users of their responsibility.

So, things sites can do to educate their users include:

  • Give plain text explanations of a user’s responsibilities regarding copyright. Explain that there are virtually no circumstances under which uploading someone else’s music, or video, or art, or writing, is legal without getting permission first.
  • Clarify areas that users may consider ‘gray areas’ but are actually almost always infringement (e.g. game mods that distribute the original game content, ‘abandonware’ games) and areas that depend on a site’s prior licensing agreements (e.g. song remixes, covers, etc.).
  • Remind users at the point of submission or upload that they shouldn’t violate copyrights.
  • Consider making them click through an acknowledgement before they publish. In fact it could be a mini-questionnaire to ensure the user has thought through whether they own the content or not. On video sites they can even fill it in while the video is uploading and processing.
  • Warn users that uploading unauthorised content for public consumption can lead to account termination.

Community moderation

If you see a video on YouTube that is breaking some law, you can opt to report it - unless the law it’s breaking is copyright, in which case they only allow the owner to report it. This is an attempt to deter infringement reports, by ensuring that the only people who can submit the report are those who have to accept their names being put on the site in place of the disabled content (and face the wrath of disgruntled users).

Again, we can compare with Wikipedia, where anyone can edit a page that they think is wrong and, if the matter isn’t clear, invite discussion on the accompanying Talk page until a decision is reached. Or consider programming site Stack Overflow, where anyone can flag a post for moderator attention, and high-ranking users can also double-check those flags, confirming the accurate ones and clearing any which are incorrect. Both of these massive sites are composed entirely of user-submitted content, but the level of infringing content is relatively tiny.

Consider a much smaller, niche site, such as a forum for players of a single online game, or a group of creative writers posting their work and sharing feedback. In these cases the users are even more of a tight-knit community, and won’t want to see their community diluted or even jeopardised by people using it to distribute unauthorised material. So they will have an incentive to ask those people to stop, and to alert moderators if they don’t. Almost every type of forum software comes with a built-in post reporting system, and almost every small site will have some moderators on hand to respond to the reports, even if just a small group of volunteers.

This goes for much more than unauthorised copyrighted works - users can help keep out spam, abuse, harassment, etc. This doesn’t absolve a platform holder from its responsibilities, but if site owners and site users can work together the problem is significantly lessened.

We hear so much about the “400 hours of video uploaded per minute”, but much less about the billions of views per day. If just one in a million of those viewers saw an infringing video and flagged it for review, that’s thousands of daily data points that the site owners and moderators can use to find which videos to check and potentially remove. If just one of hundreds or thousands of people who watch an infringing music video thought, “I know this shouldn’t be here, and as a fan of this band, I want to flag it so that they’re not being ripped off”, the number of unauthorised views would be slashed.

As we saw in the previous entry, where YouTube deliberately removed its copyright flagging tool, this shows not only how these provisions have created perverse incentives for platforms to make infringement more prevalent, but also how an effective tool for preventing that infringement can exist when those incentives are removed.

It is true that there will be a lot of content flagged as copyright infringement - just as there is a lot of content flagged as hate speech, abusive content, scams, spam, etc. Platforms already need moderators to handle all this. Not every flag will be 100% correct, but again the platforms are used to this. Adding copyright to the mix would - and should - just be an extension of this duty.

Most people want to do the right thing, and most people want to both support the creators they follow and preserve the community they’re a part of. Sites that give them the tools and capabilities to do both things at once see much lower rates of infringing content as a result.

Practical courses of action:

  • Encourage users to feel a sense of community on the site, rather than just to be consumers of ‘content’.
  • Provide tools for users to give feedback to each other on what material is expected on the site.
  • Provide tools for users to alert the site owners to material that should not be there.
  • Consider providing tools for trusted users to hide or delete material entirely.
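
As a rough illustration of how user flags could be turned into a review queue for moderators, here is a small sketch. It assumes flags arrive as `(user_id, video_id)` pairs and that a video needs flags from at least two different users before it is queued; both the data shape and the threshold are my own assumptions, not a description of any existing site.

```python
from collections import defaultdict


def review_queue(flags, min_distinct_flaggers=2):
    """Rank flagged videos by how many different users flagged them.

    `flags` is an iterable of (user_id, video_id) pairs. Repeat flags from
    the same user on the same video are only counted once.
    """
    flaggers = defaultdict(set)
    for user_id, video_id in flags:
        flaggers[video_id].add(user_id)

    # Most-flagged first; ties are broken by video id so the order is stable.
    ranked = sorted(flaggers.items(), key=lambda item: (-len(item[1]), item[0]))
    return [video for video, users in ranked if len(users) >= min_distinct_flaggers]
```

Counting distinct users rather than raw flags means that one disgruntled viewer can't push a video up the queue alone, while moderators still make the final decision on everything in it.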

Content categorisation

Imagine I ran a website like ArtStation, specialising in users sharing their own digital art. I don’t practically have to worry about users uploading music videos, because even if it is technically possible, nobody is going there to view them. Users are going to report such uploads because that is not what they visit the site to see. And since there is only really one type of work on my site I can understand the legal issues quite well, provide detailed instructions about copyright to inform users of their rights and obligations, and act swiftly on valid infringement reports. This vastly reduces the moderation burden because the ratio of infringing work to authorised work is lower and the capability to deal with it is increased.

A similar process can even work for a general purpose site. If a video upload site has separate Music and Gaming sections, the moderators working on Music don’t have to understand the ins and outs of what is acceptable streaming material or whether Sega allows Sonic The Hedgehog fan-fiction. Similarly the Gaming moderators don’t have to concern themselves with whether some obscure death metal band’s video was actually uploaded by the band or not, or whether a live performance video was a rip from a DVD or a fan’s own experience. Just punt the content over to the other side where the specialist moderators can act on it more effectively, with a smaller set of guidelines and rules to consider.

In fact, we already see a variation of this on many sites mixed with community moderation, even on the big platforms today. If you run a Facebook group, you can delete the posts that don’t fit your group’s topic, choose the members of your group, and so on. The tighter focus of the group makes it easier for the admins to judge what is and is not relevant content - and this concept can apply to the whole platform, not just to the user-run sections of it. This is a concept we’ve seen on many sites, from forums to Yahoo Groups to Usenet - users can have the freedom to join and create groups to talk and share ideas, but users also benefit from being given more relevant content in the groups they opt in to.

Another idea might be to have users provide more metadata while they upload. To take a video example, instead of just entering a vague category, they could be asked simple questions - Is this a music video? Is it your music? Is it a game stream? Which game is it? This does two useful things - it helps the site funnel the upload towards moderators who truly understand the content, and it can help categorise the content so that users can find it more easily. Everyone wins, if the content is authorised.

In summary:

  • Consider focusing on a narrower subject area
  • Compartmentalise a general-purpose site so that moderation can be more effective in each area
  • Encourage knowledgeable users to flag wrongly-categorised content
  • Allow users to provide useful metadata at the point of upload to aid categorisation
  • Employ moderators who know a bit about specific subject areas to make better and quicker judgements
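
The upload questions suggested above could feed straight into routing. The sketch below is only an illustration: the `UploadMetadata` fields mirror the example questions, and the queue names are invented for the purpose. It also assumes the answers are self-declared by the uploader, so they decide who looks at the content, not whether it is authorised.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UploadMetadata:
    """Answers the uploader gives while the video is processing."""
    is_music_video: bool = False
    is_uploaders_own_music: bool = False
    is_game_stream: bool = False
    game_title: Optional[str] = None


def moderation_queue(meta: UploadMetadata) -> str:
    """Pick the (hypothetical) specialist queue that should handle an upload."""
    if meta.is_music_video:
        # Music the uploader does not claim as their own needs a licence check.
        return "music" if meta.is_uploaders_own_music else "music-licensing"
    if meta.is_game_stream:
        return "gaming"
    return "general"
```

For example, `moderation_queue(UploadMetadata(is_game_stream=True, game_title="Sonic The Hedgehog"))` would send the upload to the gaming moderators, who never need to learn the rules for music.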

Spot checks

There are lots of real-life situations where it’s impractical to check everybody, but we still want some degree of observation to ensure that there is at least a chance of finding and stopping the behaviour we want to prevent. We can’t track the speed of every car, but we can place a few speed cameras and traffic police around to deter people from going over the limit. Tax inspectors might check occasional accounts to see if tax evasion could be taking place. And so on. The chance of getting detected means that people are less likely to break the rules, even if they haven’t themselves been checked (yet).

Platforms can use this approach to deter infringement by selecting a random sample of uploads or accounts to check. This places a much lower burden on them than if they checked every single piece of content, but is likely to have a deterrent effect well beyond the actual detection rate.

We can get even better bang for the buck if we take our previous results into account. We don’t need to check the same content twice. We can also assume that a user with 2 or 3 legitimate videos up already is a safer bet than someone uploading something for the first time - even more so if they’re monetising those videos and unlikely to want to be taking any risks. Anyone who’s ever moderated a forum knows that spamming is much more likely to come from a fresh account than from someone who’s been participating in the community for 4 years. By being sensible about it, the limited time available to our moderators is spent on content that is more likely to be a legitimate problem.
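
Here is one way that weighting might look in practice, as a sketch only. The starting rate of 50%, the halving for each previously verified upload and the 2% floor are numbers I have made up to show the shape of the idea; a real platform would tune them against its own data.

```python
import random


def check_probability(prior_clean_uploads: int, already_checked: bool,
                      base_rate: float = 0.5, floor: float = 0.02) -> float:
    """Chance that an upload is picked for a manual spot check."""
    if already_checked:
        return 0.0  # no need to check the same content twice
    # Halve the rate for each upload that previously passed a check, but keep
    # a floor so that established accounts are still sampled occasionally.
    halvings = min(prior_clean_uploads, 50)
    return max(floor, base_rate * 0.5 ** halvings)


def should_check(prior_clean_uploads: int, already_checked: bool, rng=random) -> bool:
    """Randomly decide whether this upload goes to a moderator."""
    return rng.random() < check_probability(prior_clean_uploads, already_checked)
```

A brand new account has half of its uploads checked under these assumptions, while an account with three clean uploads behind it is down to about 6%, which concentrates moderator time where problems are most likely.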

How practical is this, really? Turns out that it’s actually more practical than you might think. YouTube’s ‘400 hours uploaded per minute’ equates to a bit over 3.25 million videos per day. Sounds massive - but imagine we just wanted to spot check 1 in 10 of them - that’s about 13,600 videos per hour. If we hired 3000 moderators (and bear in mind that Facebook, for example, already employs far more than that) then we’d have roughly 1000 people working in any given hour, so each would have to check 13 or 14 videos in that hour, giving a little over 4 minutes for each one - suddenly this is looking very manageable!

(The USA median wage in 2017 was about $31,500. That’s about $94.5M in salary if you employ 3000 people at that wage. Sounds like a lot, but it’s only about 1% of the estimated $9BN YouTube was reportedly earning back in 2015. It’s a drop in the ocean and well within their financial capabilities.)
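
For anyone who wants to check or adjust the arithmetic, it can be written out as a few lines of Python. The exact figure of 3,264,000 videos per day is my assumption, picked because it is "a bit over 3.25 million" and gives the 13,600 per hour quoted above; the three 8-hour shifts are likewise an assumption to explain why roughly a third of staff are on duty at any time.

```python
# Figures quoted in the post.
HOURS_UPLOADED_PER_MINUTE = 400
SPOT_CHECK_ONE_IN = 10
MODERATORS = 3000
MEDIAN_WAGE_USD = 31_500
YOUTUBE_REVENUE_USD = 9_000_000_000

# Assumptions made for this sketch.
VIDEOS_PER_DAY = 3_264_000  # "a bit over 3.25 million"
SHIFTS_PER_DAY = 3          # 8-hour shifts, so a third of staff work at once

# Workload per moderator.
checks_per_hour = VIDEOS_PER_DAY / SPOT_CHECK_ONE_IN / 24
moderators_on_duty = MODERATORS // SHIFTS_PER_DAY
checks_per_moderator_hour = checks_per_hour / moderators_on_duty
minutes_per_check = 60 / checks_per_moderator_hour

# Average video length implied by the two upload figures.
minutes_uploaded_per_day = HOURS_UPLOADED_PER_MINUTE * 60 * 24 * 60
implied_average_minutes = minutes_uploaded_per_day / VIDEOS_PER_DAY

# Cost of the moderators relative to revenue.
salary_bill_usd = MODERATORS * MEDIAN_WAGE_USD
share_of_revenue = salary_bill_usd / YOUTUBE_REVENUE_USD

print(f"{checks_per_hour:,.0f} checks per hour across the site")
print(f"{checks_per_moderator_hour:.1f} checks per moderator per hour")
print(f"{minutes_per_check:.1f} minutes available per check")
print(f"{implied_average_minutes:.1f} minutes implied average video length")
print(f"${salary_bill_usd:,} salary bill, {share_of_revenue:.2%} of revenue")
```

Changing `SPOT_CHECK_ONE_IN` or `MODERATORS` shows how quickly the time per check grows or shrinks, which is useful when deciding how big a sample a smaller platform could afford.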

It gets better: if you already have licences in place for a lot of the work, as any responsible site allowing anonymous users to upload years of video per day would, you don’t even need to check that work, as your liability is covered.

To summarise:

  • Pay people to check for copyrighted content
  • Pick a small subset of content at random to check manually
  • Focus on new users to ensure the checking is happening where it’s most likely to make a difference.
  • Make users aware that content may be checked

Direct deterrents

Finally we come to the direct deterrent of making the user liable, albeit indirectly. If a platform were to make more of an effort to verify each user’s identity to a reasonable degree, then it might be possible for that platform to recoup any damages caused by a user from that same user. And if users feel they may be held directly and financially responsible for the infringement they cause, they are less likely to do it.

It’s not clear how well this would work in the context of the General Data Protection Regulation, but holding these details purely for this purpose sounds like a ‘legitimate interest of the data controller’ and it does not seem to be a reasonable requirement that a citizen is necessarily able to use a commercial internet service completely anonymously.

In brief:

  • Consider authenticating users so that there is a degree of accountability on their part - platforms whose users abuse the service may be able to get recompense from those abusers


All in all, it seems there are quite a few ways to deter and reduce copyright infringement on a platform without even needing to employ a single content recognition algorithm. These include:

  • Changing user incentives to make them less likely to upload infringing material
  • Educating users on what is acceptable content to upload to the platform
  • Letting the community use tools to report content that shouldn’t be there
  • Categorising content so that moderation can be more effective in each area
  • Employing staff to run spot checks on the uploads to find and deter infringement
  • Considering whether users can be made partly accountable

Anybody worried that so-called ‘upload filters’ are the only way to make ‘best efforts’ to prevent copyright infringement should now see that there are several other effective tools available.

But if somehow all this isn’t enough, there are still technological measures to consider which can lighten the moderator burden. I’ll cover these in the next post.