Welcome to the Machine: Measures you can take to stop AI scraping your content

I saw the viral 'Goodbye Meta AI' repost doing the rounds on socials.

It's a hoax.

But what is clear is the scepticism over scraping in the name of 'AI'. Specifically, GenAI.

This legal conflict is heating up on multiple fronts:

But there is a new target. You.

Many social platforms have now disclosed that they are (and have been) using public data to train their AI models.

And on the ones that don't, and that prohibit it, third-party crawlers are doing it anyway.

In fact, it goes further.

The richest data sets used to train chatbots are videos with subtitles.

Why?

Because subtitled video reliably captures how people speak, in the correct context and with their mannerisms.

Pauses, pace, rhythm, flow and so on.

Leading experts in the industry have dubbed it:

“

”

And we're all content creators, right?

While the law gets its act together, here are some steps you can take to protect yourself:

Double down on your privacy
Add a digital signature, watermark or image cloak to your images
If you have a website, prohibit scraping in your 'terms'
Opt out - many platforms now offer this (Meta, Squarespace, Adobe)
If you suspect your data has been scraped without permission, send a Data Subject Access Request (DSAR) - data privacy is a human right, but you might want to speak with a privacy lawyer first
If you can't beat them, join them. There are now agencies paying for content to train AI models
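
For website owners, the 'terms' step above can be backed up with a machine-readable signal. One common option, not mentioned in the list and offered here only as a rough sketch, is a robots.txt file that asks named AI crawlers to stay away while leaving ordinary search crawlers alone. The crawler names below (GPTBot, CCBot, Google-Extended) are assumptions to check against each operator's current documentation, the helper names are made up for illustration, and robots.txt is only a request: a crawler can ignore it.

```python
from urllib.robotparser import RobotFileParser

# Assumed crawler tokens; verify them against each operator's documentation.
AI_CRAWLER_AGENTS = ["GPTBot", "CCBot", "Google-Extended"]


def build_robots_txt(agents):
    """Return robots.txt text that disallows the whole site for each agent."""
    blocks = [f"User-agent: {agent}\nDisallow: /" for agent in agents]
    # Every other crawler (for example ordinary search engines) stays allowed.
    blocks.append("User-agent: *\nAllow: /")
    return "\n\n".join(blocks) + "\n"


def is_allowed(robots_txt, agent, url):
    """Check a URL against the rules using the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


robots_txt = build_robots_txt(AI_CRAWLER_AGENTS)
print(robots_txt)
```

Save the printed text as robots.txt at the root of your site. The is_allowed helper is just a quick way to sanity-check the rules before you publish them; it says nothing about whether a given crawler will actually comply.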

Scraping and crawling are not new. Google and other search engines rely on them.

But is the modern-day 'Yellow Pages' a different proposition to AI?

Only time will tell.

By Jack Jones
Published October 2024