• argv_minus_one@beehaw.org
    1 year ago

    No. Search engines fetch pages with plain old HTTP GET requests, the same way browsers do. Parsing the HTML and extracting the meaningful content takes some work, but that doesn’t matter: the HTML is already stored on Google/Microsoft servers, ready for extraction, and there’s nothing Reddit can do to stop them.
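    To illustrate, here is a minimal Python sketch of the two steps: a plain HTTP GET, then pulling readable text out of the returned HTML. This is not how Google's or Microsoft's crawlers actually work (theirs are far more elaborate); the `fetch` and `extract_text` helpers and the User-Agent string are hypothetical, made up for this example.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch(url: str) -> str:
    """Fetch a page with a plain HTTP GET, the same request a browser makes."""
    # Hypothetical User-Agent string; not any real crawler's identifier.
    req = Request(url, headers={"User-Agent": "example-crawler/0.1"})
    with urlopen(req, timeout=10) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped element.
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the page's visible text as one space-separated string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

    For example, `extract_text("<p>Hello</p><script>var x=1;</script><p>world</p>")` returns `"Hello world"`. Once a copy of the HTML has been saved, this extraction step can be rerun at any time without contacting the original site again.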

    Reddit can make future content harder to extract, but not without also making it invisible to search engines, which would cause Reddit to disappear from Google Search and Bing.

    That’s why I say trying to charge money for AI training data is a fool’s errand. These facts make it impossible. That doesn’t mean Spez won’t try, but it does mean he won’t succeed.