• 7 Posts
  • 36 Comments
Joined 3 months ago
Cake day: December 11th, 2025

  • The bigger companies check robots.txt to see whether they can scan your content for AI scraping purposes. I get a few from Google, Bing, and others. I'm not sure about Facebook, but I do see that the bigger ones usually abide by robots.txt and stop there. It doesn't stop them from hammering your robots.txt itself, though.
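
    For reference, a minimal robots.txt sketch for opting out of the AI crawlers that do abide by it. The user-agent tokens below are the ones those companies have publicly documented; check each vendor's docs, since the tokens change over time.

    ```
    # Opt out of known AI-training crawlers.
    User-agent: GPTBot
    Disallow: /

    User-agent: Google-Extended
    Disallow: /

    User-agent: CCBot
    Disallow: /

    User-agent: ClaudeBot
    Disallow: /

    # Everyone else may crawl normally.
    User-agent: *
    Disallow:
    ```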

    If you set up fail2ban and/or block the IP range from one actor, it usually goes away.
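
    A rough sketch of what a fail2ban jail for this might look like. The jail and filter names, log path, and thresholds here are all assumptions to adapt to your own setup, not a drop-in config:

    ```
    # /etc/fail2ban/jail.local -- hypothetical "botban" jail.
    # Bans any IP that trips the filter 30 times inside 60 seconds
    # for a full day. The matching filter must be defined separately
    # in /etc/fail2ban/filter.d/nginx-botban.conf.
    [nginx-botban]
    enabled  = true
    port     = http,https
    filter   = nginx-botban
    logpath  = /var/log/nginx/access.log
    maxretry = 30
    findtime = 60
    bantime  = 86400
    ```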

    The worst offender is OpenAI, which does NOT check robots.txt and just scrapes/DDoSes my small site. That went on until I put a couple of infinite-loop/nefarious traps on the server. Then fail2ban can see which IP addresses follow the traps too deep and block them.
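
    The "see which IPs go deep and block them" step can be sketched as a small log scan. This is a hypothetical illustration, not the actual setup: the log format (common log format), sample data, and threshold are all assumptions.

    ```python
    from collections import Counter

    def heavy_hitters(log_lines, threshold=3):
        """Return the set of client IPs whose request count meets the threshold.

        Assumes common-log-format lines where the client IP is the first
        whitespace-separated field.
        """
        counts = Counter(
            line.split(" ", 1)[0] for line in log_lines if line.strip()
        )
        return {ip for ip, n in counts.items() if n >= threshold}

    # Hypothetical sample: one IP hammering the site, one normal visitor.
    sample = [
        '203.0.113.7 - - [10/Jan/2026] "GET /robots.txt HTTP/1.1" 200',
        '203.0.113.7 - - [10/Jan/2026] "GET /trap/a HTTP/1.1" 200',
        '203.0.113.7 - - [10/Jan/2026] "GET /trap/b HTTP/1.1" 200',
        '198.51.100.2 - - [10/Jan/2026] "GET / HTTP/1.1" 200',
    ]

    print(heavy_hitters(sample))  # only the aggressive IP is flagged
    ```

    In a real deployment you would feed the flagged IPs to fail2ban or an ipset rather than printing them.
    
    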