Ask HN: AI bots everywhere – does anyone have a good whitelist for robots.txt?
61 points by scoofy 23 days ago | 32 comments
My niche little site, http://golfcourse.wiki, seems to be very popular with AI bots. They have basically become most of my traffic. Most of them follow robots.txt, and that's nice and all, but they are costing me non-trivial amounts of money.
I don't want to block most search engines. I don't want to block legitimate institutions like archive.org. Is there a whitelist that I could crib instead of pretty much having to update my robots file every damn day?
ryukoposting 22 days ago | next |
I operate under the assumption that Google, OpenAI, Anthropic, ByteDance et al. either totally ignore it or only follow it selectively. I haven't touched my robots.txt in a while. Instead, I have nginx return empty and/or bogus responses when it sees those UA substrings.
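A minimal sketch of what that kind of nginx setup might look like, not the actual config. The UA substrings (GPTBot, ClaudeBot, Bytespider), the `$is_ai_bot` variable name, and the `example.com` server name are all assumptions picked for illustration; swap in whatever shows up in your own access logs.

```nginx
# Goes in the http {} context.
# The substrings below are illustrative assumptions, not a vetted list.
map $http_user_agent $is_ai_bot {
    default        0;
    ~*GPTBot       1;   # case-insensitive regex match on the User-Agent
    ~*ClaudeBot    1;
    ~*Bytespider   1;
}

server {
    listen 80;
    server_name example.com;   # placeholder

    # 444 is an nginx-specific code: close the connection
    # without sending any response at all.
    if ($is_ai_bot) {
        return 444;
    }

    # ... rest of the normal site config ...
}
```

Returning 444 covers the "empty response" case. For a bogus response you could instead `return 200` with a short body, or route matching requests to a static junk page. Either way this only catches bots that send an honest User-Agent.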