Is there a simple way to severely impede web scraping and LLM data collection of my website?

Maroon@lemmy.world · edit-2 5 months ago

Is there a simple way to severely impede web scraping and LLM data collection of my website?

corroded@lemmy.world · 5 months ago

I probably should have specified that I’m using libcurl, but I did try the equivalent of what you suggested. I even tried setting up a list of user agents and cycling through them. None of them worked. Many anti-scraping methods use much more complex schemes than just validating the user agent. In some cases, even a headless browser gets blocked.
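
For anyone wondering what "cycling through a list of user agents" looks like, here is a rough sketch. It is in Python with the standard library rather than libcurl, so it is not the code I actually ran; the user-agent strings and the `build_requests` helper are made up for illustration. It only builds the requests and does not send anything.

```python
import itertools
import urllib.request

# Hypothetical user-agent strings, for illustration only.
USER_AGENTS = [
    "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36",
    "curl/8.5.0",
]


def build_requests(url, count, agents=USER_AGENTS):
    """Return `count` Request objects for `url`, rotating the User-Agent header.

    Hypothetical helper: nothing is sent over the network here.
    """
    rotation = itertools.cycle(agents)  # wraps around when the list is exhausted
    return [
        urllib.request.Request(url, headers={"User-Agent": next(rotation)})
        for _ in range(count)
    ]


if __name__ == "__main__":
    # Print which user agent each request would carry.
    for req in build_requests("https://example.com/", 4):
        print(req.get_header("User-agent"))
```

Each request could then be passed to `urllib.request.urlopen`. As I said above, this was not enough on its own in my case, since the sites I tried seem to check more than the user agent.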