On Tue, 18 Mar 2025 12:00:07 -0500
D Finnigan <dog_cow@macgui.com> wrote:
On 3/18/25 10:17 AM, Ben Collver wrote:
Please stop externalizing your costs directly into my face
==========================================================
March 17, 2025 on Drew DeVault's blog
Over the past few months, instead of working on our priorities at
SourceHut, I have spent anywhere from 20-100% of my time in any
given week mitigating hyper-aggressive LLM crawlers at scale.
This is happening at my little web site, and if you have a web site,
it's happening to you too. Don't be a victim.
Actually, I've been wondering where they're storing all this data, and
how much duplicate data is kept by separate parties all scraping the
web simultaneously but independently.
But what can be done to mitigate this issue? Crawlers and bots ruin the internet.
On 2025-03-18, Toaster <toaster@dne3.net> wrote:
But what can be done to mitigate this issue? Crawlers and bots ruin the
internet.
#mode=evil
How about a script that spews out an endless stream of junk from
/usr/share/dict/words, parked on a random URL that's listed in
robots.txt as forbidden? Any bot choosing to chew on that gets what
it deserves, though you might need to bandwidth-limit it.
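For reference, the robots.txt side of that trap could be as simple as
the two lines below; /trap/ is only a placeholder for whatever random
path you pick. Well-behaved crawlers will steer clear, so the only
visitors left are bots that ignore robots.txt:

User-agent: *
Disallow: /trap/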
# Install the Hailo Markov-chain babbler from CPAN, skipping tests:
cpanm -n Hailo
# Train a brain file from a nonsense corpus, then serve its babble to the bots:
hailo -t nonsense.txt -b output.brn
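Here is a minimal sketch of the dictionary-spewing idea, using only the
Python 3 standard library. The /trap/ path, port 8080, and one-second
drip rate are arbitrary choices for illustration, not anything from the
thread:

#!/usr/bin/env python3
# Word-salad tarpit: stream random dictionary words, slowly, forever.
import random
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

with open("/usr/share/dict/words", encoding="utf-8", errors="replace") as fh:
    WORDS = fh.read().split()

class TarpitHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only answer on the trap path; everything else 404s.
        if not self.path.startswith("/trap/"):
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.end_headers()
        try:
            while True:  # never finishes, on purpose
                junk = " ".join(random.choices(WORDS, k=10))
                self.wfile.write(f"<p>{junk}</p>\n".encode())
                self.wfile.flush()
                time.sleep(1)  # crude bandwidth limit: one line per second
        except (BrokenPipeError, ConnectionResetError):
            pass  # the bot gave up; nothing to clean up

    def log_message(self, *args):
        pass  # keep the console quiet

if __name__ == "__main__":
    ThreadingHTTPServer(("", 8080), TarpitHandler).serve_forever()

The sleep is the bandwidth limit: each stuck connection costs you a few
bytes per second while tying up one of the crawler's workers indefinitely.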