Firstly, I decide that there are a lot of spambots browsing UEC. This is a state of affairs that annoys me. Secondly, I'm bored at work and need something amusing to code. I think a spambot trap qualifies.

The simple idea is to set up a page, linked to in a way that most normal users won't see, that will provide the address-harvesting spambots with loads of bogus addresses. It's been done before, but I figure it'd be a kick to write my own. I love these things. Everyone should run one.

The first thing I need is data to make the random addresses from. First I did a host -l (a DNS zone listing) on a bunch of domains. Not many name servers still allow it. From those that did, I got about 4400 real live hostnames. Next, I wanted to use cut to yank random words from some Linux HOWTOs. I wrote this little regexp script, stripper.pl, to clean up the words I pull out:

    #!/usr/bin/perl
    # Read stdin line by line and strip punctuation, tabs and spaces.
    while (<>) {
        # Keep deleting one offending character at a time until none are left.
        while ($_ =~ s/[\[\\\]\{\}\;\+\-\:\x09\=\?\*\(\)\'\,\.\" \$\@\#\|\~\/\%\!\*\&\<\>]//) {}
        print $_;
    }

I love Perl. Just think, every random blast of line noise you've ever had probably evaluates to a useful regular expression. Yeah, I know I didn't have to escape -every- character there, but some of them were taking on special functions (like matching all numbers) without giving so much as a warning that it's probably not Doing What I Mean. So, escape characters it is. \x09 has tab covered...

Anyway... now I'm ready to yank out some words and clean them up. We'll arbitrarily grab the fourth word from each line...

    LinuxFaculty:~$ cat /usr/doc/Linux-HOWTOs/* |cut -d " " -f 4 | ./stripper.pl | strings |sort -u >wordies
    LinuxFaculty:~$ wc wordies
    13642 13642 127090 wordies

Wowow! That's 13642 words. Anyway, our "wordies" file can now serve a double purpose: I'll make a quick script that appends ".com" to the end of each word, run the list through it, and tack the result onto the end of my hostlist. I'll also keep the original wordies file to use for random users @ our random domains.
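The ".com" step and the random user @ random domain pairing described above could look roughly like the following. This is only a sketch, not the script actually used: the file names "wordies" and "hostlist" come from the text, while the function names add_dotcom and bogus_addresses are made up for illustration.

```shell
#!/bin/sh
# Sketch only: add_dotcom and bogus_addresses are hypothetical names.

# Append ".com" to every word on stdin, producing fake domain names.
add_dotcom() {
    sed 's/$/.com/'
}

# Print COUNT bogus addresses, pairing a random line of WORDFILE (the user)
# with a random line of HOSTFILE (the domain).
# usage: bogus_addresses WORDFILE HOSTFILE COUNT
bogus_addresses() {
    awk -v n="$3" '
        FNR == NR { user[++nu] = $0; next }   # first file: user names
                  { host[++nh] = $0 }         # second file: host names
        END {
            if (nu == 0 || nh == 0) exit      # nothing to pair up
            srand()                           # seeded from the clock
            for (i = 0; i < n; i++)
                print user[int(rand() * nu) + 1] "@" host[int(rand() * nh) + 1]
        }
    ' "$1" "$2"
}

# Intended use, assuming the files from the text exist:
#   add_dotcom < wordies >> hostlist
#   bogus_addresses wordies hostlist 50
```

Since srand() seeds from the clock, two runs within the same second give the same list. That's probably fine for feeding spambots, but worth knowing.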
Now that I'm equipped with some random crap, I go about making up a page just for the spambots -- an .shtml with an SSI, rather than just a .pl, since I've noticed spambots tend to avoid knowingly executing scripts. I've tried to avoid using the word "spam" on my main e-mail-generating page because I'm sure by now people have thought to make spambots blacklist sites like this one, when they can recognize them. In fact, linking this page from the main one may be enough to cause them to ignore it in future. But hopefully not. :D
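For illustration, the part of the page that the SSI pulls in could be generated by something like this. It is a guess at the general shape, not the real page: the function name mailto_links, the markup and the path in the comment are all assumptions, and it relies on the addresses already being stripped of anything that would need HTML escaping.

```shell
#!/bin/sh
# Sketch only: mailto_links is a hypothetical name, and the markup is a guess.

# Turn one address per line on stdin into a list of mailto links.
mailto_links() {
    while IFS= read -r addr; do
        [ -n "$addr" ] || continue          # skip blank lines
        printf '<a href="mailto:%s">%s</a><br>\n' "$addr" "$addr"
    done
}

# In the .shtml page, an Apache-style SSI directive could then pull in a
# script that calls this (path is hypothetical), for example:
#   <!--#include virtual="/cgi-bin/addresses.cgi" -->
# A script included that way would also need to print a Content-type header
# before the links.
```

To the spambot, the result looks like a plain static page full of addresses, with no script URL in sight.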