My cron-driven alter ego ran several scripts on the Trolltech web
server. This is about one I wrote in 1996 and ran every half-hour
to make sure that Qt-related subjects were well covered by the search
engines. At the time, some search requests didn't give as good results as
they could have, and I thought the likely reason was that the search
engines hadn't crawled the right pages.
But the pages tended to be in the Trolltech referrer log, or if
not, then some other page very close to them was.
So I wrote a crontab script to watch for new referrers in our
Apache logs, and whenever it saw one, it did the following.
First, get rid of spam (yes, even then there were spam pages). The test
was simple: At most x% of the text could be links, the page should mention
one of a set of keywords, the page could contain at most y links, and at
least one link had to point to Trolltech.
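The heuristics above might be sketched roughly like this. This is a minimal reconstruction for illustration, assuming Python; the thresholds, keyword list, and the crude HTML handling are all hypothetical (the original values and language are not recorded in the post):

```python
import re

# Hypothetical thresholds and keywords; the real x%, y, and keyword set
# from the 1996 script are unknown.
def looks_like_spam(html, max_link_ratio=0.3, max_links=50,
                    keywords=("Qt", "Trolltech", "toolkit")):
    """Return True if the page fails any of the four heuristic checks."""
    links = re.findall(r'<a\s+[^>]*href="([^"]+)"[^>]*>(.*?)</a>',
                       html, re.IGNORECASE | re.DOTALL)
    text = re.sub(r"<[^>]+>", " ", html)          # crude tag stripping
    link_text = " ".join(t for _, t in links)

    # At most x% of the text may be link text.
    if text.strip() and len(link_text) / len(text) > max_link_ratio:
        return True
    # At most y links in total.
    if len(links) > max_links:
        return True
    # The page must mention at least one keyword...
    if not any(k.lower() in text.lower() for k in keywords):
        return True
    # ...and at least one link must point to Trolltech.
    if not any("troll.no" in href for href, _ in links):
        return True
    return False
```

A page dense with links, or one that never mentions Qt or links to troll.no, gets dropped before any submission happens.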
If the page passed that test, the script tried to clean up the URL a
bit (delete session cookies, delete index.html). Next it tried to locate a
higher-level index page linking to the candidate page and other related
pages (since submitting an index page gave the search engine more to work
with).
Finally, the script would submit either the payload page or the index
page to AltaVista, HotBot, Lycos and a fourth engine whose name I've
forgotten. I don't think it was Google; Google came later.
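The cleanup and submission steps could be sketched like this, again assuming Python for illustration. The session-parameter names and the add-URL endpoints are invented placeholders; as the post notes below, the real submission URLs lived in a configuration file:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode, quote
from urllib.request import urlopen

# Hypothetical values: the real session-parameter names and engine
# submission endpoints are not preserved in the post.
SESSION_PARAMS = {"sid", "sessionid", "phpsessid"}
SUBMIT_URLS = [
    "http://add-url.engine-one.example/submit?url=",
    "http://add-url.engine-two.example/addurl?newurl=",
]

def clean_url(url):
    """Strip session parameters and a trailing index.html from a URL."""
    scheme, netloc, path, query, _ = urlsplit(url)
    if path.endswith("/index.html"):
        path = path[: -len("index.html")]
    kept = [(k, v) for k, v in parse_qsl(query)
            if k.lower() not in SESSION_PARAMS]
    return urlunsplit((scheme, netloc, path, urlencode(kept), ""))

def submit(url):
    """Send the cleaned URL to each engine's add-URL endpoint via GET."""
    for base in SUBMIT_URLS:
        urlopen(base + quote(url, safe=""))
```

For example, `clean_url("http://example.com/docs/index.html?PHPSESSID=abc&x=1")` yields `http://example.com/docs/?x=1`, so the same logical page always reaches the engines under one canonical address.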
It worked very well. Searches for Qt-related subjects gave better
results than before, and yes, the search engines saw more links to
troll.no. The script ran until shortly before I left Trolltech in
2001. By that time Google had learned to crawl well, and the script
lay unused and forgotten until I found it today, while going through
and wiping my old hard disks. (Update: The reason I don't know
the name of the fourth search engine is that I had put the submission
URLs in a configuration file.)
Update: The fourth was, of course,
whose existence I had quite forgotten.