Another little PHP script, this time to find dead links. It takes a URL and starts crawling for links, testing whether the places they link to are active or not. If a linked URL is on the same domain, it crawls into that page too, recursively (up to a limit). Pages that have already been tested in the same query are not tested a second time. Only one instance of this script can run at a time, so if someone else is using it you may get a weird error message.
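For readers who want the idea without the PHP source, here is a rough Python sketch of the crawl described above. It is not the script's actual code. It assumes a hypothetical `fetch(url)` callable that returns `(status, hrefs)`, with `status` set to `None` when the request fails outright, and it treats any status of 400 or more as dead. It also assumes that "limit" counts how many levels deep the crawler may follow links. The visited set, the same-host check, and the half-second delay mirror what the post describes.

```python
import time
from urllib.parse import urldefrag, urljoin, urlsplit


def crawl(start, fetch, limit=3, delay=0.5):
    """Recursively test links starting at `start`; return {dead_url: status}.

    `fetch(url)` is a hypothetical callable returning (status, hrefs), where
    status is an HTTP code, or None if the request failed outright.
    """
    host = urlsplit(start).netloc
    tested = {}  # every URL tested so far in this run -> its status

    def is_dead(status):
        return status is None or status >= 400

    def visit(url, depth):
        if url in tested:  # never test the same page twice in one run
            return
        status, hrefs = fetch(url)
        tested[url] = status
        if delay:
            time.sleep(delay)  # pause between requests to go easy on servers
        # Only crawl into live pages on the starting host, up to `limit` levels.
        if is_dead(status) or depth >= limit or urlsplit(url).netloc != host:
            return
        for href in hrefs:
            link, _fragment = urldefrag(urljoin(url, href))  # resolve relative links
            visit(link, depth + 1)

    visit(start, 0)
    return {url: status for url, status in tested.items() if is_dead(status)}
```

Injecting `fetch` keeps the traversal logic separate from the HTTP code, so it can be exercised against a fake site. External links are still tested, but the crawler never descends into them.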

Syntax: "http://0.clrhome.tk/b/links/?url=<INITIAL>[&limit=<DEEP>][&set=<DOMAIN>]"

INITIAL: The page to start on (and host)
DEEP: Max number of recursions
DOMAIN: "both" (default), "domestic", or "international", to display all links, internal ones only, or external ones only
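As a sketch of how a request in the syntax above might be built and read back, the Python below percent-encodes INITIAL (needed if the start URL itself contains `?` or `&`) and falls back to `set=both` when DOMAIN is omitted. The function names are hypothetical. The post does not state a default for `limit`, so the parser simply returns `None` when it is absent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE = "http://0.clrhome.tk/b/links/"
VALID_SETS = {"both", "domestic", "international"}


def build_query_url(initial, limit=None, link_set=None):
    """Build a ?url=...[&limit=...][&set=...] request URL (hypothetical helper)."""
    params = {"url": initial}  # urlencode percent-encodes ':', '/', '?', '&'
    if limit is not None:
        params["limit"] = str(limit)
    if link_set is not None:
        if link_set not in VALID_SETS:
            raise ValueError(f"set must be one of {sorted(VALID_SETS)}")
        params["set"] = link_set
    return BASE + "?" + urlencode(params)


def parse_query(url):
    """Recover the parameters, applying the documented default set=both."""
    qs = parse_qs(urlsplit(url).query)
    limit = qs.get("limit", [None])[0]
    return {
        "url": qs["url"][0],
        "limit": int(limit) if limit is not None else None,  # default unknown
        "set": qs.get("set", ["both"])[0],
    }
```

For example, `build_query_url("http://example.com/", limit=3, link_set="domestic")` yields `http://0.clrhome.tk/b/links/?url=http%3A%2F%2Fexample.com%2F&limit=3&set=domestic`.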

I made this after tifreak mentioned (on IRC) that he needed a crawler to find dead links, but I finished it too late. (I'm still going to use it myself, though.) It's a bit slow, mainly because there's a half-second delay between requests.
Nice! I've considered writing something like this myself but never got around to it.

Does the script make HEAD requests for pages that it will not be crawling into? That could save quite a bit of bandwidth if you are crawling only one domain and have many external links.
That's a good idea. I'll have to rewrite some of it though.
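The HEAD suggestion can be sketched in Python with the standard `urllib.request` module. This is an illustration, not the script's rewrite. It assumes that any page whose host differs from the starting host will not be crawled into, so a `HEAD` request (status line and headers only) is enough there, while same-host pages still need a `GET` so their links can be read. `make_request` and `link_status` are hypothetical names.

```python
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def make_request(url, start_host):
    """HEAD for pages we won't crawl into (other hosts), GET for the rest."""
    crawl_into = urlsplit(url).netloc == start_host
    return urllib.request.Request(url, method="GET" if crawl_into else "HEAD")


def link_status(url, start_host, timeout=10):
    """Return (status, body); body is None for HEAD requests and failures."""
    req = make_request(url, start_host)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            body = resp.read() if req.get_method() == "GET" else None
            return resp.status, body
    except urllib.error.HTTPError as err:  # server answered with 4xx/5xx
        return err.code, None
    except OSError:  # DNS failure, refused connection, timeout, ...
        return None, None
```

One caveat: some servers answer `HEAD` with 405 or 501 even though the page exists, so a careful checker might retry those with `GET` before reporting the link as dead.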

EDIT: Done. Also fixed a few bugs with relative links and optimized it a bit.
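Relative links are a common source of crawler bugs. A minimal Python sketch of one way to handle them (not necessarily what the script does) resolves each `href` against the page it appeared on, drops the `#fragment` so `page.html#top` is not tested separately from `page.html`, and skips non-HTTP schemes such as `mailto:`. `normalize_link` is a hypothetical helper.

```python
from urllib.parse import urldefrag, urljoin, urlsplit


def normalize_link(page_url, href):
    """Turn an href found on page_url into an absolute, fragment-free URL.

    Returns None for links that cannot be tested over HTTP (mailto:, etc.).
    """
    absolute, _fragment = urldefrag(urljoin(page_url, href.strip()))
    if urlsplit(absolute).scheme not in ("http", "https"):
        return None
    return absolute
```

For a link on `http://a.test/dir/page.html`, `../up.html` becomes `http://a.test/up.html` and `//cdn.test/x.js` becomes `http://cdn.test/x.js`.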

By the way, if you want the source: http://clrhome.tk/b/links/src.php