
Ignoring certain links: -ignore

If you use Big Brother regularly, you might notice that some links are consistently reported to be incorrect, even though they work when you try them out with your browser.

There are several possible reasons for this problem. The most common one is that Big Brother does not behave exactly like a browser, and because of this, it sometimes runs into server bugs. Here's why. A browser connects to the remote server and says, "send me the document". The server says "all right" and complies. Big Brother works differently. It doesn't ask for the document; instead, it asks "does the document exist?". Some (buggy) servers are not used to this kind of request, even though they should be, since it is part of the HTTP standard. They answer with something that could be translated as "uh?". This is why Big Brother reports the link as invalid, even though it looks valid in a browser. (In technical terms, browsers send GET requests, while Big Brother sends HEAD requests, and the latter are often badly supported by server software.)
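To make the GET/HEAD difference concrete, here is a minimal sketch in Python. It is not Big Brother's code. It starts a small local server that, like the buggy servers described above, implements GET but not HEAD, and then sends both kinds of request to it. The names GetOnlyHandler, status_of and demo are made up for this illustration.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class GetOnlyHandler(BaseHTTPRequestHandler):
    """Hypothetical 'buggy' server: it implements GET but not HEAD."""

    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    # No do_HEAD method: the base class answers HEAD with 501 Not Implemented.

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def status_of(host, port, path, method):
    """Send one request with the given method and return the status code."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(method, path)
        response = conn.getresponse()
        response.read()  # drain the body (always empty for HEAD)
        return response.status
    finally:
        conn.close()


def demo():
    """Return the (GET status, HEAD status) pair seen by a client."""
    server = HTTPServer(("127.0.0.1", 0), GetOnlyHandler)  # port 0: any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    host, port = server.server_address
    try:
        return (status_of(host, port, "/", "GET"),
                status_of(host, port, "/", "HEAD"))
    finally:
        server.shutdown()
        server.server_close()
```

Calling demo() shows the same document succeeding with GET and failing with HEAD, which is the situation in which a link works in a browser but is reported as invalid by a checker that only sends HEAD.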

So, if you run into this problem, there is no way to fix it, except to have the webmaster switch to better server software. If that is beyond your control, you can use the -ignore option to have Big Brother ignore the link. This option expects one argument: a regular expression that describes a set of addresses to be ignored. For instance, I use

-ignore "^http://www\.imdb\.com/M/"

because the Internet Movie DataBase's CGI scripts don't handle HEAD requests properly.

For convenience, you can use -ignore several times in a single command.
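As an illustration of how several -ignore patterns could combine, here is a short Python sketch. It is only an assumption about the behaviour: this page does not say which regular expression dialect Big Brother uses, nor whether a pattern must match the whole address or just part of it. The sketch uses Python's re module and treats an address as ignored if any pattern matches somewhere in it. The function name make_ignore_filter is hypothetical.

```python
import re


def make_ignore_filter(patterns):
    """Build a predicate that is true for addresses matching any pattern.

    Assumption: patterns use Python's regex syntax and may match anywhere
    in the address unless anchored with ^ or $.
    """
    compiled = [re.compile(p) for p in patterns]

    def is_ignored(url):
        return any(rx.search(url) for rx in compiled)

    return is_ignored


# One entry per -ignore option given on the command line.
is_ignored = make_ignore_filter([
    r"^http://www\.imdb\.com/M/",
])
```

With the pattern from the example above, is_ignored("http://www.imdb.com/M/anything") is true, while is_ignored("http://www.imdb.com/") is false, because the address does not continue with /M/. The backslashes matter: without them, each dot would match any character.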


François Pottier, May 5, 2004
