Checking your site

This section shows the command I use to check my own site. You should be able to adapt it easily to your own case. First, here is the command:

bigbro                                              \
  -mapfrom "^http://pauillac\.inria\.fr/~fpottier/" \
  -mapto file://$HOME/public_html/                  \
  -rec "^file:"                                     \
  -local -remote                                    \
  -proxy www-rocq.inria.fr:8080                     \
  -noproxy "^http://.*\.inria\.fr/"                 \
  -timeout 600                                      \
  -gentle 5                                         \
  -oraw stdout                                      \
  -ohtml report.html                                \
  -failures                                         \
  -fragments                                        \
  -ignore "^http://www\.imdb\.com/M/"               \
  http://pauillac.inria.fr/~fpottier/
(The \ characters at the end of each line are used to indicate that this is a single command, even though it is written on several lines for clarity.)

Here is an explanation of the options used above. First, I define a mapping (-mapfrom and -mapto), which tells Big Brother that any document belonging to my site can be read directly from the public_html subdirectory of my home directory. (Using $HOME in this way is Unix-specific; under Windows, you can specify a full path explicitly.) Then, I enable recursion within my site. Determining whether a document belongs to my site is easy: thanks to the mapping, such a document resides on disk, so it has a file: URL. This explains why I used -rec "^file:". Next, I enable checking of both local and remote links (-local -remote).

I then let Big Brother know about my proxy (note that the proxy's name ends with a custom port number, which comes after a colon). The proxy is unnecessary when accessing machines within the inria.fr domain, hence the use of the -noproxy option. Next, I set the timeout to 600 seconds, that is, 10 minutes, and, to avoid consuming too much server time, I specify (-gentle 5) that at least 5 seconds should elapse between two successive requests to the same server. (This is especially important when using a proxy, since nearly all requests are sent to the proxy.)

Next, I request "raw" output on screen (-oraw stdout) and human-readable output to a file called report.html (-ohtml report.html). Reporting failures only (-failures) saves time and makes the report more readable. I want fragments to be checked (-fragments). I use -ignore to skip some URLs which I know will cause failures. Finally, I tell Big Brother where to start by specifying the main URL for my site.
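
To make the mapping and the -noproxy rule more concrete, here is a small shell sketch of the effect described above. It is an illustration only, not Big Brother's actual implementation: the function names map_url and bypasses_proxy are made up for this example, and a literal prefix stands in for the -mapfrom regular expression.

```shell
#!/bin/sh
# Illustration only: mimics the *effect* of the options explained above.
# map_url and bypasses_proxy are hypothetical helpers, not part of bigbro.

# map_url URL ROOT
# If URL starts with the site prefix, print the file: URL obtained by
# replacing that prefix with ROOT; otherwise print URL unchanged.
map_url() {
  prefix='http://pauillac.inria.fr/~fpottier/'
  case $1 in
    "$prefix"*) printf 'file://%s/%s\n' "$2" "${1#"$prefix"}" ;;
    *)          printf '%s\n' "$1" ;;
  esac
}

# bypasses_proxy URL
# Exit status 0 if URL matches the -noproxy pattern used in the command.
bypasses_proxy() {
  printf '%s\n' "$1" | grep -q '^http://.*\.inria\.fr/'
}

# A page of the site is mapped to a file under public_html.
map_url "http://pauillac.inria.fr/~fpottier/index.html" "$HOME/public_html"
```

Since a mapped URL begins with file:, it matches the -rec "^file:" pattern, whereas a URL that the mapping leaves unchanged does not. This is why recursion stays within the site.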

That's it! It might seem overwhelming at first sight, but remember that once the command has been stored in a script, all that's needed is to run the script whenever you want your site verified.
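
Such a script might look like the following sketch. The options are exactly those of the command above; everything else is an assumption made for this example: the file name check-site.sh, the check_site function, the BIGBRO variable (which merely allows the program name to be overridden), and the choice of passing the start URL as an argument rather than hard-coding it.

```shell
#!/bin/sh
# check-site.sh (hypothetical name): stores the command shown above.
# Usage: check-site.sh START_URL

# Name of the program to run; overriding it is a convenience of this
# sketch, not a bigbro feature.
BIGBRO=${BIGBRO:-bigbro}

# check_site START_URL
# Runs Big Brother with the options discussed in this section.
check_site() {
  "$BIGBRO" \
    -mapfrom "^http://pauillac\.inria\.fr/~fpottier/" \
    -mapto "file://$HOME/public_html/" \
    -rec "^file:" \
    -local -remote \
    -proxy www-rocq.inria.fr:8080 \
    -noproxy "^http://.*\.inria\.fr/" \
    -timeout 600 \
    -gentle 5 \
    -oraw stdout \
    -ohtml report.html \
    -failures \
    -fragments \
    -ignore "^http://www\.imdb\.com/M/" \
    "$1"
}

# Run the check only when a start URL was given on the command line.
if [ "$#" -gt 0 ]; then
  check_site "$1"
fi
```

After making the file executable (chmod +x check-site.sh), running it with the main URL of the site as its only argument repeats the whole check.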

François Pottier, May 5, 2004
