
Checking -fragments

Some URLs contain a # sign followed by a pointer to a specific spot in the document, like this:

http://www.someone.com/help.html#me

The final part of this URL, #me, is called a fragment. (Many people call it an anchor.) It refers to a specific place within the document http://www.someone.com/help.html. The document itself must define the fragment, so that the name points to an actual place in it. Fragments are usually defined using the <A> tag, like this:

<A NAME="me">some text</A>

They can also be defined by adding an ID attribute to any HTML tag.
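
To make the two ways of defining a fragment concrete, here is a minimal Python sketch that collects every fragment name a page defines. It is only an illustration built on the standard html.parser module, not Big Brother's own code; the names FragmentCollector and defined_fragments are hypothetical.

```python
from html.parser import HTMLParser


class FragmentCollector(HTMLParser):
    """Collects fragment names defined by <A NAME="..."> or by an ID attribute.

    Hypothetical helper for illustration; not part of Big Brother.
    """

    def __init__(self):
        super().__init__()
        self.defined = set()

    def handle_starttag(self, tag, attrs):
        # html.parser hands us tag and attribute names in lower case.
        for name, value in attrs:
            if value is None:
                continue  # attribute written without a value
            if name == "id" or (name == "name" and tag == "a"):
                self.defined.add(value)


def defined_fragments(html_text):
    """Return the set of fragment names defined in html_text."""
    collector = FragmentCollector()
    collector.feed(html_text)
    collector.close()
    return collector.defined
```

A link ending in #me is then correct only if "me" is in the set returned for the target document.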

Naturally, part of Big Brother's job is to make sure that fragments are correctly defined. However, checking them can use a lot of network bandwidth. Consider what happens when Big Brother checks the above URL. First, it asks the server www.someone.com whether the document help.html exists. This is usually very fast, because the server simply answers "yes, it does" without actually transmitting the document. However, to make sure that #me is defined, Big Brother has to download the whole document and look for the definition in it.
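
The difference in cost can be sketched as follows. This Python fragment splits a URL into its document part and its fragment, then decides which kind of request is needed. It assumes that the cheap existence check corresponds to an HTTP HEAD request and the full download to a GET request; the text above does not say which requests Big Brother actually sends, and the function name plan_check is hypothetical.

```python
from urllib.parse import urldefrag


def plan_check(url, check_fragments=False):
    """Return (document_url, method, fragment_to_verify) for one link.

    Assumption: "HEAD" stands for the cheap existence check and "GET"
    for the full download needed to look inside the document.
    """
    document_url, fragment = urldefrag(url)
    if check_fragments and fragment:
        # The document body is needed to look for the fragment definition.
        return document_url, "GET", fragment
    # Anything after the # sign is ignored; existence is all we check.
    return document_url, "HEAD", None
```

For http://www.someone.com/help.html#me, the sketch plans a HEAD request on help.html by default, and a GET request plus a search for "me" when fragment checking is requested.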

Big Brother keeps this cost as low as it can: it never downloads the same document twice, even when many links point to different fragments of it. Nevertheless, checking fragments can mean downloading a lot more data. (Have a look at the statistics displayed at the end of each report to see exactly how much more.) Because of this, checking fragments is optional. It is off by default; in that case, Big Brother simply ignores anything that comes after the # sign. To turn it on, add

-fragments

to the command line.
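
The "never download a document twice" behaviour described above can be sketched as a small cache keyed by document URL. This is a minimal illustration under stated assumptions, not Big Brother's implementation: the class FragmentChecker and its fetch and find_fragments parameters are hypothetical, and the caller supplies the functions that download a page and list the fragments it defines.

```python
from urllib.parse import urldefrag


class FragmentChecker:
    """Checks fragments while fetching each document at most once.

    Hypothetical sketch: `fetch` maps a document URL to its text, and
    `find_fragments` maps that text to the set of names it defines.
    """

    def __init__(self, fetch, find_fragments):
        self._fetch = fetch
        self._find_fragments = find_fragments
        self._defined = {}   # document URL -> set of defined fragments
        self.downloads = 0   # number of documents actually fetched

    def is_defined(self, url):
        document_url, fragment = urldefrag(url)
        if not fragment:
            return True  # nothing after the # sign, so nothing to verify
        if document_url not in self._defined:
            text = self._fetch(document_url)
            self._defined[document_url] = self._find_fragments(text)
            self.downloads += 1
        return fragment in self._defined[document_url]
```

Checking help.html#me and then help.html#you with this sketch fetches help.html only once; the second check is answered from the cache.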
François Pottier, May 5, 2004
