                            [WWWOFFLE Homepage]
           [Users Page] [Browser Page] [Win32 Page] [Changes Page]
 [Hints And Tips Page] [Download Page] [Win32 Download Page] [Feedback Page]
                                   [FAQ]

                   --------------------------------------

     [Using DontGet Entries] [Using CensorHeader] [Using Online Options]
[Using Purge Options] [Making WWWOFFLE Run Automatically] [Not Using Syslog]

                The WWWOFFLE Version 2.6 Hints and Tips Page

This page contains more information than the FAQ, in a more user-friendly
format, to help you use WWWOFFLE in the most effective way.

Configuration File Help

DontGet Options

The DontGet section of the configuration file allows you to stop WWWOFFLE
from fetching certain types of URLs.

Entries can be specified as a filename extension that is not to be fetched;
in the example below, no .zip files will ever be fetched.

DontGet
{
 *://*/*.zip
}

They can also be specified as paths on a server that are not to be fetched;
in the example below, no files in the /dontget subdirectory on any server are
fetched.

DontGet
{
 *://*/dontget/
}

I have available a list of entries for the DontGet section that will stop
many unnecessary images and pages from being displayed. The easiest way to
use it is to copy the file to wwwoffle.DontGet.conf in the same directory as
the wwwoffle.conf file and include it in the main configuration file like
this:

DontGet
[
 wwwoffle.DontGet.conf
]
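If you prefer to do the copy from a script, the following is a minimal sketch in sh. The function name install_dontget and the name of the downloaded list file are hypothetical; the only name that matters is wwwoffle.DontGet.conf, which has to match the include entry in the DontGet section.

```shell
#!/bin/sh
# Hypothetical helper: copy a downloaded DontGet list so that it sits next
# to wwwoffle.conf under the name used by the DontGet [ ... ] include.
# Usage: install_dontget LIST_FILE CONFIG_FILE
install_dontget() {
    list=$1
    conf=$2
    if [ ! -f "$list" ]; then
        echo "install_dontget: $list not found" >&2
        return 1
    fi
    if [ ! -f "$conf" ]; then
        echo "install_dontget: $conf not found" >&2
        return 1
    fi
    # Put the copy in the same directory as the main configuration file.
    confdir=$(dirname "$conf")
    cp "$list" "$confdir/wwwoffle.DontGet.conf"
}
```

For example, install_dontget dontget-list.txt /var/spool/wwwoffle/wwwoffle.conf (the first file name is only an example). The include entry still has to be added to wwwoffle.conf by hand.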

CensorHeader Options

The CensorHeader section allows you to control what information is sent
about you to the server or what is received from the server. Some of the
header information is important and cannot be censored, but some items can
safely be removed.

The principal reason for modifying the headers when browsing is to preserve
some privacy. In this case it is important to remove as much personal
information as possible from the requests that are sent. A second level of
privacy can be obtained by hiding information about the page that you came
from.

Some browsers send your username or e-mail address with the requests that
they make, in a From field. This is the most important item to remove.

CensorHeader
{
 From =
}

There is also information about the browser that you are using, which is sent
in a User-Agent field. If you censor this then most servers will not be able
to work out what browser you are using. There are other ways to detect which
browser is being used, for example which other headers are sent and in what
order. JavaScript can also be used to detect the browser, and this cannot be
blocked using these options.
If you do censor this header then you may be denied access to certain sites
or pages. This is not the fault of WWWOFFLE, but of the site designer for
being so selective. In WWWOFFLE you can supply your own value for this (and
any other) header, so my browsing shows up with WWWOFFLE/2.6 as the browser.

CensorHeader
{
 User-Agent = WWWOFFLE/2.6
}

The other information that can easily and usefully be removed is cookies.
These can reveal how often you visit a site or which pages you have viewed on
it. If you don't want cookies to be sent to servers then remove the Cookie
header; if you don't want to receive cookies from servers then remove the
Set-Cookie header.

The options in this section of the configuration file can be configured so
that they only apply to some URLs and not to others. This is most useful to
allow cookies to be sent to some servers and not others. For example if you
want to deny cookies to all servers except one called
www.trusted-with-cookies.com then you would need two Cookie options here,
one to allow cookies to the specified server and one to deny cookies to all
other servers.

CensorHeader
{
 <http://www.trusted-with-cookies.com/*> Cookie = yes
 Cookie = no

 <http://www.trusted-with-cookies.com/*> Set-Cookie = yes
 Set-Cookie = no
}

Online Options

The OnlineOptions section of the configuration file allows many options to
be set on a URL-by-URL basis. These options control the way that WWWOFFLE
decides which pages are to be fetched again when requested and which ones are
to use the cached version.

Pages that can usefully be cached for a long time are static pages, mainly
images. These might be the icons that appear all over pages on the same
server. These can be preserved in the WWWOFFLE cache for a long time and
only requested infrequently since they change rarely. The following example
shows the changes that could be made to reduce the bandwidth to one
particular set of static images (these URL specific options need to go
before the generic options in the section).

OnlineOptions
{
 <http://images*.slashdot.org> request-changed = 4w
 <http://*slashdot.org> request-changed-once = yes
}

Purge
{
 <http://images*.slashdot.org> age = 6w
 <http://*slashdot.org> age = 4w
}

I have a list of some entries for the OnlineOptions section that will help
reduce bandwidth; it is based on the example given above.

Some web servers try to force the browser to keep reloading the same page.
This can be done in a number of ways, and WWWOFFLE has several ways of
ignoring these requests. Using the request-changed or request-changed-once
options in the OnlineOptions section means that WWWOFFLE will not make
another request for a cached page until it has reached a certain age.

OnlineOptions
{
 request-changed = 10m
 request-changed-once = yes
}
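The request-changed rule can be modelled in a few lines of sh. This is a simplified sketch, not WWWOFFLE's own code: it assumes that the modification time of the cached file is the time it was fetched (as the Purge Options section describes), and it uses the find -mmin test, which GNU and BSD find provide but POSIX does not require. The function name should_refetch is hypothetical.

```shell
#!/bin/sh
# Simplified model (not WWWOFFLE's code) of the request-changed rule:
# a cached copy younger than the threshold is served from the cache,
# an older one is requested again from the server.
# Usage: should_refetch CACHED_FILE MINUTES   (exit status 0 = fetch again)
should_refetch() {
    file=$1
    minutes=$2
    # No cached copy at all: it has to be fetched.
    [ -f "$file" ] || return 0
    # find prints the file only if it was last modified more than
    # MINUTES minutes ago.
    [ -n "$(find "$file" -mmin +"$minutes" -print)" ]
}
```

With request-changed = 10m, should_refetch FILE 10 corresponds to the question "would WWWOFFLE ask the server again for this page?".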

The request-expired and request-no-cache options can be set to no so that
even pages that the server says have expired, or should not be cached, are
not requested again.

OnlineOptions
{
 request-expired = no
 request-no-cache = no
}

Purge Options

The Purge Options allow control over what files in the cache are to be
purged. The purging is done based on the timestamp of the file that stores
the page in the cache.

The first choice to make is whether to keep pages based on when you fetched
them or based on when you last viewed them. I choose to use access (viewing)
time rather than modification (fetching) time. This means that pages that I
revisit often don't get removed too soon. This selection is made using the
use-mtime option: to purge based on viewing time set use-mtime = no; to use
the time of fetching set use-mtime = yes. The one problem with this is that
anything that reads all of the pages in the cache (e.g. running
grep WWWOFFLE /var/spool/wwwoffle/http/*/*) will change their access times
and stop the pages from being purged.

Purge
{
 use-mtime = no
}
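To see in advance which pages a purge by age might remove, the cache can be inspected with find. This is only a rough preview: it assumes the cache lives under /var/spool/wwwoffle as in the other examples on this page, it ignores per-host ages and max-size, and the function name preview_old_files is hypothetical. Because find only looks at the file attributes and does not read the files, running it does not change the access times the way grep does.

```shell
#!/bin/sh
# Rough preview (not what wwwoffle -purge actually runs) of which cached
# files are older than a given number of days, judged either by access
# time (like use-mtime = no) or by modification time (like use-mtime = yes).
# Usage: preview_old_files CACHE_DIR DAYS atime|mtime
preview_old_files() {
    dir=$1
    days=$2
    case "$3" in
        atime) test_opt=-atime ;;
        mtime) test_opt=-mtime ;;
        *) echo "preview_old_files: use atime or mtime" >&2; return 1 ;;
    esac
    # -atime/-mtime +N is true for files more than N whole days old.
    find "$dir" -type f "$test_opt" +"$days" -print
}
```

For example, preview_old_files /var/spool/wwwoffle/http 28 atime lists the http pages that have not been viewed for more than 28 days.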

The next choice is whether to set a maximum size for the cache or to let it
grow. The maximum size that you set is not enforced automatically; it only
takes effect when you run wwwoffle -purge. The size that you specify is used
to calculate an age for the purge: if the default age for purging is 28
days, but using 25 days would keep the cache within the specified size, then
25 days is used instead. This is a two-stage process, first purging with the
default ages and then again with the newly calculated age.

Purge
{
 max-size  = 0
}
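The size-to-age calculation can be illustrated with a short sh and awk sketch. It is not WWWOFFLE's code: the input format (one "AGE_DAYS SIZE_KB" line per cached file), the use of KB, the treatment of a limit of 0 as "no limit" and the function name size_limited_age are all assumptions made for the example. The first stage ignores files older than the default age; the second walks the files newest first and stops at the first one that takes the total over the limit, using its age instead.

```shell
#!/bin/sh
# Illustration only: turn a size limit into a purge age.
# Reads "AGE_DAYS SIZE_KB" lines (one per cached file) on stdin and prints
# the age in days to purge at; files with an age >= that value would go.
# A limit of 0 or less is treated as "no limit" (an assumption).
# Usage: size_limited_age DEFAULT_AGE MAX_SIZE_KB < listing
size_limited_age() {
    default_age=$1
    max_kb=$2
    sort -n -k1,1 | awk -v def="$default_age" -v max="$max_kb" '
        # First stage: files at or beyond the default age are purged anyway.
        $1 >= def { exit }
        {
            total += $2
            # Second stage: stop at the first file that breaks the limit.
            if (max > 0 && total > max) { age = $1; found = 1; exit }
        }
        END { print (found ? age : def) }
    '
}
```

For example, printf '1 100\n2 100\n10 100\n30 100\n' | size_limited_age 28 250 prints 10, so only the two newest files (200 KB in total) would be kept, while a limit of 1000 leaves the default of 28 days in place.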

Finally there is the option to make some of the sites in the cache last for
different amounts of time. This can be longer or shorter than the default or
can be set never to purge. The ages are all measured in days (unless a
longer suffix is used) and the value -1 is used to indicate a site that is
never purged. These can now be specified using pathnames, to allow parts of a
server to be purged at different ages from other parts.

Purge
{
 # Don't purge this part of this site ever :-).
 <http://www.gedanken.demon.co.uk/wwwoffle/> age = -1

 # Default to 4 weeks for http and only 1 week for ftp.
 <http://*/> age = 4w
 <ftp://*/>  age = 1w

 # You must have this if you want to purge by URL rather than just by host.
 use-url = yes
}
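When checking the ages in a Purge section it can help to convert them to days. The sh function below is a hypothetical helper that only understands a plain number of days, a d or w suffix, and -1 for "never purge"; the d suffix is an assumption, and WWWOFFLE itself may accept other suffixes.

```shell
#!/bin/sh
# Hypothetical helper: convert a Purge age such as 28, 4w or -1 into days.
# Only plain numbers and the d and w suffixes are handled here.
age_to_days() {
    age=$1
    case "$age" in
        -1) echo never; return 0 ;;
        *w) num=${age%w}; mult=7 ;;
        *d) num=${age%d}; mult=1 ;;
        *)  num=$age;     mult=1 ;;
    esac
    # Whatever is left after removing the suffix must be a plain number.
    case "$num" in
        ''|*[!0-9]*) echo "age_to_days: unsupported age '$age'" >&2; return 1 ;;
    esac
    echo $(( num * mult ))
}
```

For example, age_to_days 4w prints 28 and age_to_days -1 prints never.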

If you have a DontGet section in the configuration file that contains a lot
of entries and is updated often then it is useful to have the purge function
remove these pages. This can be done with the del-dontget option.

Purge
{
 del-dontget = yes

 # You must have this if you want to purge by URL rather than just by host.
 use-url = yes
}

Making WWWOFFLE Run Automatically When Required

When WWWOFFLE is running perfectly the user should not know that it is
there. It can be fully automatic, so that it starts when the computer is
booted and changes mode when you go online and come back offline.

This can all be achieved by using the example scripts that are supplied with
WWWOFFLE. Below is just a simple introduction to what is required; for more
detail and better scripts you should look at the contrib directory of the
WWWOFFLE source code.

Booting

If you have installed WWWOFFLE from a binary distribution, for example as
part of a Linux distribution like Debian, RedHat, Suse or others, then
starting it at boot time should already have been set up for you. If not,
the information below may help.

If you have BSD-style startup scripts then the file /etc/rc.local, or some
other file with a similar name, will contain commands to run at boot time.
This is the easiest case: all that is needed is to add the command to run
WWWOFFLE. The safest place to add it is at the end of the file.

/usr/local/sbin/wwwoffled -c /var/spool/wwwoffle/wwwoffle.conf

When using SVR4-style startup scripts there will be many scripts in various
directories in /etc. Typically there will be two copies of the same script,
one called /etc/rc2.d/S90wwwoffle and one called /etc/rc0.d/K90wwwoffle,
containing something like this:

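#!/bin/sh
# Start or stop WWWOFFLE; called with "start" at boot and "stop" at shutdown.
case "$1" in
   start)
      /usr/local/sbin/wwwoffled -c /var/spool/wwwoffle/wwwoffle.conf
      ;;
   stop)
      /usr/local/bin/wwwoffle -kill -c /var/spool/wwwoffle/wwwoffle.conf
      ;;
   *)
      echo "Usage: $0 {start|stop}" >&2
      exit 1
      ;;
esac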
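The two copies of the start/stop script can be put in place with something like the following sh function. The script file name wwwoffle.init and the function name install_svr4 are hypothetical, the run levels and S90/K90 names are the typical ones given above, and the optional second argument lets you try the layout in a scratch directory before installing into /etc (which normally needs root).

```shell
#!/bin/sh
# Hypothetical installer: copy one start/stop script to the two SVR4
# locations. ROOT defaults to empty, i.e. the real /etc.
# Usage: install_svr4 SCRIPT [ROOT]
install_svr4() {
    script=$1
    root=${2:-}
    for target in etc/rc2.d/S90wwwoffle etc/rc0.d/K90wwwoffle; do
        mkdir -p "$root/$(dirname "$target")" || return 1
        cp "$script" "$root/$target" || return 1
        # The init system runs these files directly, so they must be executable.
        chmod 755 "$root/$target" || return 1
    done
}
```

For example, install_svr4 wwwoffle.init /tmp/svr4-test creates the two copies under /tmp/svr4-test/etc for inspection.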

Going Online and Offline

If you are using PPP to make the network connection then there are scripts
that are run automatically by pppd when the connection is made and when it
is broken.

To automate the connect process you will need to edit /etc/ppp/ip-up and add
the following to the end of the file.

   /usr/local/bin/wwwoffle -online -c /var/spool/wwwoffle/wwwoffle.conf
   /usr/local/bin/wwwoffle -fetch -c /var/spool/wwwoffle/wwwoffle.conf &

To automate the disconnect process you will need to edit /etc/ppp/ip-down
and add the following to the end of the file.

   /usr/local/bin/wwwoffle -offline -c /var/spool/wwwoffle/wwwoffle.conf

One problem with this is that pppd will not wait for the WWWOFFLE fetch that
is started in /etc/ppp/ip-up before the network connection is broken. This
means that it is quite possible to interrupt the fetch process. WWWOFFLE
will try to handle this gracefully, but this is no substitute for monitoring
the fetch progress before breaking the connection.
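If you would rather not risk interrupting the fetch, one approach is to remove the backgrounded fetch line from /etc/ppp/ip-up and run a small script yourself when you want to finish. The sketch below runs the fetch in the foreground and only then hangs up, leaving /etc/ppp/ip-down to put WWWOFFLE offline. It assumes that wwwoffle -fetch returns only when the fetch has finished (which is why ip-up runs it with &) and that poff stops pppd on your system; poff is an assumption, so set HANGUP_CMD to whatever your system uses. The function name fetch_then_hangup is hypothetical.

```shell
#!/bin/sh
# Hypothetical script: fetch in the foreground, then drop the link.
# All three settings can be overridden from the environment.
WWWOFFLE=${WWWOFFLE:-/usr/local/bin/wwwoffle}
CONF=${CONF:-/var/spool/wwwoffle/wwwoffle.conf}
HANGUP_CMD=${HANGUP_CMD:-poff}

fetch_then_hangup() {
    # Wait for the fetch to complete; do not hang up if it fails.
    "$WWWOFFLE" -fetch -c "$CONF" || return 1
    # Unquoted on purpose so that a command with arguments also works.
    $HANGUP_CMD
}
```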

Not Using Syslog

By default WWWOFFLE reports error messages using syslog. This is not always
convenient if you don't want all of the WWWOFFLE error messages mixed up with
the other ones.

An alternative is to use WWWOFFLE's normal output messages instead of, or as
well as, syslog. When you start WWWOFFLE you can specify that it is not to
disconnect from the terminal. This causes error messages to be printed to the
terminal, from where they can be redirected to a separate log file.

Your WWWOFFLE startup script will contain a line like the following:

/usr/local/sbin/wwwoffled -c /var/spool/wwwoffle/wwwoffle.conf

All that you need to do is to change this line so that it looks like this:

/usr/local/sbin/wwwoffled -c /var/spool/wwwoffle/wwwoffle.conf -d 3 >> /var/log/wwwoffle.log 2>&1 &

This will start WWWOFFLE and direct all messages to /var/log/wwwoffle.log.
The -d 3 option sets the level of logging, which is equivalent to
log-level=important in the config file. You can make the number smaller for
less logging or bigger (up to 6) for too much logging.

If you do this then you will need some way of rotating the log file so that
it does not grow uncontrollably. Error messages will also still be sent to
syslog as well as to the new log file. You may want to reduce the level of
reporting in syslog to log-level=warning so that only important messages are
reported.
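A minimal copy-and-truncate rotation is sketched below in sh. Because the startup line appends to the log with >>, truncating the file in place is safe: wwwoffled carries on writing at the end of the same, now empty, file. A few messages written between the copy and the truncation can be lost. The function name rotate_log and the default of four old copies are hypothetical; it could be run from cron, for example once a week.

```shell
#!/bin/sh
# Hypothetical copy-and-truncate rotation for the wwwoffled log file.
# Usage: rotate_log LOGFILE [COPIES]   (COPIES defaults to 4)
rotate_log() {
    log=$1
    copies=${2:-4}
    [ -f "$log" ] || return 0
    # Shift LOGFILE.1 -> LOGFILE.2 and so on; the oldest copy is overwritten.
    i=$copies
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        [ -f "$log.$prev" ] && mv "$log.$prev" "$log.$i"
        i=$prev
    done
    # Copy the live log, then empty it in place so the open file stays valid.
    cp "$log" "$log.1" && : > "$log"
}
```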

Feedback

If you have any suggestions about what you want to see here or have any
ideas that you want to pass on to other WWWOFFLE users then you can use the
normal Feedback Form.


  ------------------------------------------------------------------------
 Andrew M. Bishop = amb@gedanken.demon.co.uk Saturday 18 November 2000
