Snag-o-rama v1.4 Release Notes					April 28/96


What it is:
-----------

The snag-o-rama package attempts to transfer an entire html source tree
from a remote URL to the local filesystem.  Starting by downloading the
URL provided on the command line, it downloads all images (and links to
images) it encounters, and tries to follow links to other documents,
recursively.  By default, snag will not follow hypertext links to
different domain names, nor will it follow links to "parent" directories
of the original URL.  Along the way, all links are made *relative*, so
that the local copy does not depend on where the files end up. 
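
The default link-following rules above can be sketched roughly as
follows.  This is not the code snag actually uses; the function names
url_host and should_follow are made up for illustration, "domain name" is
assumed to mean the host part of the URL, and the starting URL is assumed
to contain a path (as in http://www.remote.site/path/start.html).  Links
containing ".." are not handled.

```shell
# Hypothetical helper: print the host part of a URL.
url_host() {
    # Strip the scheme, then everything from the first "/" onward.
    rest=${1#*://}
    printf '%s\n' "${rest%%/*}"
}

# Hypothetical check: succeed if link $2 may be followed when the crawl
# started at URL $1.  Pass "yes" as $3 to mimic the chaseparents option.
should_follow() {
    start=$1 link=$2 chaseparents=${3:-no}
    # Never leave the starting host.
    [ "$(url_host "$start")" = "$(url_host "$link")" ] || return 1
    # With chaseparents, any document on the same host is fair game.
    [ "$chaseparents" = yes ] && return 0
    # Otherwise the link must live at or below the starting directory.
    case $link in
        "${start%/*}/"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, "should_follow http://www.remote.site/path/start.html
http://www.remote.site/index.html" fails, because index.html sits above
the starting directory.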

The http transfers are done via a spiffy perl script called webget. 
Thanks to Jeffrey Friedl (jfriedl@omron.co.jp) for the script.  If you
have another http grabber program (such as lynx -source) you can specify
its use with the XFERCMD environment variable. 

Perhaps the best way to find out how the program works, and how to apply 
it to your needs, is to try "snag http://www.remote.site/path/start.html" 
and see what happens.  


Documentation:
--------------

The snag-o-rama distribution includes the following files:
snag		shell script that does it all
snagit		helper script; fetches an individual url
chaserefs.c	source of chaserefs; filters incoming html
Makefile	a weak makefile that saves some typing
CHANGES		a summary of changes made since v1.1
COPYING		the GNU General Public License that this is placed under
webget		a perl script that does the job of lynx -source


Usage:
------

	snag <fqurl> [verbose] [noimages] [chaseparents]

<fqurl> is a fully qualified url that you want to start from.

Options:
verbose	 	Tells snag to give a play-by-play of what it's doing to 
		stderr.

noimages	Avoid downloading files with a gif, jpg, jpeg, or xbm
		extension.  Really handy when you don't care about the
		images anyway.

chaseparents	Overrides the default action of not downloading URLs that are
		higher up in the directory tree than the starting URL.
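
As an illustration of the noimages test, a check on the extension might
look like the sketch below.  The function name is_image is hypothetical,
and the match is assumed to be case-sensitive; the real script may decide
differently.

```shell
# Hypothetical check: succeed if the name ends in one of the image
# extensions that noimages skips.
is_image() {
    case $1 in
        *.gif|*.jpg|*.jpeg|*.xbm) return 0 ;;
        *) return 1 ;;
    esac
}
```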


Environment variables:
----------------------

XFERCMD		Specifies the command to use when downloading http.  Snag 
		expects this program to write the data to stdout, when
		given a URL as its final argument.  If XFERCMD is not set,
		"webget -q" is used by default.

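The XFERCMD contract can be pictured with the small sketch below.  The
function name fetch_url is made up, and treating an empty XFERCMD the
same as an unset one is an assumption; the point is only that the URL is
passed as the final argument and the document arrives on stdout.

```shell
# Hypothetical helper: fetch URL $1 with XFERCMD, writing it to stdout.
fetch_url() {
    # Fall back to "webget -q" when XFERCMD is unset or empty.
    cmd="${XFERCMD:-webget -q}"
    # $cmd is deliberately unquoted so that a value such as
    # "lynx -source" splits into a command and its option.
    $cmd "$1"
}
```

With this in place, something like
"XFERCMD='lynx -source' fetch_url http://www.remote.site/path/start.html"
would run lynx instead of webget.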

Notes:
------


Things go into the current directory, so you will want to start in an
empty directory. 

Your http transfer program (by default, webget), snagit, and chaserefs 
should all be visible in the path for things to work properly.

If everything works just right, an html document tree should appear in the
current working directory, replicating approximately what is on the remote
end of things.  All anchor and inline references are translated to point
relative to the referring file, so you don't have to worry about absolute
path names.  This actually alters the html files; the original,
unmutilated html source is left in a file with the extension ".real"
appended to it.  You can probably "find . -name '*.real' -exec rm {} \;"
afterwards if you don't have any use for these original files.  I use them
for debugging.  The tilde (~) in any URL is changed to an underscore (_)
to avoid problems with your httpd trying to find the home directories of
users that are on remote machines.  Occurrences of "?" and "&" are 
similarly converted into "-", so your filesystem doesn't get stuffed with 
strange filenames.
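
The two rewriting steps above (character substitution and relative
links) can be sketched as below.  Both function names are made up, and
this is not the code in chaserefs.c; relative_link assumes both
arguments are paths measured from the same top directory, with no
leading "/" and no ".." components.

```shell
# Hypothetical helper: apply the character substitutions described
# above ("~" becomes "_"; "?" and "&" become "-").
mangle_name() {
    printf '%s\n' "$1" | sed -e 's/~/_/g' -e 's/[?&]/-/g'
}

# Hypothetical helper: print the path of file $2 relative to the
# directory that holds the referring file $1.
relative_link() {
    case $1 in
        */*) from=${1%/*}/ ;;   # directory of the referring file
        *)   from= ;;           # referring file is at the top level
    esac
    to=$2
    # Drop leading directory components shared by both paths.
    while [ -n "$from" ] && [ "${from%%/*}" = "${to%%/*}" ] &&
          [ "$to" != "${to#*/}" ]; do
        from=${from#*/}
        to=${to#*/}
    done
    # One "../" for each directory left in the referring file's path.
    up=
    while [ -n "$from" ]; do
        up=../$up
        from=${from#*/}
    done
    printf '%s\n' "$up$to"
}
```

For instance, "relative_link path/a/start.html path/img/x.gif" prints
"../img/x.gif", and "mangle_name '~joe/find?a=1&b=2'" prints
"_joe/find-a=1-b=2".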

Image maps and cgi scripts and anything else resembling interactivity 
on the remote end won't work.  Bummer.

Documentation is conveniently found in chaserefs.c, snag, and snagit  :-)
If there is sufficient interest, I may spiff up this distribution and
write a manpage.

The binaries have been compiled on Linux, SCO Unix, and Sun systems; if
you have problems compiling, then my code probably isn't as portable as I
thought it was.  Your shell should be able to handle large-ish scripts; 
I've found that bash works quite nicely. 

Please report any successes with this program.  Please also email 
reports of things breaking down, so that I can improve it for future 
generations...  Please don't comment on the sorry state of the source 
code; it is loaded with dead code, counter-intuitive hacks, and stuff 
that's been duplicated.  It sometimes works, though.  :-)

Hope you find it useful.


Joseph Clancy
jclancy@wimsey.com / jclancy@vc.bc.ca


