# stic-0.2
http://stic.webframe.org            stic-source@gmx.net          Thomas Schmitt
http://stic.sourceforge.net



                  Some Tools for Image Collectors 
                               stic 

stic bundles a few Linux tools which are intended to support the task of
collecting an unreasonable amount of pictures (preferably in JPEG format). 

similar 
     a program for detecting duplicate or very similar images. It maintains a
     database of characteristic color samples which it compares with submitted
     pictures. similar depends on libjpeg and ImageMagick's convert (on a
     modern Linux desktop system these components should already be present).
     It contains the communications module described at sagent. 

simv 
     a core program to perform file management tasks on an image collection.
     Its main purpose is to coordinate file movements with the content of
     similar's database. This applies to importing new files, which get tested
     against the existing collection, as well as to informing similar about
     moving and deleting files within the collection.
     simv depends on an external image viewer like ImageMagick's display
     (should already be present on a modern Linux desktop system) or John
     Bradley's xv (quite a fast one). It contains the communications module
     described at sagent. 

sagent 
     a standalone version of the communications module used in simv and
     similar. This software receives input from its start terminal and multiple
     clients, distributes several types of output back to them, and is also
     able to act itself as such a client. 
     Since communications mainly use TCP/IP there is an encryption layer
     (Blowfish with 128 bit keys) which provides user authentication. Any
     single activity of such a user may be particularly permitted or denied.
     Secure connections should be possible that way as long as one can defend
     the keyfiles and programs on client and server host against foreign
     access. 
     Front-end connection software is available in C, Tcl/Tk and PHP3 to build
     custom clients. In the most primitive case even telnet can act as a
     client. 
     The standalone program sagent may be used as communications node in a tree
     of clients. Another purpose is to be a shell frontend which sends commands
     to a server and receives its replies. 

snntpbatch 
     a command line based NNTP (newsgroups) client. It is mainly intended for
     automatic download of images by use of a filter language. Nevertheless it
     also downloads the message texts and converts them to HTML code which
     includes the downloaded images. Also, it is capable of automatically
     posting sets of images to the newsgroups. 

The tools are designed to be very independent of the system flavor. On an
average Linux desktop there should be no need to update existing system
components. Actually one could use stic without having display equipment for
graphics. 
Any program activity which is possible in dialog may also be performed in
batch runs. Therefore the tools are quite suitable for users who like to get
boring tasks automated and manual tasks simplified. 

All tools' code is open source and distributed under BSD license. 
Example images Credit: U. S. Fish and Wildlife Service (see images/CREDITS)


                               Getting Started

                          Compilation and Installation

This is only tested on Linux with Intel processors. I confess that I need
to learn about portable distribution of software. For expected portability
problems, read the section "Portability Issues" below.

The commands shown in this demonstration will create some files in your
home directory (e.g. $HOME/.stic_main_dir , $HOME/imagelist).
Those files' names will be obvious in the command lines or pointed out
in their explanations.
Read the commands carefully and be sure to understand their general effects
regarding the shell. If in doubt do not hesitate to ask stic-source@gmx.net .


To unpack the tarball go to a directory which is suitable for creating a
subdirectory stic-0.2 with finally about 10 MB (it will grow). For simplicity
it is assumed that the tarball is stored there too.

   $ tar xvzf stic-0.2.tar.gz

Enter the newly created directory and compile the C sources

   $ cd stic-0.2
   $ ( cd src/stic_build ; make )

You should now have some programs in subdirectory bin, like this :

   $ ls -l bin
   -rw-r--r--   1 thomas   thomas          0 Feb 23 11:32 just_a_placeholder
   -rwxr-xr-x   1 thomas   thomas     552702 Feb 23 11:37 sagent
   -rwxr-xr-x   1 thomas   thomas     276926 Feb 23 11:37 sfrontend
   -rwxr-xr-x   1 thomas   thomas     736669 Feb 23 11:37 similar
   -rwxr-xr-x   1 thomas   thomas     675203 Feb 23 11:37 simv
   -rwxr-xr-x   1 thomas   thomas     479538 Feb 23 11:37 snntpbatch

For the moment, you may add the subdirectory scripts and maybe bin to 
your shell's PATH variable. Also you should publish the stic-directory
in a small file in your $HOME directory.

   $ pwd >$HOME/.stic_main_dir
   $ PATH="$PATH:$(cat $HOME/.stic_main_dir)/scripts"

If your system does not support symbolic links, do also
   $ PATH="$PATH:$(cat $HOME/.stic_main_dir)/bin"

Many of the scripts expect the binary programs in subdirectory bin and
their helper scripts in subdirectory scripts.
If you decide to move the programs to other places, it is best to create
a link or a copy, rather than removing the program file from bin.

If you are willing to leave the stic stuff where it is and to add this
location to the PATH of every newly started shell, then edit the startup
script of the shell and add at the end :
   $ vi $HOME/.bashrc
...
PATH="$PATH:$(cat $HOME/.stic_main_dir)/scripts"

For other ways to make stic accessible, see below: "File Locations".
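
The PATH line shown above appends the scripts directory again each time the
startup file is read. A minimal sketch of a guarded variant follows. The
function name stic_add_path is made up for this illustration and is not part
of stic.

```shell
# Hypothetical helper, not shipped with stic: append DIR/scripts to PATH
# only if it is not already listed there. DIR is read from the file given
# as first argument (default: $HOME/.stic_main_dir).
stic_add_path() {
    stic_dir_file=${1:-$HOME/.stic_main_dir}
    [ -r "$stic_dir_file" ] || return 1
    stic_dir=$(cat "$stic_dir_file")
    case ":$PATH:" in
        *":$stic_dir/scripts:"*) ;;                  # already present
        *) PATH="$PATH:$stic_dir/scripts" ;;
    esac
}
```

Placed at the end of $HOME/.bashrc and followed by a call of stic_add_path,
this should have the same effect as the plain PATH line, without growing
PATH in nested shells.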


We will now test similar, adjust it to daily usage, test simv, adjust it
to your collection, and have a look at snntpbatch. If you are mainly
interested in the news client, skip to "Getting started with snntpbatch".


                             Testing similar

For all runs of similar, set a database name by writing it into the
startup file. Choose a suitable one of your own.
(Of course the directory given by the path has to exist already.)
In this example, as a suitable path to insert I will use : home/thomas/test 

   $ vi $HOME/.similar_rc
# Set standard database 
-mapname:/INSERT_SUITABLE_PATH_HERE/similar_map

Note that no white space is allowed at the beginning of the file lines.
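
Leading white space is easy to overlook in an editor. The following sketch
lists suspicious lines of a startup file. The function name is hypothetical,
and the check also reports lines that consist of white space only, although
the text above does not say whether those do any harm.

```shell
# Hypothetical check, not part of stic: print every line of a startup
# file that begins with a blank or tab, prefixed by its line number.
# Exit status is 0 if no such line exists, 1 otherwise.
rc_check_leading_space() {
    if grep -n '^[[:space:]]' "$1"; then
        return 1
    fi
    return 0
}
```

Usage would be:  rc_check_leading_space $HOME/.similar_rc
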
Now you may register the pictures of the sample collection that came with stic.

   $ cd $(cat $HOME/.stic_main_dir)
   $ similar -append_nondash:on images/*/*
   append to imgmap : /home/thomas/test/stic-0.2/images/birds/00000055.jpg
   ...
   append to imgmap : /home/thomas/test/stic-0.2/images/images/sea/00000019.jpg
   $ wc -c /INSERT_SUITABLE_PATH_HERE/similar_map*
       440 /home/thomas/test/similar_map_nam
      1008 /home/thomas/test/similar_map_smp16

Let's check whether 00000010.jpg recognizes itself :

   $ display images/birds/00000010.jpg &
   $ similar -search_self_too -search:images/birds/00000010.jpg
   /home/thomas/test/stic-0.2/images/birds/00000010.jpg

Now make a copy of that image and scale it by help of convert.

   $ convert -geometry 640x480 images/birds/00000010.jpg \
                               images/import/eagle_X.jpg
   $ display images/import/eagle_X.jpg &

See whether it is still recognizable :

   $ similar -search:images/import/eagle_X.jpg
   /home/thomas/test/stic-0.2/images/birds/00000010.jpg

Shoot at the bird (I'm nice to animals and bad at aiming): 

   $ convert -geometry 640x480 \
             -pen red \
             -draw 'fillCircle 250,350 250,360' \
             -draw 'fillCircle 550,230 550,240' \
             images/birds/00000010.jpg images/import/eagle_X.jpg
   $ display images/import/eagle_X.jpg &
   $ similar -search:images/import/eagle_X.jpg
   /home/thomas/test/stic-0.2/images/birds/00000010.jpg

similar has its limits :

   $ convert -geometry 640x480 \
             -pen red \
             -draw 'fillCircle 250,350 250,360' \
             -draw 'fillCircle 550,230 550,240' \
             -draw 'fillCircle 100,200 100,210' \
             -draw 'fillCircle 150,400 150,380' \
             -draw 'fillCircle 300,80  300,40' \
             -draw 'fillCircle 600,300 600,260' \
             images/birds/00000010.jpg images/import/eagle_X.jpg
   $ display images/import/eagle_X.jpg &
   $ similar -search:images/import/eagle_X.jpg

But one may increase the tolerance :

   $ similar -match_par:8:1:6:2:color_diff -search:images/import/eagle_X.jpg
   /home/thomas/test/stic-0.2/images/birds/00000010.jpg

Now probably it's time for you to read  doc/similar_helptext  or execute :

   $ similar -help | less

Try to get an overview of the commands specific to similar.
You may stop reading at "Command reference of agent commands" and use the rest
of the text for reference purposes on demand.


                      Practical Usage of similar

First prepare for a dedicated server process which guards the database files
and avoids concurrency problems.

Therefore create the directory for the keyfile which is used for encrypted
TCP/IP communications (so you do not have to rely on your firewall).
Also write at least 64 non-obvious characters into the file  me.tnl :

   $ mkdir $HOME/tnl_keydir
   $ vi $HOME/tnl_keydir/me.tnl
Ahjoe9h 3tugo dfjhieruhyoperui pherioauyjod zhuiperhl kjdfzkl hjidr hjdlf
sgkhgier hldkhbjdfojlnm,gnklfdjbnkdfnd;fkl ndmn; d;lndjf;hjdfjbblkdkdx
  
Convert that file into a usable keyfile with a content quite hard to guess :

   $ similar -no_rc -make_userkey:all:me
   userkey file created : /home/thomas/tnl_keydir/me.tnl
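
The seed text for me.tnl need not be typed by hand. The sketch below draws
it from the kernel's random source. It rests on the assumption that any
printable text of sufficient length is acceptable as input for
-make_userkey. The function name is hypothetical.

```shell
# Hypothetical helper: print N random bytes as 2*N hexadecimal characters
# followed by a newline (default N = 64, i.e. 128 characters).
# Assumption: -make_userkey accepts any printable text as seed.
random_seed_text() {
    head -c "${1:-64}" /dev/urandom | od -An -v -tx1 | tr -d ' \n'
    echo
}
```

It would have to be run before the -make_userkey step, for example as
random_seed_text 64 >$HOME/tnl_keydir/me.tnl , because afterwards it would
overwrite the finished keyfile.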

Edit the startup file $HOME/.similar_rc so it finally contains the following
lines (example for INSERT_SUITABLE_PATH_HERE : home/thomas/image_collection ):

   $ rm $HOME/.similar_rc
   $ vi $HOME/.similar_rc
# Identify at servers as user "me". All communication will be encrypted.
# There has to be a keyfile  $HOME/tnl_keydir/me.tnl  (see -make_userkey)
-security:tcp:user:me

# Set standard database (use a suitable directory path of your own)
-mapname:/INSERT_SUITABLE_PATH_HERE/similar_map

# Allow connection to guarded map servers
-auto_client:on

Note again that no white space is allowed at the beginning of the file lines.


In order to register your whole image collection, you should write the image
file addresses into a text file and then use  similar -append_list .
For example let find make a list of files within your collection directories:

   $ find INSERT_DIRECTORY_LIST_HERE \
          -type f -and -not -empty -print >$HOME/imagelist
   $ similar -append_list:$HOME/imagelist

Beware: depending on the size of the file list, this may take a long time.
        With large collections it may be better to append several smaller lists
        than a single big one.
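
Splitting the list can be left to the shell. The following sketch cuts the
list into pieces and submits them one by one via -append_list. The function
name, the default piece size of 1000 lines and the variable SIMILAR are
assumptions made for this example only.

```shell
# Hypothetical wrapper: split a file list into pieces of at most CHUNK
# lines and submit each piece separately via -append_list.
# SIMILAR may be set to another command for a dry run.
append_in_chunks() {
    list=$1
    chunk=${2:-1000}
    workdir=$(mktemp -d) || return 1
    status=0
    split -l "$chunk" "$list" "$workdir/part_"
    for part in "$workdir"/part_*
    do
        [ -f "$part" ] || continue       # empty input yields no pieces
        ${SIMILAR:-similar} "-append_list:$part" || { status=$?; break; }
    done
    rm -rf "$workdir"
    return $status
}
```

A call like  append_in_chunks $HOME/imagelist 500  stops at the first piece
for which similar reports failure, so the remaining pieces can be inspected
and submitted later.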

Also you should mark the directories containing registered files by the
file  .imv_guarded_directory . Best is if either all files in a directory
are registered images or none of them.
   $ for i in INSERT_DIRECTORY_LIST_HERE
   > do 
   >   touch $i/.imv_guarded_directory
   > done 
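
If the registered files are already listed in a text file such as
$HOME/imagelist, the directories to mark can be derived from that list
instead of being typed again. This is a sketch only. Both function names are
hypothetical, and it assumes one file address per line.

```shell
# Hypothetical helper: print the distinct directories of the files named
# in a list (one address per line, as produced by find ... -print).
list_dirs() {
    sed -n 's|/[^/]*$||p' "$1" | sort -u
}

# Create the marker file in every such directory.
mark_guarded_dirs() {
    list_dirs "$1" | while IFS= read -r dir
    do
        [ -n "$dir" ] || continue        # skip files directly under /
        touch "$dir/.imv_guarded_directory"
    done
}
```

Running  list_dirs $HOME/imagelist  first gives a chance to review the
directories before  mark_guarded_dirs $HOME/imagelist  touches anything.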


Start the dedicated server process, which will guard the database.
This should be done quite early each time the computer is started.
The process is not intended to run as a daemon. It should have a terminal
window for output and administration. (Don't forget to append PATH if
this is not done automatically yet.)

   $ similar_server

If you are short of virtual memory and your database files became quite
large, then you may have to disable the memory cache. That will slow down
the server substantially, especially if RAM is short. The cache's memory
consumption is about 150% of the total file size.
   $  similar_server -similar_server_no_cache

The server started by this script demands user authentication by encryption
and allows all operations but shell commands or ending the service. We only
defined one single user and its identity is ensured by the fact that server
and client use the same keyfile.

If they are not run by the same user on the same host, then one has to
install a pair of matching keyfiles at both of their key directories.
(see "Appendix agent C : Authentication and Encryption" in similar -help)

Caution: the sagent module in similar is able to start shell commands if
         this is not denied (e.g. -permission:all:shell:deny ). The shell
         commands are run with the system user id of the server's start user.

         similar_server permits shell commands only for the internal_user
         (the reserved user name of the start terminal).
         But be careful when you change the example server scripts and then
         permit server access to other system users.
         For real world security demands, one should test whether the denials
         actually work before one relies on them.
         (Hey, also read the "I'M NOT TO BLAME" part of stic's BSD license)

To shut down the server process, go to its terminal, hit the
    @
key and enter (without any leading blanks) the command :

   -end

If the server process crashes or gets killed harshly, then it may be
necessary to remove the file  {mapname}_pubno  which publishes the server's
TCP/IP address. With catchable signals it should clean up neatly, though.
Anyway, such an abort of the program might lead to an inconsistent
state of the database if it occurs during a write operation.
(The time window for such an accident is quite small. Only the garbage
collection spends a substantial percentage of its time on writing.)


If you want to know the names of potentially duplicate files within your
collection, you may start a cross check. Set the parameters for a more
tolerant comparison. This may lead to some false duplicates but will
yield better results with color manipulated images.

   $ > $HOME/list_of_possible_duplicates
   $ similar -match_par:8:1:4:2:color_double_diff -list_doubles \
             >>$HOME/list_of_possible_duplicates
       0 : 0         -----------#--------------##--------------#-------
      50 : 4         -----#------------------------------#-------------
     100 : 6 ...

Each "-" represents a unique file, while a "#" shows that a suspected duplicate
has been found. 
This might take quite a long time and blocks the server for other clients.
Also, due to bash's i/o behavior, the resulting file names do not show up
in the result file line by line but only in larger blocks.
If you get curious or need to interrupt the operation for some other reason,
then touch the file  {mapname}_break :

   $ touch /INSERT_SUITABLE_PATH_HERE/similar_map_break

The current search position gets written to a file  {mapname}_list_doubles_pos 
You may resume that cross check by simply starting it again:

   $ similar -match_par:8:1:4:2:color_double_diff -list_doubles \
             >>$HOME/list_of_possible_duplicates

But be aware that starting it after it has completed restarts it from
the beginning.
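
Since the runs shown above append to the result file with >> , a run that
started over from the beginning may leave the same file address in the list
more than once. The sketch below removes such repetitions while keeping the
original order. It assumes that the result file holds one file address per
line. The function name is hypothetical.

```shell
# Hypothetical helper: print the lines of a file, dropping every line
# that has already occurred earlier in the same file.
unique_lines() {
    awk '!seen[$0]++' "$1"
}
```

The output should be written to a new file rather than back to the input
file, e.g.  unique_lines $HOME/list_of_possible_duplicates >$HOME/list_unique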

Each of the resulting filenames may be used as argument to the script
similar_display which shows the possible duplicates together with the given
image file (see below). 
It is much more efficient though to use it as input of the program  simv .
In any case, set  -match_par:  as it was set during the -list_doubles run.

Duplicates which shall not be reported again in future cross checks, may be
marked by the command -distinguish of similar.

After you cleaned up, you should test any new image file for duplicates before
you move it into your collection and register it with similar. This is best
done by use of the program simv. The default -match_par setting is recommended
to decrease the number of false doubles in that dialog situation.

From time to time, one should check the registered collection with a
-list_doubles run that employs the laxer -match_par settings.
One should also do a -list_doubles with -match_par:8:1:6:2:color_diff to
find things like the heavily molested eagle_X.jpg . Generally, experimental
parameters are encouraged :)


There are a few scripts which ease several tasks around similar.
They all assume, that directories with registered files are marked by the
existence of the file  .imv_guarded_directory  and that all files therein
are registered.

similar_server
   to avoid any concurrency problems, start a dedicated server process which
   guards the database files. All other instances of similar which try to
   access that database will automatically connect to the server process,
   provided the above preparations have been made.
   The server process will load all database records into its virtual memory
   to increase the speed of searching processes.
   
similar_append file [file ...]
   add one or more files to the database.

similar_del file [file ...]
   delete one or more files from the database. 
   The given files have still to exist and to match their registered sample.
   Also, the registered address of the sample has to point to a file with the
   same byte content as the given file.
   By these rules, the deletion is not hampered by alternative file addresses
   caused by symbolic links or changing mount points.
   (There is a similar command -del_file_adr which relies on the exact file
    address and does not need to have access to the image file itself.)

similar_refresh file
   recompute the sample value for a file address which is already registered
   in the database. Here the exact file address needs to be given.
   (There is a similar command -lookup_adr which may be used to determine the
    registered address of an existing file, before its content is altered.)

similar_rm file [file ...]
   remove one or more image files from disk and database.

similar_mv
   move a single image file on the disk and make the necessary changes in the
   database. Whether the image file's sample is removed from or added to the
   database depends on the existence of the file .imv_guarded_directory within
   the source and target directories.

similar_display
   search duplicates of a single image file within the database.
   If duplicates are found then they are displayed by Image Magick display.
   The original file is appended as last one to display's file list.
   (Use space bar and backspace key to flip through the list, key "^Q" quits)
 
similar_xv
   same as similar_display but using xv rather than Image Magick display.
   (Use space bar and backspace key to flip through the list, key "q" quits)



                          Getting started with simv

There are three tasks around similar which can become quite time consuming
if one has a substantial collection with a substantial input of new pictures.
These are : 
- cleaning up the collection according to the results of a -list_doubles run.
- checking newly downloaded files for duplicates and fitting them into the
  collection.
- moving registered files within the collection or deleting them
  from the collection.

simv shows you what you are doing, single keystrokes are enough to initiate
actions, and there is the possibility to attach GUI components al gusto
(my taste is quite frugal when it comes to GUIs).

The price for that convenient and quick user input is some effort for
the configuration of simv's user interfaces. 

A sample configuration is prepared for the mini collection of images that
come with the stic tarball. To test it, first edit the file $HOME/.simv_rc
and set the address of the mini collection's database.
(I replaced INSERT_SUITABLE_PATH_HERE by home/thomas/test in the above example)

   $ vi $HOME/.simv_rc
# the database to use with simv (same address as in first similar examples)
-mapper_cmd:similar -no_rc -mapname:/INSERT_SUITABLE_PATH_HERE/similar_map::

This may be necessary if you already have changed the database address in
$HOME/.similar_rc . The lengthy command redefines the way similar is called
(especially the :: at the end is essential). Thus it avoids bothering the
guarding server of your real image database.

Now for something very important:

Avoid a serious problem with the command display. If display uses shared
memory and simv kills it to remove it from the screen, then display
leaves two shared memory segments undestroyed. This feature may choke your
system after a while (found with ImageMagick 5.1.0 and 5.3.0).
So disable shared memory usage by 

   $ echo display.sharedMemory: False | xrdb -merge -

Shared memory may be monitored by command ipcs and released by ipcrm .
You may want to add the resource definition to one of the various X resource
configuration files mentioned in your X-user's startup file .xinitrc .
An oldfashioned place would be $HOME/.Xdefaults of that user.
   $ vi $HOME/.Xdefaults
...
display.sharedMemory: False
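
Instead of editing the resource file by hand, the line can be appended by a
small shell function which does nothing if the line is already there. This
is a sketch. The function name is hypothetical, and $HOME/.Xdefaults is only
one of the possible resource files mentioned above.

```shell
# Hypothetical helper: append LINE to FILE unless an identical line is
# already present. A missing FILE gets created.
add_resource_line() {
    file=$1
    line=$2
    grep -qxF -e "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```

Usage would be:  add_resource_line $HOME/.Xdefaults 'display.sharedMemory: False'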

Alternatively you may want to use a patched version of xv as described in
file  doc/xv_changes .

Whatever, let us go on with the exploration of simv :

   $ cd $(cat $HOME/.stic_main_dir)/images
   $ convert -geometry 640x480 birds/00000010.jpg \
                               import/eagle_X.jpg

This produces a duplicate but non-identical image in the directory import.
Now start the simv example :

   $ simv_start -end_on:list_end:clear */*

   --      1 : 10     ---------------------------------------------------------
   -rw-r--r--   1 thomas   thomas      10374 Feb  6 18:14 birds/00000010.jpg

There should be an image window at the upper left corner of your screen now.

Keep your cursor in the terminal window where you started simv. Not in the
graphics window.
Hit the space bar and you will get shown the next image. Hit Backspace
or the "\" key and you will get the previous image.
Usually this script ends when you try to step past the last item. The lengthy
-end_on command disabled that feature for now.

Hit space until  import/eagle_X.jpg  (which we created above) appears.

   --      7 : 10     ---------------------------------------------------------
   -rw-r--r--   1 thomas   thomas      32575 Feb 24 22:33 import/eagle_X.jpg
   =?=?=?=?=?=?=?=?=?=?=?=?=?=?=?=
   -rw-r--r--   1 thomas   thomas      10374 Feb  6 18:14 birds/00000010.jpg

There should also be another image window at the upper right corner of your
screen now. It shows the registered image  00000010.jpg  which is considered
to be a duplicate of eagle_X.jpg . 

In a real life import situation, you would now have to decide whether to keep
eagle_X.jpg or 00000010.jpg or even both of them.
Keys you may hit now:
   -   Moves eagle_X.jpg to trashdir and makes the next image file current item
   ~   Moves 00000010.jpg to trashdir and deletes it from similar's database
   b   Moves eagle_X.jpg to birds and registers it in similar's database.
       If you do this, you will get both files in the result list of the
       next cross check.
   ?   Prints a sparse list of which key is bound to what move target or other
       command. "sUmo" means that "u" will move a file to directory sumo.

Hit one of these keys and watch what happens. 
Hit the ":" key to undo your actions one by one.
Hit ESC if you leaned on the Space bar and auto repeat filled the queue.

Quick. But also hard as your first ride with vi.
Now, if you got the Tcl/Tk shell wish, you may add a GUI frontend.

Stop the running simv by hitting
   @
 and entering
   -end
Start the script again with an additional option.

   $ simv_start -tcltk -end_on:list_end:clear */*

Note that -tcltk is interpreted by the script and is not a simv command.
The script creates a named pipe, causes simv to use that pipe as input and
to start a wish process, which creates another named pipe and causes simv
to use that pipe as an output channel.

There should be a Tk window with some buttons and a file list at the upper
right corner of your screen. It is quite obstinate in popping up if it gets
covered (and poorly coordinated with my window manager, I fear).
Use the "auto popup" check button (near lower right) to toggle this feature.
Use the button "dip" in the same row to lower the window for some seconds.

The buttons in the upper part mainly trigger move commands to the several
directories of the sample collection.

"del" works like key "-" and "del_double" works like key "~".
"convert" converts the current item into a JPEG of quality 80 . A backup
copy of the original is made in the trash directory, so this conversion
can be undone although it isn't reversible by itself. 
"known" moves the current item to the doubles directory. This is a kind of
second trashdir which may help you to distinguish trash from duplicates
before you delete it.

As long as the focus is on the input field labeled "current:", key strokes
of printable keys will be forwarded to simv and work according to the key
mapping described above.

In the file list, Left-Double-Click a filename to make it the current 
and selected item.
Left-Click selects an item. Right-Click adjusts the selection.
Any move command is applied to all files of the selection. If there is no
selection, then the command is applied to simv's current item.

As soon as the Tcl script receives notifications about a change of the current
item, it automatically selects its line in the list if there isn't already
a selection of more than one line. This may at times interfere with your
asynchronous selection activities in the file list. Toggle this feature by
use of the button "auto select". 

"xv beep" is useful if you employ a patched version of xv as image viewer
program (see text doc/xv_changes). It creates or deletes the file
$HOME/.xv.gefummelt.beep .
"combine parts" starts a script that tries to reunite multipart files
downloaded by snntpbatch.
Button "auto convert" causes convert to be run before any move to a regular
target directory. This does not apply to "del", "del_double" or "known".

One may adjust the delays for "dip" and "auto popup". Beware of two frontends
with "auto popup delay" set to 0 and fighting over visibility. An X server
might get stuck over that highspeed popping.

"undo" works like key ":" and revokes reversible commands. 
"delete" is like the "del" button.

As for the last row of buttons, "stop" works like the ESC key. "help" prints
the complete help text flatly to the terminal (one should pop up a text widget).
"close" ends the Tcl script but not simv. "end simv" ends both.


Now you may use both the Tk window and the start terminal as input device.
If you use the terminal, then the Tk filelist is kept up to date anyway.

Play a while, undo all actions and end the program. 
Use the "end simv" button or go to the terminal window and use "@" "-end" .

Now see a targeting method which is easier to modify than the buttons.

   $ cd $(cat $HOME/.stic_main_dir)
   $ ls -d $(pwd)/images/[a-z]* > scripts/image_targets
   $ echo $(pwd)/images/trashdir > scripts/image_trash

The reason for this is explained below at "Practical Usage of simv".
Start a new run

   $ cd images
   $ simv_start -tcltk -end_on:list_end:clear */*

There are no target buttons at the top of the window but three identical
lists of directory addresses. Click on any of the addresses with any of the
mouse buttons to issue a move command.

Try it, undo and end the program by the "end simv" button.


simv does not finally remove files, it just copies them to other directories.
Deleted files are copied to the trash directories trashdir and doubles.
You have to remove the files in those directories by yourself if you want
to get rid of them.

At least remove any convert backups from the trashdir.

   $ rm trashdir/_imvjpgb_*

Before you begin with your own target list, delete the lines from
 $HOME/.simv_rc which connected simv to the test database.  Let it work with
the default database as defined in $HOME/.similar_rc which you already used
to register your collection.
As long as you use Image Magick display as your image viewer program there
is not much need to have any commands in $HOME/.simv_rc .

   $ echo '# write global start commands for simv here' > $HOME/.simv_rc


                         Practical Usage of simv

The most simple way to adapt simv to your own directory structure is to
deposit move targets in prepared file locations and to copy the minimal
keyset definition file to the expected location.

DO NOT FORGET the above  echo '# write ...' > $HOME/.simv_rc .

The expected file locations are all in the scripts directory :
   simv_keyset ...... the bindings for single keystrokes
   image_targets .... a list of directories which may be move targets
   image_trash ...... a single line with the address of the trash directory

So activate the minimal keyset which contains no specific mover keys

   $ cd $(cat $HOME/.stic_main_dir)/scripts
   $ cp simv_keyset_minimal simv_keyset

Decide where to have your trash directory.
It should be cleaned out from time to time.

   $ echo INSERT_YOUR_TRASHDIR_HERE > image_trash

Make a list of all desired move targets (absolute directory addresses)

   $ vi image_targets

and test the result

   $ cd $(cat $HOME/.stic_main_dir)/images
   $ simv_start -tcltk */*
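
Since image_targets is expected to hold absolute directory addresses, a typo
in that file is easier to find before simv is started. The following sketch
reports lines which do not name an existing directory. The function name is
hypothetical, and skipping empty lines is an assumption of this example, not
a documented property of simv_start.

```shell
# Hypothetical check: report lines of a target list which are not
# absolute addresses of existing directories.
# Exit status is 0 if all lines are fine, 1 otherwise.
check_targets() {
    bad=0
    while IFS= read -r dir
    do
        case "$dir" in
            '') ;;                                   # ignore empty lines
            /*) [ -d "$dir" ] || { echo "no such directory: $dir"; bad=1; } ;;
            *)  echo "not an absolute address: $dir"; bad=1 ;;
        esac
    done < "$1"
    return $bad
}
```

Usage would be:  check_targets image_targets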

You may provide up to four different target lists within the files
   image_targets_1  image_targets_2  image_targets_3  image_targets_4
of which _1 to _3 may override the lists defined by file
   image_targets
See in script simv_start, variable target_list for a complete description.

It is a good idea to split a long list in several parts or to provide
rotated versions of the big list as _1 _2 _3.

If you want specific mover keys and a visible button menu see below
"Adapting key bindings and button menu of simv" .



               Handling the result of similar -list_doubles

After cross checking the whole collection by similar, one may want to review
the reported files and decide what to do with them. Certain commands may be
helpful in this special situation.
Since a listed file may reside in an .imv_guarded_directory , force the
check for double files by  -double_check:on .
Since the list may be very long, better use -addfile_list: rather than
giving the list as arguments by $(cat ) .

Also be sure that similar's -match_par: are set to the same values as with the
-list_doubles run.
Since you may want to mark similar images as distinguished and also want to
experience the effect of that distinction, set  -use_exemptlist:on .

So let a similar instance tell the server :

   $ similar -match_par:8:1:4:2:color_double_diff -use_exemptlist:on

and start simv:

   $ simv_start -tcltk -double_check:on \
                -addfile_list:$HOME/list_of_possible_duplicates

Now simv should show you the first file to the left and its alleged doubles to
the right. If necessary, use Spacebar and Backspace inside the right image window
to view all doubles if more than one is reported.
Keys you may hit now:
   =   Moves left image to directory doubles, deletes it from similar's
       database and makes the next image file current item.
       Button "known"
   +   Moves first duplicate (at the right) to doubles, deletes it from
       similar's database and makes the next item current. When the deleted
       duplicate is supposed to be current item, it will not be found and
       you get the idle window of ImageMagick or xv.
   &   Tells similar that *all* alleged duplicates are not identical to
       the left image. Be careful to see all alleged duplicates and wait
       for a chance to remove any real duplicate before you -distinguish
       the rest.
One may also use key - (Button "del") to delete the left image or key ~
(Button "del double") to delete the first duplicate.

When done, set the match parameters back to the default values.

   $ similar -match_par:4:1:4:2:color_diff



                      Handling newly downloaded files

With a small number of new files, one just starts simv and gives it all the
files' addresses :

   $ simv_start -tcltk *

With a larger bunch of new images it may be desirable to have them categorized
before one uses simv to check them in.

stic_importer is a script that will show you four categories of images in
separate runs of  simv -tcltk :
  unique ........ images with no duplicates found
  interesting ... images which are larger than the duplicates found
  equivalent .... images that are equally sized or slightly smaller
  problem ....... files which are supposed to be images but unreadable
                  (one may try different graphics software like netscape)

Three other categories get removed without user interaction:
  inferior ...... duplicates significantly smaller than the registered one
  trash ......... very small images
  unreadable .... non-image stuff (text, MPEG, Word, PDF ...)

The categorization is done by script similar_splitter which not only detects
duplicates with the registered collection but also uses a temporary similar
map to detect similarities among the newly imported files.
See in similar_splitter "adjustable parameters" for the criteria used.
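
To illustrate how the size based categories relate to each other, here is a
minimal sketch. It is not the code of similar_splitter. It assumes that "size"
means file size in bytes, and the function name classify_import as well as
both thresholds are made up for this example. The real criteria are the
"adjustable parameters" in similar_splitter.

```shell
# Hypothetical thresholds, the real ones are set in similar_splitter.
TRASH_BYTES=2000       # assumed: smaller images count as trash
SLIGHTLY_PERCENT=90    # assumed: at least 90 % of the registered size
                       #          still counts as "slightly smaller"

# classify_import NEW_SIZE [REGISTERED_SIZE]
# Prints the category of a readable image of NEW_SIZE bytes.
# REGISTERED_SIZE is the size of the duplicate found in the collection.
# Leave it out if no duplicate was found.
classify_import() {
    new=$1
    reg=${2:-}
    if [ "$new" -lt "$TRASH_BYTES" ]; then
        echo trash
    elif [ -z "$reg" ]; then
        echo unique
    elif [ "$new" -gt "$reg" ]; then
        echo interesting
    elif [ $((new * 100)) -ge $((reg * SLIGHTLY_PERCENT)) ]; then
        echo equivalent
    else
        echo inferior
    fi
}
```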

The importer will not process files that end with .tee . Those may have been
created by snntpbatch and may be needed to produce complete images.

The usage of stic_importer is simple. Just give it some file addresses
but be sure that these aren't your valuable private files :

   $ stic_importer *
   ++-=++--+-+-=------++++-+======+++++++++++++++++++  50
   +++++++++++++++++-++=++=++++++++++++++++++++++++-+  100
   ++++++++++++++++++++++-+++++++++++++++++++++??+-++  150
   ++++++++++++++++-++++++++++++++---++++++++=++++-++  200
   +++++++++++++++++++++++++++++=++----+++++++++++
        12 similar_split_equivalent
         1 similar_split_inferior
         2 similar_split_interesting
         2 similar_split_problem
         3 similar_split_trash
       206 similar_split_unique
        21 similar_split_unreadable
       247 total
   start processing of result ?

Enter a single "y" and press Return to start the first simv run. The other
runs will follow, each as soon as the previous one has ended.

The simv run which presents the unique images will not check again for
duplicates. This speeds up simv but be aware of other users who are
registering files in the time between stic_importer's start and the actual
run of simv.


                      Other Types of Frontend Clients

There is also a PHP3 script simv_frontend.php3 for use with a webserver
(like Apache) and a web browser (like Netscape). It is not intended for moving
files but only for viewing and checking for duplicates. It is just a
programming demonstration and an example to build on.

The webserver needs to be running already. In the following examples, I assume
that its documents are located under /usr/local/httpd/htdocs .

In order to activate the PHP3 script, copy it in reach of your webserver and
also copy scripts/slate.gif and scripts/penguin.gif to the same directory as
the PHP3 script.
For the correct setting of permissions ask your system administrator or
the webmaster. That is, in most cases: ask yourself and try it out.

   $ su
   # mkdir /usr/local/httpd/htdocs/stic
   # chown INSERT_YOUR_SYSTEM_USER_ID_HERE /usr/local/httpd/htdocs/stic
   # chgrp INSERT_YOUR_SYSTEM_GROUP_ID_HERE /usr/local/httpd/htdocs/stic
   # exit
   $ chmod a+rx,o-w /usr/local/httpd/htdocs/stic
   $ mkdir /usr/local/httpd/htdocs/stic/tmp
   $ chmod a+rx,o-w /usr/local/httpd/htdocs/stic/tmp
   $ cd $(cat $HOME/.stic_main_dir)
   $ cp scripts/simv_frontend.php3 \
        /usr/local/httpd/htdocs/stic/my_simv_frontend.php3
   $ cp scripts/slate.gif scripts/penguin.gif /usr/local/httpd/htdocs/stic

Copy scripts/simv_info_server to scripts/my_simv_info_server

   $ cp scripts/simv_info_server scripts/my_simv_info_server

Edit the copied PHP3 script and set variable $stic_dir to your stic-0.2
directory. If port number 4000 is not suitable for the connection to the simv
server, then set variable $simv_port to a new one . In that case you will
have to change the variable simv_port in my_simv_info_server accordingly. 

   $ vi /usr/local/httpd/htdocs/stic/my_simv_frontend.php3
...
# The installation directory of stic
   $stic_dir= "/INSERT_YOUR_PATH_HERE/stic-0.2";

# The address of the simv server to connect with
   $simv_host= "localhost";
   $simv_port= 4000;

You have to edit the shell script my_simv_info_server only if you do not
install the PHP3 script in  /usr/local/httpd/htdocs/stic  or do not use port
4000. In that case set the variables  workdir  and  simv_port  accordingly:
   $ vi scripts/my_simv_info_server
# The directory in which the PHP3 frontend script resides
workdir=INSERT_PHP3_INSTALL_DIRECTORY_HERE
# the portnumber where to provide service. This must be the same as in the
# variable $simv_port in the PHP3 frontend script
simv_port=INSERT_YOUR_PORTNUMBER_HERE
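
Since the port number has to be the same in both files, a quick comparison
may save some puzzling. The following is only a sketch. The function names
are made up, and it assumes that both files assign a plain number in the
form shown above.

```shell
# php_port FILE: print the number assigned to $simv_port in the PHP3 script.
php_port() {
    sed -n 's/^[[:space:]]*\$simv_port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" |
    head -n 1
}

# sh_port FILE: print the number assigned to simv_port in the shell script.
sh_port() {
    sed -n 's/^simv_port=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# ports_match PHP_FILE SH_FILE
# Succeeds only if a port is set in both files and both are equal.
ports_match() {
    a=$(php_port "$1")
    b=$(sh_port "$2")
    [ -n "$a" ] && [ "$a" = "$b" ]
}

# Example call with the file names used in this text:
# ports_match /usr/local/httpd/htdocs/stic/my_simv_frontend.php3 \
#             scripts/my_simv_info_server || echo "port numbers differ"
```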


Start the simv server in a terminal window of its own and give it some image
files to display. For this example, force duplicate check even for files in
guarded directories.

   $ cd SOME_DIRECTORY_WITH_IMAGES
   $ my_simv_info_server -double_check:on *

Now enter the URL of the PHP3 script at your web browser. 

   http://localhost/stic/my_simv_frontend.php3

A login request will appear. User ID = "me" , Password = "mypwd" .

The webserver is not accessing the images directly but gets copies which
are requested by a sagent process started by the PHP3 script. This sagent
process contacts the simv server which has permission to access the images
and makes copies in the reach of the web server. A copy is guaranteed to
exist for at least 60 seconds even if the simv server immediately hops to
another item in its file list.


Although simv_info_server is configured to permit only harmless commands
and also refuses connections which do not seem to come from localhost, one
may be even more cautious and also demand encryption.
Obviously the user id for the web server should not be the same as the
user id for the trustworthy clients of similar.

So make a keyfile for user id webserver :

   $ cd $(cat $HOME/.stic_main_dir)
   $ vi $HOME/tnl_keydir/webserver.tnl
fnhndfont mkgfcno;dfn;odjaiug ewjijxzknsiuh9ldfm ji gdk;j xj joj bxjojojo
8q765v0p80-bi09y450u8ex907du7v09 998908nr8908un 8du08760 9sd;ig98987dld
   $ simv -no_rc -make_userkey:all:webserver -end
   userkey file created : /home/thomas/tnl_keydir/webserver.tnl
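
Instead of hammering on the keyboard one may let the system produce the
random text. This sketch assumes that any text is acceptable as raw material
for -make_userkey, as the typed example above suggests. The function name
make_seed_file is made up for this example.

```shell
# make_seed_file FILE
# Writes 96 random bytes as hexadecimal text to FILE. A newly created
# FILE is readable and writable by its owner only.
make_seed_file() {
    ( umask 077
      head -c 96 /dev/urandom | od -An -v -tx1 | tr -d ' \t' > "$1" )
}

# Example call with the file name used above:
# make_seed_file $HOME/tnl_keydir/webserver.tnl
```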

Copy it to the stic-0.2 directory and make it readable for the Apache user
(look who is running your httpd, mine is run by  wwwrun) :

   $ cp $HOME/tnl_keydir/webserver.tnl .
   $ su
   # chown INSERT_WEBSERVER_USER_ID_HERE webserver.tnl
   # chmod ug+r,o-r,g+w,uo-w webserver.tnl
   # exit

Now you have to edit  my_simv_frontend.php3  and set the variable
$simv_encryption to "on" . The other variables should then match the above
preparations and the webbrowser should behave with that URL as before.

   $ vi /usr/local/httpd/htdocs/stic/my_simv_frontend.php3
# Encryption is used if this variable is not set to "off"
  $simv_encryption= "on";

Finally stop the simv server and restart it with an additional argument
that demands encryption (and not only allows it).

   @
   -end
   $ my_simv_info_server -security:tcp:tunnel:require -double_check:on *

You may do the reverse test and set $simv_encryption= "off" in the PHP3
script. Afterwards the simv server should complain and the frontend should
send pages to the browser which are quite empty.


                       Other Programming Languages

Since the protocols for input and output are fully documented in the
help texts of similar, simv and sagent, a programmer should be able to
build a frontend client in any desired language.
See  src/as/asfrontend.c , scripts/simv_frontend.tcltk and 
scripts/simv_frontend.php3 for implementation examples.



                      Getting started with snntpbatch

snntpbatch is intended to be controlled by shell scripts. Nevertheless, one
may perform all operations in shell dialog.
The basic idea is to have a directory for each group where the messages
get stored and therein a directory bin, where attachments get stored.

You will need some free disk space. A dozen busy groups may easily occupy
500 MB. A full GB of working storage is advised.
You will also need some i-nodes (expect hundreds of thousands of files).

But first have a look at the brief introduction to snntpbatch :
see "Getting Started" in text  doc/snntpbatch_helptext  or by executing :

   $ snntpbatch -help | less

One may view the messages with a web browser starting at a main index page.
For an image collector, direct operation of simv and similar on the files
in directory ./bin may be more interesting. One should avoid the possible
multi-part files *.tee (as well as the usual virus.exe). 
Matching the helptext download examples :

   $ cd $HOME/snntpbatch_download/YOUR_NEWS_SERVER.port119
   $ cd alt.binaries.pictures.fantasy-sci-fi/bin
   $ simv_start -tcltk *[!e]
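
The pattern *[!e] skips every name that ends with the letter e, which covers
.tee and .exe but would also skip a name like picture.jpe . A more selective
filter could look like this sketch. The function name list_candidates is
made up for this example.

```shell
# list_candidates FILE...
# Prints, one per line, those names which do not end with .tee or .exe .
list_candidates() {
    for f in "$@"; do
        case $f in
            *.tee|*.exe) ;;              # multi-part fragment or executable
            *) printf '%s\n' "$f" ;;
        esac
    done
}

# Example, assuming that -addfile_list: expects one address per line:
# list_candidates * > $HOME/new_files
# simv_start -tcltk -addfile_list:$HOME/new_files
```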


If you already have favorite newsgroups then you should think about how
to recognize the wanted messages by means of regular expressions applied
to the header lines subject: , from: , date: , combined by logical operators
 -and , -or , -not and by brackets. (See -filter in the helptext.)

snntpbatch's filter may look into your collection directories and into its own
hash directories to avoid double downloading of the same binary files.
This feature depends on sufficiently unique filenames and their unambiguous
announcement in the subject: . Good for use in serious collectors groups
with not too many Mac file names (shaking my head silently).

Generally, care is taken that no dangerous or confusing filenames emerge.
Space characters are converted to underscores. Any unusual character
is replaced by its hex code. The file names should be shell-safe then.
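
The conversion can be pictured by the following sketch. It is not the code
of snntpbatch. The set of "usual" characters (letters, digits, dot,
underscore, hyphen) and the form %XX of the hex code are assumptions, and
the function name sanitize_name is made up. It is meant for ASCII names only.

```shell
# sanitize_name NAME
# Prints NAME with blanks turned into underscores and every character
# outside the assumed safe set replaced by % and its two digit hex code.
sanitize_name() {
    name=$1
    out=
    while [ -n "$name" ]; do
        rest=${name#?}                # everything after the first character
        c=${name%"$rest"}             # the first character itself
        name=$rest
        case $c in
            ' ') out="${out}_" ;;
            [A-Za-z0-9._-]) out="${out}${c}" ;;
            *) out="${out}$(printf '%%%02X' "'$c")" ;;
        esac
    done
    printf '%s\n' "$out"
}
```

With these assumptions the name "my pic;1.jpg" becomes my_pic%3B1.jpg .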

The download directory also contains a global list of message ids.
If not explicitly disabled by  -overwrite on , a message is not downloaded
if it is already known to that list. 
Like all downloaded data, those list entries may be cleaned off the disk
after a certain time (usually one week).
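
The decision can be sketched as follows. The format of the list (one message
id per line) and both function names are assumptions made for this example.
snntpbatch keeps its own format.

```shell
# should_download ID LISTFILE [OVERWRITE]
# Succeeds if the message has to be fetched: either OVERWRITE is "on"
# or ID is not yet recorded in LISTFILE.
should_download() {
    [ "${3:-off}" = on ] && return 0
    [ -f "$2" ] || return 0
    ! grep -F -x -q -e "$1" "$2"
}

# remember_id ID LISTFILE
# Appends ID to the list after a successful download.
remember_id() {
    printf '%s\n' "$1" >> "$2"
}
```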

The routes are bumpy, especially if one uses remote commercial servers.
The usual timeout is set to 4 minutes and the program tries to reconnect
and resume its tasks after a connection breaks down.
Nevertheless there are situations where it has to abort.

When confronted with the special server behavior of delivering far fewer
bytes than announced, it tries to circumvent the problem by aborting the
connection and waiting a random time before reconnecting.
After five such glitches, a message is discarded and the next one is
processed. This happens quite rarely, though.
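
A script which wraps its own download command might imitate this strategy
as in the following sketch. The limit of five is taken from the text above.
The upper bound of the random pause and the function names are assumptions.

```shell
MAX_GLITCHES=5             # give up after five failed attempts
MAX_WAIT=${MAX_WAIT:-30}   # assumed upper bound of the pause in seconds

# random_wait: print a whole number between 0 and MAX_WAIT.
random_wait() {
    awk -v max="$MAX_WAIT" 'BEGIN { srand(); print int(rand() * (max + 1)) }'
}

# fetch_with_retries COMMAND [ARG...]
# Runs COMMAND until it succeeds. Returns 1 after MAX_GLITCHES failures
# so that the caller can discard the message and go on with the next one.
fetch_with_retries() {
    glitches=0
    while ! "$@"; do
        glitches=$((glitches + 1))
        [ "$glitches" -ge "$MAX_GLITCHES" ] && return 1
        sleep "$(random_wait)"
    done
    return 0
}
```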

Multi-part messages (i.e. split attachments, not the MIME multipart type)
are handled in a rather coarse way, if at all. I would like to know whether
there is an unofficial protocol of the automated Windows news clients how
this is to be announced in the subject line. "(5/23)" may mean anything
from 5th part of a 23 part message to "hooray it's 23rd of may !".
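
Just to show what a guess at such an announcement could look like, here is a
sketch. It takes the last "(n/m)" in a subject line as part number and total,
and only if n is not larger than m. This is a heuristic made up for this
example and not the rule used by snntpbatch. It cannot tell parts from dates.

```shell
# part_marker SUBJECT
# Prints "PART TOTAL" for the last "(n/m)" in SUBJECT.
# Prints nothing if there is no such marker or if n is larger than m.
part_marker() {
    printf '%s\n' "$1" |
    sed -n 's/.*(\([0-9][0-9]*\)\/\([0-9][0-9]*\)).*/\1 \2/p' |
    while read -r n m; do
        [ "$n" -le "$m" ] && echo "$n $m"
    done
}
```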

To enable multi-part handling, the group directory has to contain a file
   nntpclient_multi_bin_tee
The first part of an attachment is stored as a decoded binary file.
If the group directory contains the marker file nntpclient_multi_bin_tee
then possible first and further parts are downloaded undecoded as
*.tee files with names derived from the subject line. 
It is left to an external tool like scripts/combine_tee to find
matching .tee files and decode them by uudecode. (Up to now, I never
saw split MIME-encoded attachments. Maybe I didn't look sharply enough.)


----------------------------------------------------------------------------


             Adapting key bindings and button menu of simv

To adapt simv fully to your own directory structure, you will have to copy
and modify some files.

   simv_start            coordinates the components of a simv run
 
   simv_keyset_example   defines the key bindings and target directories
                         of the above example

   simv_frontend.tcltk   the GUI component of the above example

Copy the three files :

   $ cd $(cat $HOME/.stic_main_dir)/scripts
   $ cp simv_start my_simv_start
   $ cp simv_keyset_example my_simv_keyset
   $ cp simv_frontend.tcltk my_simv_frontend.tcltk

Edit my_simv_start and set the variables keyset_file and frontend_file to
the names of your copies :

   $ vi my_simv_start
# the name of the file within the stic-scripts directory
keyset_file=my_simv_keyset
...
# the name of the Tcl/Tk script within stic-scripts to serve as frontend
frontend_file=my_simv_frontend.tcltk

If you prepare for separated target lists, you may possibly also want to
change the variables target_list and trash_target_file. See their remark texts.


Edit my_simv_keyset . Better read  simv -help | less  before, especially the
description of the command keyset. The file will be read by -keyset:readfrom: 

   $ vi my_simv_keyset

If your collection directories reside below a common main directory, set that
address behind the statement main: e.g. /home/thomas/stic-0.2/images

main:INSERT_COMMON_MAIN_DIRECTORY_HERE

You should also create the directories "trashdir" and "doubles" in that main
directory. Then you can use the definitions of trashdir: and the keys below
"Some auxiliary keys" without any changes.

   $ mkdir INSERT_COMMON_MAIN_DIRECTORY_HERE/trashdir
   $ mkdir INSERT_COMMON_MAIN_DIRECTORY_HERE/doubles

To translate your directories from the dummy names of the example, better
make a translation table since you will probably want to change the
Tcl/Tk script accordingly.
Choose the translation with the layout of the Tk window in mind. Translate
from "flowers" the target which shall be on the upper left Tk button,
translate from "crowds" what shall be bound to the upper right button. And
so on.

The letters after "map:" should be chosen as abbreviations of the target
names. Not an easy task. 

I restrict myself to alphabetic letters and number keys (i.e. [a-zA-Z0-9])
for the collection's move targets. Choose the letters in the order of
importance (i.e. traffic frequency) of the directories. Usually after a
dozen you will have to make some strange choices ... that's life.

Delete those example move targets which you did not translate into targets
of your own. Add new "map:" lines if needed.
 
Finally write the helplines describing your key bindings. You may add or
delete "helpmore:" lines but there must be one "helpstart:" at the beginning.


Edit my_simv_frontend.tcltk . simv_frontend.tcltk is intended as a programming
example of a frontend interface as well as a GUI component for practical use.
It surely is helpful if you know some Tcl/Tk but not absolutely necessary.

   $ vi my_simv_frontend.tcltk

Search for "set what_to_show {auto}" and replace that line by
 set what_to_show {targetbox payload}
This makes the button menu visible even if there are target lists.
The targetbox with these lists is shown as well if there are any.

The buttons and their containers are defined in  proc init_payload .
Their names are chosen not to interfere with other names within the script.

There are four main button containers which each consist of two row containers.
 plants , animals , humans , themes 
They should be easy to identify in the visible layout of the example GUI.

The buttons, their label texts and the target directories have the same
names. For example :
 .flowers      is the button's name
 "flowers"     is the label text
 /flowers      is the target directory's name below the main directory of
               the keyset definition

If possible, keep this identity of names. If you need more freedom:
The button's name has to start with a letter and may consist of letters,
numbers and underscores. The label text is quite arbitrary. The target
directory should be a path of short and shell friendly names.

Take your prepared translation table from the keyset translation and
use your editor's text change facilities.
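
If the editor is not up to the task, the table can also be applied by sed,
as in this sketch. It assumes a table file with one pair per line (example
name, blank, own name) and names made only of letters, digits and
underscores. The function name apply_translation is made up. Beware that a
name is also replaced where it is part of a longer word, and that an own
name which equals a later example name gets translated twice.

```shell
# apply_translation TABLE FILE
# Prints FILE with every example name replaced by its own name.
apply_translation() {
    # turn each table line into one sed substitution command
    script=$(awk 'NF == 2 { printf "s/%s/%s/g\n", $1, $2 }' "$1")
    sed -e "$script" "$2"
}

# Example call, the file names are hypothetical:
# apply_translation my_table simv_frontend.tcltk > my_simv_frontend.tcltk
```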

For unused buttons, leave the names as they are, make text " " and
remove -command "..." . See button  .empty  for an example. 
If you want to add buttons or change the layout, you need to know some Tcl/Tk.

Make a backup copy of these files outside the stic directory tree.
Test whether it looks and works as intended :

   $ ( cd ../images ; my_simv_start -tcltk -end_on:list_end:clear */* )

----------------------------------------------------------------------------


                              File Locations

Most of the scripts depend on the binary executables or helper scripts.
This part describes the configurations that will allow them to find each
other. It is a good idea not to remove any file from scripts or bin.
Only make copies or install links.

If the system supports symbolic links, it is sufficient to append the 
subdirectory stic-0.2/scripts to the environment variable PATH. The
binaries are represented in scripts by symbolic links to the sibling
directory bin.
Modification of PATH is done best in the shell's startup file
(e.g. $HOME/.bashrc).
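
A startup file gets read more than once in nested shells, so it is
worthwhile to append the directory only if it is not yet listed. A sketch,
with a made-up function name and an installation path that you will have to
adjust:

```shell
# path_append DIR
# Adds DIR to the end of PATH unless it is already listed there.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

# Hypothetical location of the unpacked archive:
# path_append "$HOME/stic-0.2/scripts"
# export PATH
```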

If there is no support for symbolic links, then one would additionally
have to append the subdirectory stic-0.2/bin to the variable PATH.

If you do not want to change your PATH then you may put copies or links
of the desired commands into one of the directories already listed in PATH.

The scripts take quite an effort to find the script stic_std_variables
which then tries to locate the bin and scripts directories of stic . 
One may easily make that search unambiguous by writing the complete
path of stic-0.2 into the file $HOME/.stic_main_dir .
This file may be overridden by the environment variable STIC_MAIN_DIR.

If $HOME/.stic_main_dir is missing and $STIC_MAIN_DIR is empty, then
the scripts try to find out the filename that has been used to start
them. If the parent of their directory contains a file named
 this_is_the_stic_main_directory  then they assume that they are at their
original position within stic-0.2 .

If no other clue is found, the scripts try to find their helpers in 
the same directory as they were started in.
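
The order of these clues can be summarized by a sketch. It is not the code
of stic_std_variables but a restatement of the rules above. The function
name stic_find_main_dir is made up for this example.

```shell
# stic_find_main_dir SCRIPT_PATH
# Prints the stic main directory. Returns 1 if no clue is found, in
# which case helpers would have to be sought next to the script itself.
stic_find_main_dir() {
    if [ -n "${STIC_MAIN_DIR:-}" ]; then
        printf '%s\n' "$STIC_MAIN_DIR"        # environment overrides the file
    elif [ -r "$HOME/.stic_main_dir" ]; then
        cat "$HOME/.stic_main_dir"
    else
        # look for the marker file in the parent of the script's directory
        parent=$(cd "$(dirname "$1")/.." 2>/dev/null && pwd) || return 1
        [ -f "$parent/this_is_the_stic_main_directory" ] || return 1
        printf '%s\n' "$parent"
    fi
}
```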

----------------------------------------------------------------------------


                              Portability Issues

Generally, the source should compile with any 32-bit UNIX C-Compiler that does
not refuse to process K&R code.

Set the compiler commands for your system at the start of
 src/stic_build/makefile .
Also decide what bytesex to use with blowfish.

Trouble spots may be similar.c with its signal handlers Cleanup_handle_xyz()
and sblowfish.c which contains code that I found on the internet.

The handler functions are void and don't touch any argument. That might cause
compilers to complain but should be compatible with any type of function
that is expected by your system's personal flavor of signal() .
Nevertheless, there might be signals mentioned which do not exist on other
operating systems. Vice versa, there might be the need to catch signals not
mentioned yet in similar.c .

sblowfish.c contains some gestures which might cause problems on older
compilers. I'll try to change it to primitive K&R soon. Also I still have
to validate that implementation of blowfish with an artless implementation
of my own. I checked it with B. Schneier's description of December 1993 and
so far it seems to be ok. A remote possibility of a well disguised fake
still remains (security considerations make me temporarily paranoid).

Also it depends on the byte sex of your system. See in sblowfish.h the
macros ORDER_ABCD , ORDER_DCBA , ORDER_BADC .
I will try to remove this dependency from the code (let blowfish
work on a byte array rather than two 32-bit words ?) but verification
will not be that easy without an ABCD workstation.
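
If you are unsure about the byte sex of your machine, the following sketch
may help. It writes the bytes 1 2 3 4 and lets od read them back as one
32-bit word. That the macro names mean ABCD = most significant byte first,
DCBA = least significant byte first (e.g. Intel) and BADC = PDP/VAX style
is my assumption. Check it against the remarks in sblowfish.h . The function
name detect_byte_order is made up for this example.

```shell
# detect_byte_order
# Prints ABCD, DCBA or BADC according to how this machine stores a
# 32-bit word, or "unknown" if the pattern is not recognized.
detect_byte_order() {
    word=$(printf '\001\002\003\004' | od -An -tx4 | tr -d ' \t\n')
    case $word in
        01020304) echo ABCD ;;        # most significant byte first
        04030201) echo DCBA ;;        # least significant byte first
        02010403) echo BADC ;;        # bytes swapped within 16-bit halves
        *) echo unknown ;;
    esac
}
```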


----------------------------------------------------------------------------


                Where to get supporting software

libjpeg ......... http://www.ijg.org/
ImageMagick ..... http://www.simplesystems.org/ImageMagick/
xv .............. http://www.trilon.com/xv/
Tcl/Tk .......... http://www.tcltk.org/
                  http://sourceforge.net/projects/tcl
PHP ............. http://www.php.net/

A backup tool for CD recorders:
scdbackup ....... http://scdbackup.webframe.org

----------------------------------------------------------------------------


                              Legal Stuff

This software and related documents are copyright 2001, Thomas Schmitt
stic-source@gmx.net and provided to you without any warranty under an open
source BSD license.  (see file COPYING)

