
                PROTECTOR(1) - 1.00.7 - 18 September 2002

How it works - quick overview

   The protector program reads an email message file and copies it to its
   standard  output  device,  passing  any attachments it finds through a
   filter  program  or  script  in the process. And that is ALL it does -
   dead  simple  really.  All  the  cleverness you might expect from this
   package comes from the logic of the filter.

   The  filter  script is typically a bash shell script (but could be any
   executable  program written in tcl, C, perl - whatever), written to be
   easily understood and extended. The filter is passed the "encoded"
   attachment contents on its standard input, and the attachment headers
   in environment variables. It writes the headers and content back
   (possibly modified) on its standard output. The bare minimum script,
   which passes all attachments through unmodified, looks like this:

        #! /bin/sh
        echo "$HEADERS"
        cat
        exit 0

   With  this  filter  script installed in the right place, the protector
   would (in effect) do nothing more than copy the message from stdin to
   stdout without making any modifications.
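
   Because a filter is just an ordinary program that reads its standard
   input and a few environment variables, it can be tried out by hand
   without the protector. The function below is a hypothetical test
   harness (run_filter is our own name, not part of the protector suite);
   it supplies only HEADERS and CONTENT_TYPE, so a filter that consults
   other variables needs those set as well.

```shell
#! /bin/sh
# Hypothetical harness, not part of protector: run one filter script on a
# single attachment, supplying the variables protector would normally set.
#   run_filter SCRIPT CONTENT-TYPE HEADERS < encoded-attachment
# Only HEADERS and CONTENT_TYPE are supplied here; set CONTENT_ENCODING,
# CONTENT_FILE_NAME, etc. the same way if the filter uses them.
run_filter() {
    HEADERS="$3" CONTENT_TYPE="$2" sh "$1"
}
```

   With the bare minimum script saved as ./filter, the command
   printf 'hello\n' | run_filter ./filter text/plain 'Content-type: text/plain'
   prints the header line followed by "hello".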

   However, life can get more interesting.

   The following script (for example) replaces all MS Word documents with
   a warning message, but passes all other attachments through unmodified.

        #! /bin/sh
        case "$CONTENT_TYPE" in
                application/msword | application/x-msword )
                        echo "Content-type: text/plain"
                        echo "Content-description: Warning message"
                        echo
                        echo "This email contained an MSword document which"
                        echo "has been replaced by this message."
                        ;;
                * )
                        echo "$HEADERS"
                        cat
                        ;;
        esac
        exit 0

   This is rather naive, since it lets everything except MS Word documents
   through. A better bet is to be far more "fussy", like this:

        #! /bin/sh
        case "$CONTENT_TYPE" in
                text/* )
                        echo "$HEADERS"
                        cat
                        ;;
                application/msword | application/x-msword )
                        echo "Content-type: text/plain"
                        echo "Content-description: Warning message"
                        echo
                        echo "This email contained an MSword document which"
                        echo "has been replaced by this message."
                        ;;
                * )
                        echo "Content-type: text/plain"
                        echo "Content-description: Warning message"
                        echo
                        echo "This email contains an unrecognised (and"
                        echo "therefore illegal) attachment which has"
                        echo "been replaced by this message."
                        ;;
        esac
        exit 0
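
   The case patterns above compare $CONTENT_TYPE literally, but MIME type
   and subtype names are case-insensitive (RFC 2045), so a sender could
   write "Application/MSWord" or "TEXT/PLAIN" and the literal patterns
   would misfile either one. A small normalising helper avoids that. It is
   only a sketch: normalised_content_type is a hypothetical name, and
   discarding anything after a ';' matters only if the protector passes
   parameters in $CONTENT_TYPE, which this page does not say.

```shell
#! /bin/sh
# Hypothetical helper: print $CONTENT_TYPE in a canonical form for use in
# a case statement. MIME type and subtype names are case-insensitive, so
# fold to lower case. In case any parameters are present
# (e.g. "; name=x.doc"), keep only the text before the first ';'.
normalised_content_type() {
    printf '%s\n' "${CONTENT_TYPE%%;*}" |
        tr -d ' \t' |
        tr '[:upper:]' '[:lower:]'
}
```

   A filter would then use case "$(normalised_content_type)" in place of
   case "$CONTENT_TYPE", with all its patterns written in lower case.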

   But this script still has a major limitation: it assumes that the
   Content-type field in the email correctly identifies MS Word documents
   as such. Anyone wanting to "break out" of this prison need only send
   the document with a different content type; "application/octet-stream",
   for example, would do the trick for most mailers. So, if we are
   unwilling to trust those who send us mail, we must take further
   precautions.

   One possibility would be to "decode" the attachment into its binary
   format,  and  then  inspect  the  resulting  file  to discover what it
   *really*  is  (as  far as we can tell). The $CONTENT_ENCODING variable
   tells  us  how the attachment is encoded in the message - a likely bet
   is "base64" for most binary file types.
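
   As a sketch of this decode-and-inspect step, the hypothetical helper
   below looks for the signature that Word 97-2003 .doc files carry. Such
   files are OLE2 "compound documents", which begin with the eight bytes
   D0 CF 11 E0 A1 B1 1A E1. The same bytes also start old Excel and
   PowerPoint files, so a match means "a legacy Office document of some
   kind" rather than Word in particular, and newer .docx files (which are
   zip archives) will not match at all. The sketch assumes that base64(1)
   accepts -d and that head accepts -c, as with GNU coreutils; only the
   first eight decoded bytes are examined.

```shell
#! /bin/sh
# Hypothetical helper, not part of protector.
#   attachment_is_ole2 FILE
# FILE holds the attachment exactly as the filter read it from stdin.
# Returns 0 if the decoded data starts with the OLE2 compound-document
# signature D0 CF 11 E0 A1 B1 1A E1, non-zero otherwise.
attachment_is_ole2() {
    case "$CONTENT_ENCODING" in
        [Bb][Aa][Ss][Ee]64 )
            # Decode and keep just the first eight bytes; error messages
            # about malformed base64 are discarded.
            ole2_magic=$(base64 -d < "$1" 2>/dev/null | head -c 8 |
                od -An -tx1 | tr -d ' \n') ;;
        * )
            # Other encodings (7bit, 8bit, binary) are examined as-is;
            # quoted-printable is not decoded by this sketch.
            ole2_magic=$(head -c 8 < "$1" | od -An -tx1 | tr -d ' \n') ;;
    esac
    [ "$ole2_magic" = "d0cf11e0a1b11ae1" ]
}
```

   A filter would first save its standard input to a temporary file (made
   with mktemp, say), call attachment_is_ole2 on it, and then either print
   a warning or echo "$HEADERS" followed by the saved file.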

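   Another, less reliable, approach is to look at the file name passed to
   the script as $CONTENT_FILE_NAME. MS Word documents normally have the
   extension ".doc", so we could check for this too, bearing in mind that
   the sender chooses the name and can change it as easily as the content
   type.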
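
   The name check needs nothing beyond the shell's own pattern matching.
   In this hypothetical helper the pattern ignores letter case, since
   "REPORT.DOC" is as likely as "report.doc"; an empty $CONTENT_FILE_NAME
   (no name given) simply fails to match.

```shell
#! /bin/sh
# Hypothetical helper, not part of protector: succeed if the attachment's
# file name, as passed in $CONTENT_FILE_NAME, ends in ".doc" in any mix
# of upper and lower case.
name_says_msword() {
    case "$CONTENT_FILE_NAME" in
        *.[Dd][Oo][Cc] ) return 0 ;;
        * ) return 1 ;;
    esac
}
```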

   All  of  these  factors  give  us  clues about the type of file we are
   dealing with.

   Once  the  file  type  is known, the filter script can either pass the
   attachment  through  "as  is",  modify  it  slightly,  or  replace  it
   altogether with something else.
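
   One way to combine these clues with the "fussy" policy is sketched
   below. choose_action is a hypothetical function: it takes the first
   eight decoded bytes of the attachment as a lower-case hex string
   (obtained however the filter likes), consults $CONTENT_FILE_NAME and
   $CONTENT_TYPE, and prints "replace" if any clue points to an MS Word
   document, "pass" for text parts, and "replace" for everything else.
   The declared type is consulted last, so a Word document labelled
   text/plain is still caught.

```shell
#! /bin/sh
# Hypothetical policy helper, not part of protector.
#   choose_action MAGIC_HEX
# MAGIC_HEX is the first eight decoded bytes of the attachment as
# lower-case hex (e.g. "d0cf11e0a1b11ae1"). Prints "pass" or "replace".
choose_action() {
    # Clue 1: the content itself looks like an OLE2 (legacy Office) file.
    case "$1" in
        d0cf11e0a1b11ae1 ) echo replace; return ;;
    esac
    # Clue 2: the file name claims to be a Word document.
    case "$CONTENT_FILE_NAME" in
        *.[Dd][Oo][Cc] ) echo replace; return ;;
    esac
    # Only now trust the declared type, and only for plain text.
    case "$CONTENT_TYPE" in
        text/* ) echo pass ;;
        * ) echo replace ;;
    esac
}
```

   The filter then branches on the result: for "pass" it echoes "$HEADERS"
   and copies the attachment, for "replace" it prints a warning message
   instead, as in the scripts above.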

   A much fuller script than the one shown in the example above is
   included with the protector suite. It is called "part_filter" and
   handles a growing number of file types.

A little history

   The reason I chose to make the filter a shell script is really a
   historical one. My first attempt at this program used a crude
   configuration language, and had all the decision logic embedded in the
   protector program itself. I found that I had to make the configuration
   language more and more "general" (and complex) as I considered a larger
   number of file types. One day it dawned on me that I was re-inventing a
   perfectly good wheel: BASH provided all the logical constructs I
   needed, and wasn't "yet another" language. So, at a small cost in
   performance, I dropped the configuration language and re-architected
   things to use the generic filtering logic you have here.

   I am now wondering whether I was right in this. If this program is to
   be useful to non-programmers, then we don't really want a programming
   language taking the place of a configuration file; the potential for
   errors is too large. I am beginning to think about how the filter
   could be configured in a more user-friendly way, but this tends to
   lead us back to where I started. The question is still open . . .
