

                              PERL


                    a language by Larry Wall


            Practical Extraction and Report Language


                               or


             Pathologically Eclectic Rubbish Lister


                        Tom Christiansen
                   CONVEX Computer Corporation


                ---------------------------------------

                            Overview


   o What is Perl: features, where to get it, preview


   o Data Types: scalars and arrays


   o Operators


   o Flow Control


   o Regular Expressions


   o I/O: regular I/O, system functions, directory access,
     formatted I/O


   o Functions and Subroutines: built-in array and string
     functions


   o Esoterica: suid scripts, debugging, packages, command line
     options


   o Examples


                     ---------------------------------------

                          What is Perl?


   o An interpreted language that looks a lot like C, with built-in
     sed, awk, and sh, plus bits of csh, Pascal, FORTRAN, and
     BASIC-PLUS thrown in.


   o Highly optimized for manipulating printable text, but also
     able to handle binary data.


   o Especially suitable for system management tasks because it
     provides interfaces to most common system calls.


   o Rich enough for most general programming tasks.


   o "A shell for C programmers." [Larry Wall]

                     ---------------------------------------

                            Features


   o Easy to learn because much derives from existing tools.


   o More rapid program development because it's interpreted.


   o Faster execution than shell script equivalents.


   o More powerful than sed, awk, or sh; the a2p and s2p
     translators are supplied to convert your old awk and sed
     scripts.


   o Portable across many different architectures.


   o Absence of arbitrary limits like string length.


   o Fits nicely into UNIX tool and filter philosophy.


   o It's free!


                     ---------------------------------------

                         Where to get it


   o Any comp.sources.unix archive


   o Famous archive servers
        o uunet.uu.net    192.48.96.2

        o tut.cis.ohio-state.edu  128.146.8.60


   o Its author, Larry Wall <lwall@jpl-devvax.jpl.nasa.gov>
        o jpl-devvax.jpl.nasa.gov 128.149.1.143


   o Perl reference guide (in PostScript form) also available
     from Ohio State, along with some sample scripts and archives
     of the perl-users mailing list.


   o The USENET newsgroup comp.lang.perl is a good source for
     questions, comments, and examples.

                     ---------------------------------------

                             Preview


   o It's not for nothing that perl is sometimes called the
     "pathologically eclectic rubbish lister."  Before you drown
     in a deluge of features, here's a simple example to whet
     your appetite.  It demonstrates the principal features of
     the language, all of which have been present since version 1.


        while (<>) {
            next if /^#/;
            ($x, $y, $z) = /(\S+)\s+(\d\d\d)\s+(foo|bar)/;
            $x =~ tr/a-z/A-Z/;
            $seen{$x}++;
            $z =~ s/foo/fear/ && $scared++;
            printf "%s %08x %-10s\n", $z, $y, $x
                if $seen{$x} > $y;
        }


                     ---------------------------------------

                           Data Types


   o Basic data types are scalars, indexed arrays of scalars, and
     associative arrays of scalars.


   o Scalars themselves are either string, numeric, or boolean,
     depending on context.  The values 0 (zero), '0', and '' (the
     null string) are false; all else is true.


   o Type of variable determined by leading special character.

        o $       scalar
        o @       indexed array (lists)
        o %       associative array
        o &       function


   o All data types have their own separate namespaces, as do
     labels, functions, and file and directory handles.
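
A short sketch of the two rules above, written in the slides' own
Perl style (it also runs under a present-day perl 5): one name can
be a scalar, an array, and an associative array at the same time,
and only a few values count as false.  The names and values are
made up for illustration.

```perl
# One name, three variables: $foo, @foo, and %foo live in separate
# namespaces, so assigning to one leaves the others untouched.
$foo = 'scalar';
@foo = ('array', 'of', 'words');
%foo = ('key', 'hash value');

print "$foo\n";           # the scalar $foo
print "$foo[1]\n";        # element 1 of @foo
print "$foo{'key'}\n";    # element 'key' of %foo

# Only 0, '0', and '' (and undefined values) are false.
foreach $v (0, '0', '', 'zero', '0.0', 1) {
    push(@true, $v) if $v;
}
print join(',', @true), "\n";    # prints zero,0.0,1
```

Note that the string '0.0' is true: only the exact strings '0' and
'' are false.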

                     ---------------------------------------

                      Data Types (scalars)


   o Use a $ to indicate a scalar value


        $foo = 3.14159;


        $foo = 'red';


        $foo = "was $foo before";       # interpolate variable


        $host = `hostname`;     # note backticks


        ($foo, $bar, $glarch) = ('red', 'blue', 'green');


        ($foo, $bar) = ($bar, $foo); # exchange
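
A minimal sketch contrasting the two kinds of quotes and showing the
list-assignment exchange; the variable names are illustrative only.

```perl
$foo = 'red';
$foo = "was $foo before";     # double quotes interpolate
$lit = 'was $foo before';     # single quotes do not
print "$foo\n";               # was red before
print "$lit\n";               # was $foo before

($x, $y) = ('left', 'right');
($x, $y) = ($y, $x);          # list assignment swaps, no temporary
print "$x $y\n";              # right left
```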

                     ---------------------------------------

                    Special Scalar Variables


   o Special scalars are named with punctuation (except $0).
     Examples are:


        o $0      name of the currently executing script
        o $_      default for pattern operators and implicit I/O
        o $$      the current pid
        o $!      the current system error message from errno
        o $?      status of last `backtick`, pipe close, or system
        o $|      if nonzero, flush output after every write
        o $.      the current line number of last input
        o $[      array base, 0 by default; awk uses 1
        o $<      the real uid of the process
        o $(      the real gid of the process
        o $>      the effective uid of the process
        o $)      the effective gid of the process
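
A small sketch exercising $0, $$, $!, and $?.  It assumes a Unix
system with sh available, and /no/such/file is a hypothetical path
assumed not to exist.  The exact text of $! depends on the C
library.

```perl
print "script $0 is running as pid $$\n";

# $! holds the errno message after a failed system call.
open(NOSUCH, '/no/such/file') || ($err = "$!");
print "open failed: $err\n";

# $? holds the wait status of the last child; the exit code is
# in the high byte, so shift it right by 8.
system('sh', '-c', 'exit 3');
$code = $? >> 8;
print "child exited with $code\n";    # child exited with 3
```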

                          ---------------------------------------

                       Data types (arrays)


   o Indexed arrays (lists); $ for one scalar element, @ for all
        $foo[$i+2] = 3;         # set one element to 3
        @foo = ( 1, 3, 5 );     # init whole array
        @foo = ( );             # initialize empty array
        @foo = @bar;            # copy whole @array
        @foo = @bar[$i..$i+5];  # copy slice of @array


   o $#ARRAY is the index of the highest subscript.  The script's
     name is in $0, and its arguments run from $ARGV[0] through
     $ARGV[$#ARGV], inclusive.


   o Associative (hashed) arrays; $ for one scalar element,
     % for all
        $frogs{'green'} += 23;  # 23 more green frogs
        $location{$x, $y, $z} = 'troll'; # multi-dim: keys joined by $;
        %foo = %bar;            # copy whole %array
        @frogs{'green', 'blue', 'yellow'} = (3, 6, 9); # slice
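
A sketch of slices, $#ARRAY, and the "multi-dimensional" key, with
made-up data.  The join with $; shows the single string key that
perl actually stores for $location{$x, $y, $z}.

```perl
@bar = (10, 20, 30, 40, 50, 60, 70, 80);
$i = 1;
@foo = @bar[$i..$i+2];                 # slice: (20, 30, 40)
print "last subscript of \@bar is ", $#bar, "\n";    # 7

@frogs{'green', 'blue', 'yellow'} = (3, 6, 9);   # slice assignment
$frogs{'green'} += 23;                           # now 26

# A "multi-dimensional" key is really the subscripts joined by $;
($x, $y, $z) = (1, 2, 3);
$location{$x, $y, $z} = 'troll';
$key = join($;, $x, $y, $z);
print "found a $location{$key}\n";     # found a troll
```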

                     ---------------------------------------

                     Special Array Variables


   o @ARGV   command line arguments


   o @INC    search path for files called with do


   o @_      default for split and subroutine parameters


   o %ENV    the current environment; e.g. $ENV{'HOME'}


   o %SIG    used to set signal handlers


        sub trapped {
            print STDERR "Interrupted\007\n";
            exit 1;
        }
        $SIG{'INT'} = 'trapped';
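
A sketch that installs a handler the same way but, so that it can
be tested safely, uses SIGUSR1 and sends the signal to itself.  It
also shows that children inherit %ENV.  The handler and the
GREETING variable are hypothetical, and a Unix sh is assumed.

```perl
sub trapped {
    local($sig) = @_;          # the handler gets the signal's name
    $caught = $sig;
}
$SIG{'USR1'} = 'trapped';
kill 'USR1', $$;               # send SIGUSR1 to ourselves
sleep 1 until $caught;         # wait until the handler has run
print "caught SIG$caught\n";   # caught SIGUSR1

# Children inherit %ENV.
$ENV{'GREETING'} = 'hello';
$out = `echo \$GREETING`;
chop($out);                    # drop the trailing newline
print "child saw: $out\n";     # child saw: hello
```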

                     ---------------------------------------

                            Operators


Perl uses all of C's operators except type casting and the unary
`&' (address-of) and `*' (dereference) operators, plus these:


   o exponentiation:  **, **=


   o range operator: ..
        $inheader = 1 if /^From / .. /^$/;  # From line to blank line
        if (1..10) { do foo(); }            # input lines 1 through 10
        for $i (60..75) { do foo($i); }     # the list 60 .. 75
        @new = @old[30..50];                # slice


   o string concatenation: ., .=


        $x = $y . &frob(@list) . $z;

        $x .= "\n";
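
A sketch of the range operator in both contexts, plus
concatenation.  The mail lines are made-up sample data; in scalar
context the range acts as a flip-flop, here matching against $_.

```perl
@mail = ("From alice", "Subject: hi", "", "body text",
         "From bob", "Subject: re: hi", "", "more text");
foreach (@mail) {
    # Scalar "..": false until /^From / matches, then true up to
    # and including the next line that matches /^$/.
    if (/^From / .. /^$/) { push(@header, $_); }
    else                  { push(@body, $_); }
}

@new = (60..75);                 # list context: 60 through 75
$x = 'abc' . 'def';              # concatenation
$x .= "\n";                      # append
```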

                     ---------------------------------------

                      Operators (continued)


   o string repetition: x, x=


        $bar = '-' x 72; # row of 72 dashes


   o string tests: eq, ne, lt, gt, le, ge


        if ($x eq 'foo') { }
        if ($x ge 'red' ) { }


   o file test operators, like an augmented set of /bin/test
     tests, work on filenames or filehandles


        if (-e $file) { }   # file exists
        if (-z $file) { }   # zero length
        if (-O LOG) { }     # LOG owned by real uid
        die "$file not a text file" unless -T $file;
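
A sketch of repetition, string versus numeric comparison, and a few
file tests.  The scratch file name under /tmp is an assumption; the
file is removed at the end.

```perl
$bar = '-' x 10;                 # "----------"

$cmp1 = ('apple' lt 'banana');   # true
$cmp2 = ('10' lt '9');           # true: compared character by character
$cmp3 = (10 < 9);                # false: numeric comparison

$file = "/tmp/ftest.$$";         # hypothetical scratch file
open(OUT, ">$file") || die "can't create $file: $!";
print OUT "hello\n";
close(OUT);
$exists = -e $file;              # true
$empty  = -z $file;              # false: it holds 6 bytes
$text   = -T $file;              # true: looks like text
unlink($file);
$gone = ! -e $file;              # true
```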


                     ---------------------------------------

                          Flow Control


   o Unlike C, blocks always require enclosing braces {}


   o unless and until are just if and while negated


        o if (EXPR) BLOCK else BLOCK
        o if (EXPR) BLOCK elsif (EXPR) BLOCK else BLOCK
        o while (EXPR) BLOCK
        o do BLOCK while EXPR
        o for (EXPR; EXPR; EXPR) BLOCK
        o foreach $VAR (LIST) BLOCK


   o For readability, if, unless, while, and until may be used as
     trailing statement modifiers, as in BASIC-PLUS:


        return -1 unless $x > 0;
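
A minimal sketch of the loop forms and trailing modifiers listed
above; the numbers are arbitrary.

```perl
$n = 0;
$n += 2 until $n >= 7;           # trailing until: stops at 8

foreach $i (1..9) {
    next unless $i % 2;          # skip even numbers
    push(@odd, $i);
}

for ($j = 0, $sum = 0; $j < 5; $j++) { $sum += $j; }   # 0+1+2+3+4

$k = 10;
do { $k-- } while $k > 5;        # body runs before the first test
```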

                     ---------------------------------------

                    Flow Control (continued)


   o Use next and last rather than C's continue and break


   o redo restarts the current iteration, ignoring the loop test


   o Blocks may carry optional labels, and next, last, and redo
     accept a label for clearer loop control, avoiding the use of
     goto to exit nested loops.


   o No switch statement, but it's easy to roll your own


   o do takes 3 forms
        o execute a block
          do { $x += $a[$i++] } until $i > $j;
        o execute a subroutine
          do foo($x, $y);
        o execute a file in current context
          do 'subroutines.pl';
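
A sketch of a labeled exit from nested loops and of one common way
to roll your own switch: a labeled bare block left early with
"last".  The classify subroutine and its categories are
hypothetical.

```perl
# Labeled loops: "last ROW" leaves both loops at once.
@rows = ('3 5 7', '2 -4 6', '-1 0 1');
ROW: for ($r = 0; $r <= $#rows; $r++) {
    @cols = split(' ', $rows[$r]);
    for ($c = 0; $c <= $#cols; $c++) {
        if ($cols[$c] < 0) {
            ($found_r, $found_c) = ($r, $c);
            last ROW;
        }
    }
}
print "first negative at row $found_r, column $found_c\n";

# A hand-rolled switch: a labeled bare block left with "last".
sub classify {
    local($_) = @_;
    SWITCH: {
        /^\d+$/     && do { $kind = 'number'; last SWITCH; };
        /^[a-z]+$/i && do { $kind = 'word';   last SWITCH; };
        $kind = 'other';
    }
    $kind;
}
```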

                          ---------------------------------------

                       Regular Expressions


   o Understands egrep regexps, plus


        o \w, \W  alphanumerics plus _ (and negation)
        o \d, \D  digits (and negation)
        o \s, \S  white space (and negation)
        o \b, \B  word boundaries (and negation)


   o C-style escapes recognized, like \t, \n, \034


   o Unlike sed, don't backslash these characters to get their
     special meaning: ( ) | { } +


   o Character classes may contain metas, e.g. [\w.$]


   o Special variables: $& means all text matched, $` is text
     before match, $' is text after match.
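
A sketch of the extra escapes and the match variables on made-up
strings.

```perl
$_ = 'order 1234 shipped to dock_7';
/\d+/;                          # first run of digits
($matched, $before, $after) = ($&, $`, $');
# $matched is '1234', $before 'order ', $after ' shipped to dock_7'

$allword = ('dock_7' =~ /^\w+$/);       # true: _ and digits are \w
$spaced  = ('dock 7' =~ /\s/);          # true: contains white space
$whole   = ('concat cat' =~ /\bcat\b/); # true: the separate word
$inword  = ('concat' =~ /\bcat\b/);     # false: no boundary before c
$class   = ('dock_7.txt' =~ /^[\w.]+$/);  # \w inside a class
$tabbed  = ("a\tb" =~ /a\tb/);          # C-style escape in a pattern
```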


                     ---------------------------------------

                 Regular Expressions (continued)


   o Use \1 .. \9 within regular expressions; $1 .. $9 outside


        if (/^this (red|blue|green) (bat|ball) is \1/)
            { ($color, $object) = ($1, $2); }
        ($color, $object) =
            /^this (red|blue|green) (bat|ball) is \1/;


   o Substitute and translation operators are like sed's s and y.
        s/alpha/beta/;
        s/(.)\1/$1/g;           # collapse doubled characters
        y/A-Z/a-z/;             # lowercase


   o Use =~ and !~ to match against variables


        if ($foo !~ /^\w+$/) { exit 1; }
        $foo =~ s/\btexas\b/TX/i;
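
A sketch running the slide's patterns on sample strings, so the
effect of each one is visible; the input strings are made up.

```perl
$_ = 'this red ball is red';
($color, $object) = /^this (red|blue|green) (bat|ball) is \1/;
# $color is 'red', $object is 'ball'

$word = 'bookkeeper';
$word =~ s/(.)\1/$1/g;           # collapse doubled letters: 'bokeper'

$shout = 'HELLO World';
$shout =~ y/A-Z/a-z/;            # lowercase: 'hello world'

$where = 'Austin, Texas';
$where =~ s/\btexas\b/TX/i;      # /i ignores case: 'Austin, TX'

$bad = ('foo_bar' !~ /^\w+$/);   # false: every character is \w
```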


                     ---------------------------------------

                               I/O


   o Filehandles have their own namespace and are typically
     written in all upper case for clarity.  The predefined
     filehandles are STDIN, STDOUT, and STDERR.


   o Mentioning a filehandle in angle brackets reads the next line
     in a scalar context, or all remaining lines in an array
     context; newlines are left intact.


        $line = <TEMP>;
        @lines = <TEMP>;


   o <> means all files supplied on the command line (or STDIN if
     none).  When used this way, $ARGV is the current filename.


   o When the input symbol is the only thing in a while loop's
     condition, each input line is automatically assigned to $_.


                     ---------------------------------------

                         I/O (continued)


   o Usually you iterate over a file a line at a time, assigning
     to $_ each time and using that as the default operand.


        while ( <> ) {
            next if /^#/;       # skip comments
            s/left/right/g;     # global substitute
            print;              # print $_
        }


   o If not using the pseudo-file <>, open a filehandle:


        open (PWD,      "/etc/passwd");           # read
        open (TMP,      ">/tmp/foobar.$$");       # write
        open (LOG,      ">>logfile");             # append
        open (TOPIPE,   "| lpr");                 # write to a command
        open (FROMPIPE, "/usr/etc/netstat -a |"); # read from a command
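
A sketch that exercises the write, append, read, and input-pipe
forms against a scratch file and checks for errors with || die.
The /tmp file name is an assumption, and wc is assumed to be on the
PATH.

```perl
$tmp = "/tmp/iodemo.$$";                  # scratch file name
open(TMP, ">$tmp") || die "can't write $tmp: $!";
print TMP "one\ntwo\nthree\n";
close(TMP);

open(LOG, ">>$tmp") || die "can't append to $tmp: $!";
print LOG "four\n";                       # appended after "three"
close(LOG);

open(TEMP, $tmp) || die "can't read $tmp: $!";
$line = <TEMP>;                           # scalar context: "one\n"
@lines = <TEMP>;                          # array context: the other 3
close(TEMP);

open(FROMPIPE, "wc -l < $tmp |") || die "can't run wc: $!";
$count = <FROMPIPE>;                      # e.g. "4\n" or "       4\n"
close(FROMPIPE);
$count =~ s/\s+//g;                       # strip wc's padding
unlink($tmp);
```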


                     ---------------------------------------

                         I/O (continued)


   o May also use getc for character I/O and read for raw I/O


   o Access to eof, seek, close, flock, ioctl, fcntl, and select
     calls for use with filehandles.


   o Access to mkdir, rmdir, chmod, chown, link, symlink (if
     supported), stat, rename, and unlink calls for use with
     filenames.


   o Pass printf a filehandle as its first argument unless
     printing to STDOUT; there is no comma after the filehandle.


        printf LOG "%-8s %s: weird bits: %08x\n",
            $program, &ctime(time), $bits;  # &ctime from ctime.pl


   o Associative arrays may be bound to dbm files with dbmopen()
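
A sketch of printf to a filehandle followed by seek, getc, read,
eof, and stat on the same file.  The scratch name is an assumption;
"+>" opens the file for both reading and writing after truncating
it.

```perl
$tmp = "/tmp/rawio.$$";                        # scratch file name
open(F, "+>$tmp") || die "can't open $tmp: $!";
printf F "%-5s|%03d\n", 'abc', 7;              # no comma after F
seek(F, 0, 0);                                 # back to the start
$first = getc(F);                              # one character: 'a'
$n = read(F, $buf, 4);                         # next 4 bytes: "bc  "
$eof = eof(F);                                 # false: "|007\n" remains
close(F);
@st = stat($tmp);
$size = $st[7];                                # size field: 10 bytes
unlink($tmp);
```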

                     ---------------------------------------

                        System Functions


A plethora of functions from the C library are provided as
built-ins, including most system calls.  These include:


   o chdir, chroot, exec, exit, fork, getlogin, getpgrp, getppid,
     kill, setpgrp, setpriority, sleep, syscall, system, times,
     umask, wait.


   o If your system has Berkeley-style networking: bind, connect,
     send, getsockname, getsockopt, getpeername, recv, listen,
     socket, socketpair.


   o getpw*, getgr*, gethost*, getnet*, getserv*, and getproto*.


   o pack and unpack can be used for manipulating binary data.
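
A sketch of pack/unpack on a fixed-layout record (the layout is
invented for illustration) and of fork/wait with the child's exit
code recovered from $?.

```perl
# An 8-byte space-padded string, then 16-bit and 32-bit big-endian
# ("network order") integers.
$rec = pack('A8 n N', 'perl', 513, 65536);
($name, $short, $long) = unpack('A8 n N', $rec);  # A strips padding
$len = length($rec);                              # 8 + 2 + 4 = 14
$hex = unpack('H*', pack('n', 513));              # '0201'

# fork/wait: the child's exit code comes back in $?.
$pid = fork;
die "can't fork: $!" unless defined $pid;
exit 5 if $pid == 0;          # child: exit at once with status 5
$waited = wait;               # parent: reap the child
$status = $? >> 8;            # 5
```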


                  ---------------------------------------

                        Directory Access


Three methods of accessing directories are provided.


   o You may open a pipe from /bin/ls like this:
        open(FILES,"/bin/ls *.c |");
        while ($file = <FILES>) { chop($file); ... }


   o The directory-reading routines are provided as built-ins and
     operate on directory handles.  Supported routines are
     opendir, readdir, closedir, seekdir, telldir, and rewinddir.


   o The easiest way is to use perl's file globbing notation.  A
     string enclosed in angle brackets containing shell
     metacharacters evaluates to a list of matching filenames.


        foreach $x ( <*.[ch]> ) { rename($x, "$x.old"); }
        chmod 0644, <*.c>;
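
A sketch of the directory-handle and globbing methods against a
scratch directory (its /tmp name and the file names are
assumptions); everything is removed afterwards.

```perl
$dir = "/tmp/dirdemo.$$";                  # scratch directory
mkdir($dir, 0755) || die "can't mkdir $dir: $!";
foreach $name ('a.c', 'b.c', 'notes.txt') {
    open(F, ">$dir/$name") || die "can't create $dir/$name: $!";
    close(F);
}

# Directory handles: readdir also returns . and .., which the
# grep drops; readdir order is arbitrary, so sort afterwards.
opendir(DIR, $dir) || die "can't opendir $dir: $!";
@cfiles = grep(/\.c$/, readdir(DIR));
closedir(DIR);
@cfiles = sort @cfiles;

# Globbing: returns full paths, already sorted.
@globbed = <$dir/*.c>;
chmod 0644, @globbed;

unlink(<$dir/*>);
rmdir($dir) || die "can't rmdir $dir: $!";
```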


                     ---------------------------------------

                           Subroutines


   o Subroutines are called either with the `do' operator or with
     `&'.  Any of the three principal data types may be passed as
     parameters or used as a return value.


        do foo(1.43);


        do foo(@list);


        $x = &foo('red', 3, @others);


        @list = &foo(@olist);


        %foo = &foo($foo, @foo);

                     ---------------------------------------

                     Subroutines (continued)


   o Parameters are received by the subroutine in the special
     array @_.  If desired, these can be copied to local
     variables.  This is especially useful for recursive
     subroutines, since local saves the caller's values and
     restores them on return.


        $result = &simple($alpha, $beta, @tutti);
        sub simple {
            local($x, $y, @rest) = @_;
            local($sum, %seen);
            # ... compute $sum from $x, $y, and @rest ...
            return $sum;
        }


   o Subroutines may also be called indirectly


        $foo = 'some_routine';

        do $foo(@list);
        ($x, $y, $z) = do $foo(%maps);
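
A sketch of a recursive subroutine using local copies of @_, a
filled-in version of simple, and an indirect call through a name
held in a scalar.  It calls subroutines with `&' only, because later
perls dropped the `do SUBROUTINE(LIST)' form; fact and the sample
arguments are hypothetical.

```perl
# Each call's local($n) saves the caller's $n and restores it on
# return, which is what lets the recursion work.
sub fact {
    local($n) = @_;
    return 1 if $n <= 1;
    local($rest) = &fact($n - 1);
    return $n * $rest;
}

sub simple {
    local($x, $y, @rest) = @_;
    local($sum) = $x + $y;
    foreach $item (@rest) { $sum += $item; }
    return $sum;
}

$result = &simple(1, 2, 3, 4);        # 10
$routine = 'fact';                    # a subroutine name in a scalar
$f = &$routine(5);                    # 120
```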


                     ---------------------------------------

                          Formatted I/O


   o Besides printf, formatted I/O can be done with format and
     write statements.


   o Automatic pagination and printing of headers.


   o Picture description facilitates lining up multi-line output


   o Fields in picture may be left or right-justified or centered


   o Multi-line text-block filling is provided, something like
     having a %s format string with a built-in pipe to fmt.


   o These special scalar variables are useful:
        o $% for current page number
        o $= for current page length (default 60)
        o $- for lines left on page


                          ---------------------------------------

                     Formatted I/O (example)

# a report from a bug report form; taken from perl man page
format top =
                        Bug Reports
@<<<<<<<<<<<<<<<<<<<<<<<     @|||         @>>>>>>>>>>>>>>>>>>>>>>>
$system,                      $%,         $date
------------------------------------------------------------------
.

format STDOUT =
Subject: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
         $subject
Index: @<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
       $index,                       $description
Priority: @<<<<<<<<<< Date: @<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
          $priority,        $date,   $description
From: @<<<<<<<<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
      $from,                         $description
Assigned to: @<<<<<<<<<<<<<<<<<<<<<< ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
             $programmer,            $description
~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                     $description
~                                    ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<
                                     $description
~                                    ^<<<<<<<<<<<<<<<<<<<<<<<...
                                     $description
.
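
A much smaller, testable sketch of format and write than the bug
report above: it writes two lines through a format named after the
filehandle into a scratch file and reads them back.  The data and
the /tmp name are made up; with no OUT_TOP format defined, no page
header is printed.

```perl
@items = ('apples', 3, 'kumquats', 12);    # made-up data
$tmp = "/tmp/fmtdemo.$$";                  # scratch file name
open(OUT, ">$tmp") || die "can't write $tmp: $!";

# A left-justified 11-character field, a space, then a
# right-justified 5-character field.
format OUT =
@<<<<<<<<<< @>>>>
$name,      $qty
.

while (($name, $qty) = splice(@items, 0, 2)) {
    write OUT;                             # uses the format named OUT
}
close(OUT);

open(IN, $tmp) || die "can't read $tmp: $!";
@report = <IN>;
close(IN);
unlink($tmp);
print @report;
```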


                ---------------------------------------

                    Built-in Array Functions


   o Indexed arrays function as lists; you can add items to or
     remove them from either end using these functions:

        o pop     remove last value from end of array
        o push    add values to end of array
        o shift   remove first value from front of array
        o unshift add values to front of array


For example:


     push(@list, $bar);
     push(@list, @rest);
     $tos = pop(@list);
     while ( $arg = shift(@ARGV) )  { }
     unshift( @ARGV, 'zeroth arg', 'first arg');
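
A sketch tracing the four functions on a small list, so the state
after each call can be checked.

```perl
@list = ('a');
push(@list, 'b');              # ('a', 'b')
push(@list, 'c', 'd');         # ('a', 'b', 'c', 'd')
$tos = pop(@list);             # 'd'; list is ('a', 'b', 'c')
$first = shift(@list);         # 'a'; list is ('b', 'c')
unshift(@list, 'x', 'y');      # ('x', 'y', 'b', 'c')
print "@list\n";               # x y b c
```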


                  ---------------------------------------

            Built-in Array Functions (split and join)


   o split breaks up a string into an array of new strings.  You
     can split on arbitrary regular expressions, limit the number
     of fields you split into, and save the delimiters (by putting
     parentheses in the pattern) if you want.


        @list = split(/[, \t]+/, $expr);
        while (<PASSWD>) {
            ($login, $passwd, $uid, $gid, $gcos,
                $home, $shell) = split(/:/);
        }


   o The inverse of split is join.


        $line = join(':', $login, $passwd, $uid,
                          $gid, $gcos, $home, $shell);
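
A sketch of the split features named above on a made-up passwd-style
line, followed by join putting it back together.

```perl
@list = split(/[, \t]+/, "red, green\tblue ,  gold");
# ('red', 'green', 'blue', 'gold')

$entry = 'root:x:0:0:Super User:/root:/bin/sh';   # sample line
($login, $passwd, $uid, $gid, $gcos, $home, $shell) = split(/:/, $entry);

@two = split(/:/, $entry, 2);        # limit: ('root', 'x:0:0:...')
@parts = split(/([-+])/, '3+4-2');   # parentheses keep delimiters:
                                     # ('3', '+', '4', '-', '2')
$line = join(':', $login, $passwd, $uid, $gid, $gcos, $home, $shell);
```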

                     ---------------------------------------

         Built-in Array Functions (sort, grep, reverse)


   o reverse inverts a list.


        foreach $tick (reverse 0..10) { }


   o sort returns a new list with the elements ordered according
     to their ASCII values.  Supply your own subroutine for a
     different collating order.


        print sort @list;
        sub numerically { $a - $b; }
        print sort numerically @list;


   o grep returns a new list consisting of all the elements for
     which a given expression is true.  For example, this deletes
     all lines that begin with a pound sign:


        @lines = grep(!/^#/, @lines);
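
A sketch showing why a custom sort routine matters for numbers.  It
writes numerically with the <=> comparison operator of later perls
rather than the subtraction shown on the slide; the data are
arbitrary.

```perl
@list = (10, 9, 100, 2);
@ascii = sort @list;                 # (10, 100, 2, 9): string order

sub numerically { $a <=> $b; }
@nums = sort numerically @list;      # (2, 9, 10, 100)
@down = reverse @nums;               # (100, 10, 9, 2)

@lines = ("# comment\n", "code\n", "  # indented\n", "more\n");
@lines = grep(!/^#/, @lines);        # drops only the line that
                                     # starts with #
```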

                     ---------------------------------------

               Built-in Array Functions (%arrays)


For manipulating associative arrays, the keys and values functions
return indexed arrays of the keys and data values respectively.
each is used to iterate through an associative array, retrieving
one ($key,$value) pair at a time.


   while (($key,$value) = each %array) {
       printf "%s is %s\n", $key, $value;
   }


   foreach $key (keys %array) {
       printf "%s is %s\n", $key, $array{$key};
   }


   print reverse sort values %array;
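
A sketch of the three functions on a word-frequency table built from
a made-up sentence.  The keys are sorted before printing because the
order of keys and each is not predictable.

```perl
$text = 'the cat saw the dog and the cat ran';
foreach $w (split(' ', $text)) {
    $count{$w}++;                        # word frequency
}
foreach $key (sort keys %count) {
    printf "%s is %s\n", $key, $count{$key};
}
while (($key, $value) = each %count) {
    $total += $value;                    # 9 words in all
}
@freq = reverse sort values %count;      # (3, 2, 1, 1, 1, 1)
```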


                ---------------------------------------

                        String functions


   o Besides the powerful regular expression features, several
     well-known C string manipulation functions are provided,
     including crypt, index, rindex, length, substr, and sprintf.


   o The chop function efficiently removes the last character
     from a string.  It's usually used to delete the trailing
     newline on input lines.  Like many perl operators, it works
     on $_ if no operand is given.


        chop($line);
        chop ($host = `hostname`);
        while (<STDIN>) {
            chop; ...
        }
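
A sketch of the C-like string functions on a sample path (crypt is
left out because it is not available everywhere).  Positions count
from 0.

```perl
$path = '/usr/local/bin/perl';
$len   = length($path);                  # 19
$first = index($path, '/');              # 0
$last  = rindex($path, '/');             # 14
$base  = substr($path, $last + 1);       # 'perl'
$dir   = substr($path, 0, $last);        # '/usr/local/bin'

$line = "some input\n";
chop($line);                             # 'some input'

$padded = sprintf("%-6s|%5.2f", 'pi', 3.14159);   # 'pi    | 3.14'
```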


                     ---------------------------------------

                  String functions (continued)


   o The eval operator lets you execute dynamically generated
     code.  For example, to process any command line arguments of
     the form variable=value, place this at the top of your
     script:


        eval '$'.$1."'$2';"
            while $ARGV[0] =~ /^([A-Za-z_]+=)(.*)/ && shift;


     The eval operator is also useful for run-time testing of
     system-dependent features which would otherwise trigger fatal
     errors.  For example, not all systems support symlink or
     dbmopen; you could test for their existence by executing the
     statements within an eval and testing the special variable
     $@, which contains the text of the run-time error message if
     anything went wrong.
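
A sketch running the variable=value idiom on a pretend command line,
then probing for symlink and trapping a run-time error through $@.
The argument values are made up, and the exact wording of the error
text may differ between perl versions.

```perl
@ARGV = ('color=red', 'size=10', 'file.txt');   # pretend command line

# Turn leading variable=value arguments into assignments.
eval '$'.$1."'$2';"
    while $ARGV[0] =~ /^([A-Za-z_]+=)(.*)/ && shift;
# now $color is 'red', $size is '10', and @ARGV is ('file.txt')

# Probe for an optional feature: where symlink is unsupported the
# call dies, and eval leaves the message in $@.
eval 'symlink("", "");';
$has_symlink = ($@ eq '');

# Run-time errors inside eval are trapped the same way.
$zero = 0;
eval '$q = 1 / $zero;';
$err = $@;                 # e.g. "Illegal division by zero at ..."
```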

                     ---------------------------------------

                          Suid Scripts


   o Perl programs can be made to run setuid, and can actually be
     more secure than the corresponding C program.


   o Because interpreters have no guarantee that the filename
     they get as the first argument is the same file that was
     exec'ed, perl won't let you run a setuid script on a system
     where setuid scripts are not disabled.


   o Using a dataflow tracing mechanism triggered by setuid
     execution, perl can tell what data is safe to use and what
     data comes from an external source and thus is "tainted."


   o Tainted data may not be used, directly or indirectly, in any
     command that modifies files, directories, or processes;
     doing so causes a fatal run-time error.

                     ---------------------------------------

                     Debugging and Packages


   o When invoked with the -d switch, perl runs your program
     under a symbolic debugger (written in perl) somewhat similar
     to sdb in syntax.  Amongst other things, breakpoints may be
     set, variables examined or changed, and call tracebacks
     printed out.  Because it uses eval on your code, you can
     execute any arbitrary perl code you want from the debugger.


   o Using packages you can write modules with separate
     namespaces to avoid naming conflicts in library routines.
     The debugger uses this to keep its variables separate from
     yours.  Variables in another package are accessed with the
     package'name notation, as in this line from the debugger:


        $DB'stop[$DB'line] =~ s/;9$//;
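
A sketch of two packages each owning a variable called $count.  It
spells the package separator as `::', the form later perls use,
instead of the apostrophe shown on the slide; the package name
counter is hypothetical.

```perl
package counter;
$count = 0;                   # this is $counter::count
sub bump { $count++; }        # this is &counter::bump

package main;
$count = 100;                 # $main::count: a different variable
&counter::bump();
&counter::bump();
print "counter has $counter::count, main has $count\n";
```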


                     ---------------------------------------

                      Command Line Options


The following are the more important command line switches
recognized by perl:


   o -v      print out version string
   o -w      issue warnings about error-prone constructs
   o -d      run script under the debugger
   o -e      like sed's -e: enter one line of script on the command line
   o -n      loop around input like sed -n
   o -p      as with -n but print out each line
   o -i      edit files in place
   o -a      turn on autosplit mode (like awk) into @F array
   o -P      call C pre-processor on script
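
A sketch spelling out roughly what -n and -a do: -n wraps the script
in a while (<>) loop, and -a splits each line into @F before the
script runs.  Here the "script" is the column-adding one-liner
shown in the next examples; the scratch input file is an
assumption, and sums are collected instead of printed so they can
be checked.

```perl
$tmp = "/tmp/cols.$$";                 # scratch input file
open(OUT, ">$tmp") || die "can't write $tmp: $!";
print OUT "1 2 3\n10 20 30\n";
close(OUT);

@ARGV = ($tmp);                        # what <> will read
while (<>) {                           # the loop -n supplies
    @F = split(' ');                   # the autosplit -a supplies
    push(@sums, $F[0] + $F[$#F]);      # first plus last column
}
unlink($tmp);
```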


                     ---------------------------------------

                     Examples: Command Line


   # output current version
   perl -v


   # simplest perl program
   perl -e 'print "hello, world.\n";'


   # delete each file named on input; useful at the end of
   # "find foo -print |"
   perl -n -e 'chop;unlink;'


   # add first and last columns (filter)
   perl -a -n -e 'print $F[0] + $F[$#F], "\n";'


   # in-place edit of *.c files changing all foo to bar
   perl -p -i -e 's/\bfoo\b/bar/g;' *.c


   # run a script under the debugger
   perl -d myscript

