From: Martin Schwartz <schwartz@cs.tu-berlin.de>
Newsgroups: comp.lang.perl.announce,comp.lang.perl.modules
Subject: Announce: Convert::Context
Followup-To: comp.lang.perl.modules
Date: 6 Oct 1998 15:21:32 GMT
Organization: Technical University of Berlin, Germany
Lines: 245
Approved: merlyn@stonehenge.com (comp.lang.perl.announce)
Message-ID: <6vdchs$bl0$1@news.neta.com>

Hi,

As part of a larger project I wrote a module for handling attributed strings.
It mirrors Perl's string functions, so using it should feel familiar.
I called it Convert::Context: Convert, because I expect conversion tools to
be the typical environment where this module is used; Context, because
it deals with annotated text.

Ok, the name is negotiable. ;)

The module might be interesting for anyone working with text that has
format tags or other markup mixed in. Think of HTML, XML, WordPerfect.
Its current state can be characterized as "working but not optimized".

So here comes Convert::Context. An excerpt of the man page follows below.

The distribution file is called:

    Convert-Context-0.500.tar.gz

I just uploaded it to my CPAN directory. You can also get it directly from:

    http://wwwwbs.cs.tu-berlin.de/~schwartz/perl/

Have fun,

Martin


=== schnipp

NAME
       Convert::Context - an Attributed Text data type

       - ALPHA - release

SYNOPSIS
       See below.

DESCRIPTION
       Convert::Context maintains attributed strings. It lets
       you access those strings much like Perl's ordinary
       strings.

       An attributed string is a string to which attributes are
       attached at certain string positions. An attribute can be
       any scalar: numbers, strings and references are
       welcome. Attributes are not part of the string. The
       semantics of the attributes must be supplied by the
       calling code.

       What does this mean?

       A basic task for a text system is to locate a certain
       piece of text. This is trivial if you only have plain text
       to look at. It is no longer trivial if attributes or
       entries such as bold, italic or bookmarks are mixed into
       your text. There are two strategies for combining
       attributes with a string:

       1.  You can enrich the text by inserting control codes.
           E.g., if you have a line with two bold words:

           (A) "The word bold is always bold"

           it would look (here with HTML controls) like:

           (B) "The word <b>bold</b> is always <b>bold</b>"

           If you searched for the text "bold is" in (B) with
           Perl's m// operator, you would fail. You would have to
           strip the HTML control sequences first. This is a
           workable method, but it is not the one used here.

       2.  You can maintain separate lists that record which
           control code applies at which position of the text.
           This is what Convert::Context does. The example from
           above would look like:

              offset    0---------1---------2-------
              text      The word bold is always bold
              attrib   (0        1   0          1   )
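
       The idea of strategy 2 can be sketched in plain Perl, without
       the module. This is only an illustration under assumptions: the
       variable names and the helper attrib_at are hypothetical and say
       nothing about how Convert::Context stores its data internally.

```perl
use strict;
use warnings;

# Plain text plus two parallel lists: $attrib[$i] takes effect at $offset[$i].
my $text   = "The word bold is always bold";
my @offset = (0, 9, 13, 24);
my @attrib = (0, 1, 0, 1);

# Hypothetical helper: return the attribute in effect at position $pos.
sub attrib_at {
    my ($pos) = @_;
    my $current = 0;                  # attribute 0 is the default
    for my $i (0 .. $#offset) {
        last if $offset[$i] > $pos;
        $current = $attrib[$i];
    }
    return $current;
}

# The search runs on the plain text, so no control codes get in the way.
my $found = -1;
if ($text =~ /bold is/) {
    $found = $-[0];                   # start offset of the match
    printf "match at %d, attribute %d\n", $found, attrib_at($found);
}
```

       Because the attributes live outside the text, the m// search for
       "bold is" succeeds directly, and the attribute at the match
       position can still be looked up afterwards.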

    [...]

       new 
       
           $Ct = Convert::Context -> new (
               [$cs]
           )

           $Ct = Convert::Context -> new (
               [$cs,] \$txt [,[@a], [@o]]
           )

           $Ct = Convert::Context -> new (
               [$cs,] [\$txt [,[@a], [@o]]], [...], ...
           )

           Returns a new Context string. It can be initialized
           in three ways: (1) without parameters, (2) with a
           reference to a text string, an attrib list reference
           and an offset list reference, or (3) with a list of
           array references, each holding the arguments of (2).

           Optionally it can be initialized with a leading
           parameter $cs. This stands for "character length" and
           specifies the size of one character in bytes. One needs
           this when using e.g. UTF-16 (Unicode) characters.

           Example:

            (1)
              $Empty = Convert::Context -> new;

            (2)
              $Plain = Convert::Context -> new (\("Plain text\n"));
              $Bold  = Convert::Context -> new (\("Attribute 1 text"), [1]);

            (3)
              Special (but useful) case:
              $Mixed = Convert::Context -> new (
                 [\("This is an "),                         [0] ],
                 [\("all bold"),                          [122] ],
                 [\(", short and sometimes ")                   ],
                 [\("italic"),       ["Strange text attribute"] ],
                 [\(" text.")                                   ]
              );

           Attribute 0 and offset 0 are used as default values if
           none are given explicitly. The meaning of all
           attributes (here 0, 122 and "Strange text attribute")
           is defined entirely by the calling code.  In this
           example one would assume that a text processor maps
           the attributes 0, 122 and "Strange text attribute" to
           the semantics plain, bold and italic.
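
           How calling code might give the attributes a meaning can
           be sketched as follows. This is a plain-Perl illustration
           that does not use the module; the segment layout, the
           %tag_for table and render_html are assumptions made up for
           this example.

```perl
use strict;
use warnings;

# Segments as in the $Mixed example: [text, attribute], attribute 0 = default.
my @segments = (
    ["This is an ",            0],
    ["all bold",               122],
    [", short and sometimes ", 0],
    ["italic",                 "Strange text attribute"],
    [" text.",                 0],
);

# Assumed, application-defined meaning of each attribute value.
my %tag_for = (
    122                      => 'b',
    "Strange text attribute" => 'i',
);

# Render the segments as HTML; attributes without an entry stay plain.
sub render_html {
    my @segs = @_;
    my $out  = '';
    for my $seg (@segs) {
        my ($txt, $attr) = @$seg;
        my $tag = $tag_for{$attr};
        $out .= defined $tag ? "<$tag>$txt</$tag>" : $txt;
    }
    return $out;
}

my $html = render_html(@segments);
print "$html\n";
```

           The attribute values themselves carry no meaning; only the
           lookup table in the calling code decides that 122 is bold.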

       replace
           $n = $Ct -> replace ($pattern, $replace, egimosx)

           Replaces one or all occurrences matching $pattern
           with $replace.  Returns the number of replacements, or
           false if the pattern is not found.  Implemented mainly
           via Perl's substitution operator:

              s/$pattern/$replace/egimosx

           $replace here can be a string, a Context or a code
           reference. In the latter case this routine will be
           called at each match, passing the matched string as
           parameter. The matched text will then be replaced with
           the return value of the routine.

           $n = $Ct -> replace ([@pattern], [@replace], egimosx)

           You can call replace with list references holding
           corresponding sets of patterns and replacements.
           Patterns and replacements can be strings or Contexts;
           replacements can additionally be code references. The
           patterns are joined into a single pattern using the
           regex alternation operator |.

           Examples:

              (1) $Ct -> replace ("krims", "kram", "g")

           Option g means that all occurrences of the string
           "krims", not just the first, are replaced by the string
           "kram".  "kram" receives the attributes of "krims"
           (see method "substr"). If you want more control over
           the attributes of "kram", you can pass the replacement
           as a Context.

              (2) $Ct -> replace ("krims", $Ct, "g")

           Replaces all occurrences of the string "krims" with the
           Context $Ct. This is useful if you want the replacement
           to carry special attributes.

              (3) $Ct -> replace (" asta tu ", " AStA TU ", "ig")

           Option i means that case is ignored. So example (3)
           would replace " asta tu ", " ASTA TU ", " Asta Tu " ...
           with " AStA TU ".  (AStA stands for Allgemeiner
           Studierendenausschuss, the name for student governments
           in Germany, which are quite cool.)

              (4) $Ct -> replace ("\02", \&footnote, "g")

           This would call a function "footnote". The function
           will be called with three parameters:

              &function($match, $Ct, $pos)

              1. The matched string (here "\02")
              2. The Context        (here $Ct)
              3. The match position
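
           The calling convention for a code reference can be
           imitated on an ordinary string with Perl's s///ge. This is
           a sketch under assumptions, not the module's code:
           replace_with_callback is a hypothetical stand-in, and the
           whole original string takes the place of the Context.

```perl
use strict;
use warnings;

# Hypothetical stand-in for replace() with a code reference.
# The callback receives the matched text, the original string and the
# offset of the match within that original string.
sub replace_with_callback {
    my ($text, $pattern, $callback) = @_;
    my $original = $text;    # the callback sees the unmodified text
    my $count = ($text =~ s/($pattern)/$callback->($1, $original, $-[0])/ge);
    return ($text, $count || 0);
}

my $note = 0;
my ($result, $count) = replace_with_callback(
    "See here\02 and there\02.",
    "\02",
    sub {
        my ($match, $context, $pos) = @_;
        $note++;
        return sprintf "[note %d, offset %d]", $note, $pos;
    },
);
print "$result ($count replacements)\n";
```

           Each "\02" marker is replaced by the callback's return
           value, and the count of replacements is returned alongside
           the new string.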

              (5) $Ct -> replace ("krims", sub {allow (@_, "kram")}, "ig")

           This notation would call a function "allow" for each
           match, much like (4). In addition, the string "kram"
           would be passed as an extra parameter.

              (6) $Ct -> replace (["a", "o"], ["o", "a"], "g")

           Substitutes a's with o's and o's with a's.
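
           Joining the patterns with | is what makes such a swap work
           in a single pass: each match is replaced once and never
           rescanned. A plain-string sketch of the idea follows;
           replace_many is a hypothetical helper, and treating the
           patterns as literal strings via quotemeta is a
           simplification assumed for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: apply several pattern/replacement pairs in one pass.
sub replace_many {
    my ($text, $patterns, $replacements) = @_;

    # Map each literal pattern to its replacement.
    my %replacement_for;
    @replacement_for{@$patterns} = @$replacements;

    # Glue the patterns together with the alternation operator |.
    my $alternation = join '|', map { quotemeta($_) } @$patterns;

    my $count = ($text =~ s/($alternation)/$replacement_for{$1}/g);
    return ($text, $count || 0);
}

my ($swapped, $count) = replace_many("banana boat", ["a", "o"], ["o", "a"]);
print "$swapped ($count replacements)\n";
```

           Two separate substitutions would not do here: replacing a
           with o first and then o with a would turn every vowel into
           an a.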

    [...]

       substr

           $Ct2 = $Ct1 -> substr ($o1, $l1)

           Returns a partial Context of Ct1 as new Context Ct2.
           Ct2 will be copied from Ct1 starting at position o1
           and with the length l1.

           $Ct  = $Ct  -> substr ($o1, $l1, $str [,$o2, $l2])

           If a string is given as argument, the partial Context
           starting at offset o1 with length l1 is replaced by the
           string. The string receives the attributes of the
           partial Context. For example, if the text to be
           replaced were "<0>di<1>n<2>g<0>s", after the
           replacement it might look like "<0>bu<1>m<2>s".

           $Ct1 = $Ct1 -> substr ($o1, $l1, $Ct2 [,$o2, $l2])

           The partial Context of Ct1 starting at offset o1 with
           length l1 is replaced by Context Ct2.

           If o<n> is undef, o<n> is set to 0.

           If l<n> is undef, l<n> is set so that the span runs to
           the end of Ct<n>.
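
           The first form, extracting a partial Context, can be
           pictured with text plus parallel attrib and offset lists.
           The sketch below is an assumption about the general
           technique, not the module's implementation;
           attributed_substr and its return layout are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: cut the span ($start, $len) out of an attributed
# string given as text plus parallel attrib and offset lists.
sub attributed_substr {
    my ($text, $attrib, $offset, $start, $len) = @_;
    $start = 0 unless defined $start;                    # undef offset: 0
    $len   = length($text) - $start unless defined $len; # undef length: to end
    my $end = $start + $len;

    my (@new_attrib, @new_offset);
    for my $i (0 .. $#$offset) {
        my $seg_start = $offset->[$i];
        my $seg_end   = $i < $#$offset ? $offset->[$i + 1] : length($text);
        next if $seg_end <= $start or $seg_start >= $end;   # no overlap

        # Clip the segment to the span and rebase it to offset 0.
        my $clipped = $seg_start < $start ? $start : $seg_start;
        push @new_attrib, $attrib->[$i];
        push @new_offset, $clipped - $start;
    }
    return (substr($text, $start, $len), \@new_attrib, \@new_offset);
}

my ($sub_text, $sub_attrib, $sub_offset) = attributed_substr(
    "The word bold is always bold", [0, 1, 0, 1], [0, 9, 13, 24], 4, 9);
print "$sub_text: attrib (@$sub_attrib) at offset (@$sub_offset)\n";
```

           The extracted piece keeps only the attributes that overlap
           the requested span, with their offsets counted from the
           start of the new string.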

    [...]

=== schnapp 

-- 
// Le degre zero de l'ecriture? Zero probleme!