
pdr Reference

At the moment the following types of data sources are available:
  1. command line, e-mail (POP3 and IMAP) and text file: these three data sources work with expressions
  2. CSV file and XML file: these data sources work with specific data formats in files

Input per command line

The simplest (and least comfortable) way to get data into the system is the pdr command line, that is, the invocation of pdr itself. Nothing needs to be configured for this.

pdr has the command line option -e (--expression), which specifies an expression. This option can be used multiple times. Moreover, all characters after pdr that are neither a command line option nor the argument of one are combined into one big expression and processed at once.
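The rule above can be modelled in a few lines. This is only a sketch of the described behaviour, not pdr's own argument parser: the function name collect_expressions is made up for illustration, and joining the remaining arguments with single spaces is an assumption.

```python
import argparse

def collect_expressions(argv):
    """Return one expression per -e option plus one built from the rest."""
    parser = argparse.ArgumentParser(prog="pdr-sketch")
    # -e / --expression may be given several times
    parser.add_argument("-e", "--expression", action="append", default=[])
    # everything that is not an option or an option argument
    parser.add_argument("rest", nargs="*")
    args = parser.parse_args(argv)

    expressions = list(args.expression)
    if args.rest:
        # assumption: the remaining arguments are joined with spaces
        expressions.append(" ".join(args.rest))
    return expressions
```

Called with the arguments 5.3 and 8i and no option, the sketch yields the single expression "5.3 8i", the same data line that is used in the e-mail example.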

If an expression on the command line doesn't have a timestamp, the current date and time are used.

If processing fails because an expression is incorrect, pdr prints a message. The expression is not transferred into the rejections.

Input per mail (POP3 and IMAP)

For e-mail mailboxes we assume that the data (mails) have arrived in the mailbox and are not processed by any other application. These mails must have the following properties:
  1. a unique subject
  2. a usable timestamp (normally the SMTP server adds one during sending)
  3. plain, continuous ASCII text format (no HTML, RTF ...)
  4. a text body consisting entirely of expressions
If an e-mail data source is configured, the mail server is queried at the next invocation. pdr checks whether there are mails on the server, checks their subjects and processes matching e-mails one by one, line by line; each line is an expression. If a line has its own timestamp, that one takes priority. Otherwise the timestamp of the e-mail applies implicitly. This is very handy because in the usual single-line e-mails you will normally never have to enter a timestamp manually.

Here's a complete e-mail source:

From: superhero <Mymail@gmx.net>
To: MyMail@gmx.net
Subject: Q
Date: Thu, 04 Feb 2010 17:56:11 +0100
Message-ID: <87pr4ley8k.fsf@castor.ch>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

5.3 8i

Normally most of the header values are taken from defaults. Date and Message-ID are added by the server; MIME-Version and Content-Type come from the e-mail client application. The only parts that really have to be typed in are the subject (that's why it should be short, here the single letter Q) and the content of the message, the data line.
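How such a mail is turned into expressions can be sketched with Python's standard email module. This is an illustration under assumptions, not pdr's implementation: the function name expressions_from_mail is hypothetical, and the sketch does not detect a timestamp inside a line (which would take priority), it always attaches the timestamp of the mail.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# the complete e-mail source shown above
RAW_MAIL = """\
From: superhero <Mymail@gmx.net>
To: MyMail@gmx.net
Subject: Q
Date: Thu, 04 Feb 2010 17:56:11 +0100
Message-ID: <87pr4ley8k.fsf@castor.ch>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii

5.3 8i
"""

def expressions_from_mail(raw, expected_subject):
    """Return (timestamp, expression) pairs for a matching mail."""
    msg = message_from_string(raw)
    if msg["Subject"] != expected_subject:
        return []  # mails with another subject are not touched
    # the Date header is the implicit timestamp of every line
    mail_time = parsedate_to_datetime(msg["Date"])
    lines = [line.strip() for line in msg.get_payload().splitlines()]
    # assumption: empty lines carry no expression and are skipped
    return [(mail_time, line) for line in lines if line]
```

For the mail above and the subject Q, the sketch returns one pair: the timestamp taken from the Date header and the expression "5.3 8i".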

On POP3 servers, processed e-mails are deleted from the server regardless of whether processing succeeded, so they are never processed a second time. This deletion can be suppressed by configuration. On IMAP servers the user can configure whether the mails should be deleted or marked as read. In the latter case the mails remain on the server and can still be archived.

If processing fails because an expression is incorrect, pdr transfers the affected expressions into the rejections and prints a message.

Input per text file

If we use a text file for data input, every line counts as an expression. This method is practical if you collect data during a period without any opportunity to transmit them online, so you have to collect them in a file manually, expression by expression.

Lines starting with # are not processed.
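The line handling described so far can be sketched as follows. The function name expressions_from_text is hypothetical, and skipping empty lines is an assumption that the text above does not state.

```python
def expressions_from_text(text):
    """Return the expressions of a text file, one per line."""
    expressions = []
    for line in text.splitlines():
        if line.startswith("#"):
            continue  # lines starting with # are not processed
        if not line.strip():
            continue  # assumption: empty lines are skipped
        expressions.append(line.strip())
    return expressions
```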

If processing fails because an expression is incorrect, pdr prints a message. The expression is not transferred into the rejections.

Text files that are configured as a data source are deleted after they have been processed successfully, so they are not processed a second time. This deletion can be suppressed in the configuration.

Input per CSV file

The abbreviation CSV means "comma separated values". Instead of the comma, pdr also accepts the semicolon and the tab character as separators between the values.

There are two different ways to tell pdr which comma separated value should go into which collection:
  1. a control line in the CSV file preceding the data lines
  2. a control line in the configuration file, valid for the entire CSV file
In the first case a pdr CSV file would have the following structure:

control line
data line1
[...]
data lineN

control line
data line1
[...]
data lineN

[...]

This use of control lines is unusual but gives us the desired flexibility and openness. Normally you can insert them easily by hand or with a program like sed. In the second case the CSV file contains only data lines, as expected.

A control line has the following structure:

[# pdr] datetime [separator collection]+

Example:

# pdr datetime, *, n, l; h; q»p, #            (» means a tabulator)

This is a control line for data lines with a timestamp and seven values for the collections *, n, l, h, q, p and #.

Each control line in a CSV file is recognized by its prefix # pdr; a control line in a configuration file doesn't need this prefix. The following keyword datetime marks the position of the timestamp on the data lines. It doesn't have to be at the beginning, but every line must have one - there are no data values without a timestamp. The example shows that several different separators can occur on one data line. Data lines according to this control line would look like this:

2008-10-11 12:31:38, 5.2, 7, 8; 42.3; 12»96, first measuring
2008-10-12 12:48:08, 6.1,  , 8; 53.1; 16»93,
2008-10-13 12:43:57, 5.8, 7, 7; 34.2; 15»94, third measuring

The second line has no values for the collections n and #. For missing values simply no inserts are made.

If you have CSV files containing more values than you want to import into collections, you can declare omissions in the control line:

# pdr datetime, a, b, , , , c, d, e

Here we read a timestamp and two collections, then skip three values on each data line, and then read three more values.
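The interplay of control line, missing values and omissions can be sketched like this. The names parse_control_line and parse_data_line are hypothetical, the timestamp format is inferred from the example data lines, and splitting on any of the three separators regardless of position is a simplifying assumption; pdr itself may be stricter.

```python
import re
from datetime import datetime

SEPARATORS = re.compile(r"[,;\t]")  # comma, semicolon or tab
PREFIX = "# pdr"                    # marks a control line inside a CSV file

def parse_control_line(line):
    """Return the field list, e.g. ['datetime', '*', 'n', ...]."""
    if line.startswith(PREFIX):
        line = line[len(PREFIX):]
    # an empty field name stands for an omitted value
    return [field.strip() for field in SEPARATORS.split(line)]

def parse_data_line(fields, line):
    """Return (timestamp, collection, value) triples for one data line."""
    values = [value.strip() for value in SEPARATORS.split(line)]
    pairs = list(zip(fields, values))
    stamp = next((v for f, v in pairs if f == "datetime"), None)
    if not stamp:
        raise ValueError("every data line needs a timestamp")
    when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    # skip the timestamp itself, omitted fields and missing values
    return [(when, f, v) for f, v in pairs if f and f != "datetime" and v]

fields = parse_control_line("# pdr datetime, *, n, l; h; q\tp, #")
inserts = parse_data_line(fields, "2008-10-12 12:48:08, 6.1,  , 8; 53.1; 16\t93,")
```

For the second example data line the sketch produces five inserts, for the collections *, l, h, q and p; the empty values for n and # produce none.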

Lines starting with # are not processed.

During the processing of a CSV file the whole file is handled in a single transaction. If processing fails, for instance because a data value on a line doesn't match the type of the declared collection, the whole file is dismissed. The data are not transferred into the rejections.
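The all-or-nothing behaviour can be illustrated with an in-memory SQLite database. This is only a sketch of the idea: the table numeric_values, its columns and the function import_file are invented for the example and say nothing about how pdr actually stores its collections.

```python
import sqlite3

def import_file(conn, rows):
    """Insert all (timestamp, value) rows or none of them."""
    try:
        with conn:  # commits on success, rolls back on an exception
            for stamp, value in rows:
                conn.execute(
                    "INSERT INTO numeric_values (t, v) VALUES (?, ?)",
                    # float() raises ValueError if the value is not numeric
                    (stamp, float(value)),
                )
        return True
    except ValueError:
        return False  # the whole file is dismissed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE numeric_values (t TEXT, v REAL)")
```

If the second of two rows carries a non-numeric value, the first row is rolled back as well and the table stays empty.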

CSV files that are configured as a data source are deleted after they have been processed successfully, so they are not processed a second time. This deletion can be suppressed in the configuration.

Input per XML file

pdr can read XML files for data input. These files are well-formed, human-readable and editable, and are ideal for data exchange between different software systems. pdr defines its own, intentionally very simple format, but the responsible part of the program is designed to be extended for further XML formats.

The pdr XML format

The pdr XML format is completely documented in the file pdr.xsd:

<?xml version="1.0" encoding="iso-8859-1" ?>

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" >

  <xsd:annotation>
    <xsd:documentation xml:lang="en">
     pdr XML input file definition (C) T.M. Bremgarten 2010-01-31
    </xsd:documentation>
  </xsd:annotation>

  <xsd:element name="pdr">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="collection" type="collection" minOccurs="0" maxOccurs="unbounded" />
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

  <xsd:complexType name="collection">
    <xsd:sequence>
      <xsd:element name="item" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
           <xsd:attribute name="datetime" type="xsd:string" />
           <xsd:attribute name="value" type="xsd:string" />
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
    <xsd:attribute name="name" type="xsd:string" use="required" />
    <xsd:attribute name="type" type="collection_type" use="required" />
    <xsd:attribute name="purpose" type="xsd:string" />
  </xsd:complexType>

</xsd:schema>

This definition allows files that look like this:

<?xml version="1.0" encoding="ISO-8859-1"?>
<pdr>
    <collection name="#" type="text">
        <item datetime="2001-07-09 18:27:11" value="first measuring"/>
        <item datetime="2001-07-10 07:52:01" value="second measuring"/>
        <item datetime="2001-07-10 10:07:00" value="third measuring"/>
        [...]
    </collection>
    <collection name="*" type="numeric">
        <item datetime="2001-07-12 13:57:01" value="9.3"/>
        <item datetime="2001-07-12 14:46:45" value="5.6"/>
        <item datetime="2001-07-12 18:25:36" value="5.7"/>
        [...]
    </collection>
    <collection name="l" type="numeric">
        <item datetime="2001-07-03 21:41:58" value="7"/>
        <item datetime="2001-07-04 21:48:43" value="8"/>
        <item datetime="2001-07-05 21:50:49" value="7"/>
        [...]
    </collection>
</pdr>

This format is self-explanatory. The data of the collections are specified directly and are easy to read.
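Because the format is so simple, other software can read or produce it with a standard XML library. The following sketch reads a shortened file of this kind with Python's xml.etree.ElementTree; the function name read_pdr_xml is hypothetical, and converting the values of numeric collections to float is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# shortened version of a pdr XML input file
SAMPLE = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<pdr>
    <collection name="#" type="text">
        <item datetime="2001-07-09 18:27:11" value="first measuring"/>
    </collection>
    <collection name="*" type="numeric">
        <item datetime="2001-07-12 13:57:01" value="9.3"/>
        <item datetime="2001-07-12 14:46:45" value="5.6"/>
    </collection>
</pdr>
"""

def read_pdr_xml(data):
    """Return {collection name: [(datetime string, value), ...]}."""
    root = ET.fromstring(data)
    result = {}
    for collection in root.findall("collection"):
        items = []
        for item in collection.findall("item"):
            value = item.get("value")
            if collection.get("type") == "numeric":
                value = float(value)  # assumption: numeric means float
            items.append((item.get("datetime"), value))
        result[collection.get("name")] = items
    return result
```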

During the processing of an XML file the whole file is handled in a single transaction. If processing fails, for instance because a data value doesn't match the type of a collection, the whole file is dismissed. The data are not transferred into the rejections.

XML files that are configured as a data source are deleted after they have been processed successfully, so they are not processed a second time. This deletion can be suppressed in the configuration.

(more XML formats)

...

