Getarc Manual

Contents:

  • 0.0: Introduction
  • 1.0: Invocation
  • 2.0: getarc.rc
  • 3.0: getarc.cfg
  • 4.0: Scripts

0.0: Introduction

Getarc is a monolithic perl program for gathering data from multiple instruments and combining them into a single archive with a common time base. The original data must be stored in a format consistent with the Format Specification for Data Exchange, created by Steven Gaines and Stephen Hipskind of NASA Ames Research Center. The output is generated as a MATLAB .mat file.

Getarc was designed with the following goals in mind:

The first goal is key, because one of the times when getarc is most useful is during a mission, when no one has time to futz with keeping an archive up-to-date. Getarc can check for updates autonomously several times a day, download exactly those files which have changed, update the archive, and mail a notification to a mailing list identifying which files have been updated.

The second goal is related to the first, since without it, any such benign change would require significant intervention. The fact is, these sorts of changes do occur on a regular basis, but the file format contains sufficient documentation to allow a well-designed utility to compensate for them.

1.0: Invocation

getarc [date ...]

The date arguments are optional. The default is to process all dates present in the FTP or localhost archive. Date arguments may include wildcards.

getarc reads the files getarc.rc and getarc.cfg in the current directory to determine how to proceed.

2.0: getarc.rc

Server
The Server statement specifies where the archived exchange files for the individual instruments are to be found. host is the hostname of an FTP server, or 'localhost' if the original archives are stored on the local filesystem. dir indicates which directory on the server or localhost contains the archives. TZ is the timezone string indicating how times are reported when an FTP 'dir' or a local 'ls -l' is performed.
Project
The Project statement defines the username and password to be used when accessing FTP hosts. (I used 'Project' rather than 'user' because the NASA archives tend to be stored by project rather than usernames.)
FileExt Ext
Ext is the file extension for the source archives.
Subject Text
Text is used on the subject line of e-mail announcing results of a getarc run.
Notify list
List of recipients for e-mail notification of updates to the archive.
VNotify list
List of recipients for e-mail notification of any errors during a run.
MatURL URL
URL where archive summary information generated by getarcsum can be found. This URL will be included in e-mail to the Notify list and should be specified in conjunction with the Summarize statement.
MatDir directory
The specified directory is used for storing the final Matlab archives. This is useful for keeping the main directory clean and/or placing the final archives within the domain of a web server.
LogDir directory
The specified directory is used for storing persistent log information including the log files and the main .sps files. This is useful for keeping the main directory clean and/or offloading some volume to another disk. The contents of this directory can be regenerated from the source files, so this could be located on a volume that is not backed up.
Summarize dir
Indicates that getarcsum should be run with the specified argument.
AcceptIf expr
Defines a perl expression to be evaluated in order to decide whether or not to generate an archive for a specific date. An example is: AcceptIf $Files{"O3$Date"}
Delta value
Value is used as the delta when using SNAFU to merge data from the source files into the final archive. Bins in the final archive are centered on the reported time and include the average of points +/- delta seconds from that time. Hence this is roughly equivalent to specifying a bin size of 2*delta.
Translate script
Specifies a script to be evaluated to translate a source file into the standard exchange file format. This is necessary, for example, when processing data from missions predating the format specification. The specified script is responsible for defining a subroutine called 'Translate', which will be invoked with a single argument, $Filename, the name of the file to be translated. This function should then read from $getarc::srcdir$Filename and write the translated result to $getarc::cache$Filename.
Compress
If specified, indicates that source files, spreadsheets and log files should be compressed after use.
Verbose level
Verbosity levels are bitmapped:
Debug level
Debug levels are also bitmapped:
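
Returning to the Translate statement described above, the following is a minimal sketch of what such a script might contain. It assumes that the srcdir and cache variables are directory prefixes that already end in a path separator (the path notation in the Translate entry suggests this, but it is an assumption), and translate_line is a hypothetical placeholder for the real conversion logic, which depends entirely on the legacy format being translated.

```perl
use strict;
use warnings;
no warnings 'once';    # the getarc:: variables are set by getarc, not here

# Hypothetical placeholder for the real conversion. Here each line is
# copied through with trailing whitespace removed; a real script would
# rewrite the legacy records into exchange-file form.
sub translate_line {
    my ($line) = @_;
    $line =~ s/\s+$//;
    return "$line\n";
}

# Translate is invoked with the name of one source file.
# Assumption: $getarc::srcdir and $getarc::cache end in a path separator.
sub Translate {
    my ($Filename) = @_;
    my $src = "$getarc::srcdir$Filename";
    my $dst = "$getarc::cache$Filename";
    open( my $in,  '<', $src ) or die "Cannot read $src: $!\n";
    open( my $out, '>', $dst ) or die "Cannot write $dst: $!\n";
    while ( my $line = <$in> ) {
        print {$out} translate_line($line);
    }
    close($in);
    close($out) or die "Error closing $dst: $!\n";
    return 1;
}

1;    # end with a true value in case the file is loaded with require
```

Keeping the per-line conversion in its own subroutine makes it possible to test the translation logic without touching the filesystem.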

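The binning behaviour described under Delta above can be illustrated with a short sketch. This is not SNAFU's implementation; bin_average is a hypothetical helper, the sample data are made up, and treating the bin edges as inclusive is an assumption, since this manual does not say how points exactly delta seconds away are handled.

```perl
use strict;
use warnings;

# Average all samples whose time lies within +/- $delta seconds of
# $center. Samples are [time, value] pairs. Returns undef for an empty
# bin. Inclusive bin edges are an assumption.
sub bin_average {
    my ( $center, $delta, @samples ) = @_;
    my ( $sum, $n ) = ( 0, 0 );
    for my $s (@samples) {
        my ( $t, $v ) = @$s;
        next if abs( $t - $center ) > $delta;
        $sum += $v;
        $n++;
    }
    return $n ? $sum / $n : undef;
}

# Hypothetical data: times in seconds, arbitrary values.
my @samples = ( [ 98, 1.0 ], [ 100, 2.0 ], [ 102, 3.0 ], [ 110, 9.0 ] );
my $avg = bin_average( 100, 5, @samples );
print "bin at t=100, delta=5: $avg\n";
```

With a delta of 5, the bin centered on t=100 covers times 95 through 105, a width of 2*delta, so the first three samples are averaged and the sample at t=110 is left out.
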
3.0: getarc.cfg

getarc.cfg is the file in which the individual instruments and variables are defined. It consists of one or more instrument definitions. Each instrument definition consists of a header line followed by zero or more data definition or SCRIPT lines. Comment lines beginning with '#' can appear at any point in the file.

An instrument header line begins with two non-space characters at the left margin optionally followed by a space and a descriptive title for the instrument. e.g.:

    TW Harvard Total Water

A data definition line begins with one or more spaces, followed by a variable name, a colon (':'), and a pattern that should match the variable's description in the source file. The pattern is matched using perl regular expressions, so special characters such as parentheses must be escaped with a backslash, e.g.:

      TW: Total Water \(ppmv\)

If the experimenter is inconsistent about capitalization, you can use character classes to match all the likely alternatives:

      TW: [Tt]otal [Ww]ater \(ppmv\)

[Note that these data definition lines all start with spaces!]

There are many more features of perl regular expressions. Ask your local perl guru for more information.
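
To check a pattern before putting it in getarc.cfg, it can help to try it against a few sample descriptions in a standalone perl script. The sketch below does this for the two patterns shown above. The sample descriptions are hypothetical, and the match is unanchored here; whether getarc anchors the pattern or applies other modifiers is not stated in this manual.

```perl
use strict;
use warnings;

# Patterns exactly as they would appear after the colon in getarc.cfg.
my %pattern = (
    strict   => 'Total Water \(ppmv\)',
    tolerant => '[Tt]otal [Ww]ater \(ppmv\)',
);

# Hypothetical variable descriptions as they might appear in source files.
my @descriptions =
    ( 'Total Water (ppmv)', 'total water (ppmv)', 'Total Water ppmv' );

# Returns 1 if the description matches the pattern, 0 otherwise.
sub matches {
    my ( $pat, $text ) = @_;
    return $text =~ /$pat/ ? 1 : 0;
}

for my $desc (@descriptions) {
    for my $name ( sort keys %pattern ) {
        printf "%-8s %-22s %s\n", $name, "'$desc'",
            matches( $pattern{$name}, $desc ) ? 'match' : 'no match';
    }
}
```

The strict pattern matches only the first description, the tolerant pattern matches the first two, and neither matches the third, because the escaped parentheses must be present literally.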

4.0: Scripts

A script line is listed along with the data definitions for an instrument. Like a data definition, it begins with spaces, then the keyword 'SCRIPT'. The remainder of the line is treated as perl code that should be evaluated whenever this instrument's source files need to be processed.

      SCRIPT { require "PT.pl"; Process_PT( $Date, $InstDef{$Inst}, \%ArcCol, $Delta, $Instday, $Filename ); }

Scripts are given total responsibility for the processing of a given instrument's source file. That is to say, when you specify a script, getarc will no longer attempt to locate your variables in the source file or merge data from the source file into the archive. This is because some scripts may need to make changes before merging while others may need to make changes after.

Scripts can make use of several handy getarc variables and subroutines, but not all of the getarc variables a script might want access to are global. You must make up for that by making sure that your SCRIPT line passes all the variables you need.

Writing getarc scripts requires considerable knowledge of the getarc internals, none of which are well documented. Your best bet is to ask your local getarc guru how best to begin, and to refer to examples of scripts currently in use.
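
As a starting point only, the file named on a SCRIPT line might be laid out like the skeleton below. It assumes nothing beyond the argument list shown in the SCRIPT example above. The contents of each argument are not documented here, so the body merely checks its inputs and reports what it was asked to do; the file name PT.pl and the return value are illustrative.

```perl
use strict;
use warnings;

# Skeleton for a hypothetical PT.pl. The parameter order mirrors the
# SCRIPT example; what each value contains is an open question to take
# up with existing scripts or a getarc guru.
sub Process_PT {
    my ( $Date, $InstDef, $ArcCol, $Delta, $Instday, $Filename ) = @_;

    die "Process_PT: expected a hash reference for ArcCol\n"
        unless ref($ArcCol) eq 'HASH';
    die "Process_PT: no source filename given\n"
        unless defined $Filename && length $Filename;

    # A real script does all of the work from here on: locating its
    # variables in the source file, adjusting the data, and merging
    # into the archive, since getarc does none of this once a SCRIPT
    # line is present.
    return sprintf 'would process %s for date %s with delta %s',
        $Filename, $Date, $Delta;
}

1;    # required so that 'require "PT.pl"' succeeds
```

Validating the arguments up front makes a mismatch between the SCRIPT line and the subroutine show up as a clear error rather than as bad data in the archive.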

Some of the getarc variables and functions which will be useful within a script:

$getarc::cache$Filename
Source exchange file (possibly after Translation)
$getarc::LogDir$Date.sps
Output spreadsheet
MatchCols()
Matches variable definitions from getarc.cfg with the headers in the source file.
LogMsg()
Writes status to the verbose log. Calls should be sensitive to $getarc::verbose.
conexec()
Invokes a specified command on a console. This is required for SNAFU and SNAFU-like utilities that must run on a console.
Merge()
Handy routine to invoke SNAFU merge.
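
Since verbosity levels are bitmapped, being "sensitive to $getarc::verbose" presumably means testing the relevant bit before logging. The sketch below shows that pattern in isolation. The bit values V_FILES and V_MERGE are hypothetical, and log_msg is a stand-in for LogMsg(), whose real signature is not documented here.

```perl
use strict;
use warnings;

# Hypothetical bit assignments; the real values are defined by getarc.
use constant {
    V_FILES => 1,    # bit 0
    V_MERGE => 2,    # bit 1
};

# Stand-in for getarc's LogMsg(): collects messages in an array.
my @log;
sub log_msg { push @log, join( '', @_ ); return }

# Emit a message only when the matching bit is set in the verbosity mask.
sub log_if {
    my ( $verbose, $bit, @msg ) = @_;
    log_msg(@msg) if $verbose & $bit;
    return;
}

my $verbose = V_FILES;    # inside a script this would be $getarc::verbose
log_if( $verbose, V_FILES, "fetched file\n" );    # bit set: logged
log_if( $verbose, V_MERGE, "merged data\n" );     # bit clear: suppressed
print @log;
```

Levels combine with bitwise OR, so a mask of V_FILES | V_MERGE would let both messages through.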

Written by Norton T. Allen



last updated: Mon Apr 14 12:37:29 2003 webmaster@huarp.harvard.edu
Copyright 2003 by the President and Fellows of Harvard College