ARP Data Acquisition User's Guide

0: Welcome and Introduction
1: Basic Setup
2: Running an Instrument
- 2.1: Shutting Down
  - 2.1.1: Quit
  - 2.1.2: Exit: Just the GSE
- 2.2: IOMODE
3: Data Analysis
- 3.1: reduce
- 3.2: saverun
- 3.3: extract
- 3.4: Realtime Analysis
4: Software Modifications
- 4.1: Source Directory
- 4.2: make & make distribution
5: Software Components
- 5.1: Experiment.config
- 5.2: tm.dac
- 5.3: *.pcm
- 5.4: interact
- 5.5: runfile.xxxx
- 5.6: runfile.dflt
- 5.7: doit
6: System Maintenance
- 6.1: osupdate
- 6.2: fixdisk
- 6.3: settime
- 6.4: flttime
7: Flight Operations
- 7.1: Know Your System
- 7.2: Question Updates
- 7.3: Test Using Flight Configuration
- 7.4: Test New Algorithms
8: Troubleshooting

0: Welcome and Introduction

Welcome to the Atmospheric Research Project (ARP) Data Acquisition Systems. This is a large body of software designed to provide powerful, flexible, extensible and efficient data acquisition and control to a wide range of scientific instruments.

This guide is intended to outline what you need to know in order to operate your instrument. It assumes some knowledge of computer systems in general and QNX in particular. If you feel you are missing some of this background, you are invited to review the "QNX System Architecture" and "QNX User's Guide" which provide a fairly comprehensive overview. Your resident experts will also be happy to answer your questions to help you get up to speed.

For starters, you will need a QNX account in order to log in. If you don't already have one, ask your system administrators for one.

1: Basic Setup

Your instrument is operated via one or more computers which are interconnected by a computer network. If you are using more than one computer, only one is directly connected to the instrument and will be referred to as the "flight computer". (It is called this even if your instrument has never or will never fly. Maybe that's wishful thinking, maybe just stupidity, but that's what we call it!) Any other computers are used to display data and/or control the instrument and are called the "Ground Support Equipment" or GSE.

If your instrument actually does fly, the flight computer will need to be able to operate when separated from the network, so it will have its own hard disk (or other mass storage device) with its own copy of the instrument software. The GSE will also need its own hard disk and copy of the software, since it must operate independently during pre-flight operations. In some cases, a lab-based flight computer can be configured to run without its own hard disk.

In a two-computer two-disk configuration, the instrument should have a "home directory" on the GSE and another home directory on the flight computer. The home directory contains all the special software required to operate the instrument. As a rule, you will operate the instrument from the home directory on the GSE. If the home directory is /home/abc, you will type:

        cd /home/abc

2: Running an Instrument

The command to start up the instrument is doit. (On some instruments there is more than one "doit" command with slightly different names.) Entering this command will begin the process of starting up the instrument. It may take 20-30 seconds for the instrument to initialize, as there are many different programs which must be loaded. Once the initialization is complete, you should be presented with a data screen and a command line. (If not, advance to Troubleshooting.)

Your instrument may or may not begin taking data immediately. If not, you will have to issue the command "Telemetry Start". Once data acquisition has begun, you should be able to see the minor frame counter (MFCtr) counting and other channels updating.

Telemetry Start is one of a group of basic commands incorporated into every instrument. These include:

Telemetry Start: Begins data acquisition.
Telemetry Logging Suspend: If the lgr is running, this instructs it to stop writing data to disk files. If issued before Telemetry Start, no log files will be created.
Telemetry Logging Resume: If the lgr is running, this instructs it to resume writing data to disk files.
Quit: Terminates data acquisition and requests all instrument software processes to shut down.
Exit: Shuts down GSE operations without affecting data acquisition. This command may not be present on your instrument.
IOMODE n: Affects how the command line behaves. The details are described below under IOMODE.

2.1: Shutting Down

Now that you have all the balls in the air, you need to know how to bring them down gracefully. While turning off the power is usually quite effective at terminating operations, it is almost certain to result in corruption of your hard disk's file system unless you have completed a proper shutdown first. There are two ending commands which are quite different and deserve special mention.

2.1.1: Quit

Quit is the total shutdown command. It not only shuts down your command and data display, but it also shuts down all the data acquisition and control programs on the flight computer and any other display or algorithm programs running elsewhere on the network. This is the command which must be issued before the flight computer can be safely powered down.

2.1.2: Exit: Just the GSE

Exit, in contrast, makes no requests at all of the flight computer, but simply shuts down local GSE operations. It is possible to shut down your GSE in order to do some other operation, and then start it up again with doit without ever terminating the data acquisition on the flight computer.

If you have more than one GSE console running, or you have an algorithm running which issues a Quit command, the flight computer's data acquisition may shut down before you shut down your GSE. In this case, your GSE should also shut down gracefully with the system. If it doesn't, you may need to issue an Exit command. Issuing a Quit will also work, but you will get a nasty message because the flight command servers have already terminated.

2.2: IOMODE

The IOMODE command affects how the command line behaves. The default behaviour is to automatically complete a word whenever the input is sufficient to do so. Other options are to never complete words, or to complete words only when the user types a space character. The value of n is bit-mapped, so it is determined by adding up the values of the desired options. The available options are:

Backspace=1: Indicates that typing a backspace should back up to the last significant input character. An input character is significant if it narrows the list of possible commands.
Space=2: Indicates that typing a space or newline should be treated as a request to auto-advance.
Always=4: Words or letters are filled in whenever the following letters are unambiguous.
Word=8: Disables advances within words unless the rest of the current word is unambiguous (e.g. prevents advancing 'O' for On/Off)
Wordskip=16: Allows auto-advancement over entire words. If not specified, auto-advance will always stop at the beginning of a word.

The default value for IOMODE is 7 (Backspace, Space, Always).
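
Because n is simply the sum of the option values listed above, it can be worked out (or decoded) with a few lines of arithmetic. The following is a minimal sketch in Python, not part of the ARP software; the table and function names are made up for illustration, and only the option names and values come from the list above.

```python
# Option values as listed above. The dictionary and function names are
# illustrative only; they are not part of the ARP software.
IOMODE_OPTIONS = {
    "Backspace": 1,
    "Space": 2,
    "Always": 4,
    "Word": 8,
    "Wordskip": 16,
}


def iomode_value(*names):
    """Add up the values of the named options to get n for 'IOMODE n'."""
    value = 0
    for name in names:
        value |= IOMODE_OPTIONS[name]
    return value


def iomode_names(n):
    """List the options that are switched on in a given value of n."""
    return [name for name, bit in IOMODE_OPTIONS.items() if n & bit]


# The documented default: Backspace + Space + Always
print(iomode_value("Backspace", "Space", "Always"))  # 7
# Decoding a value back into option names
print(iomode_names(7))
# Another combination: Backspace + Space + Word
print(iomode_value("Backspace", "Space", "Word"))    # 11
```

For example, to get Backspace, Space and Word behaviour you would enter "IOMODE 11".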

3: Data Analysis

After your data acquisition is completed, post-run analysis begins. To begin, the log files are saved as a "run". This is handled directly or indirectly by the saverun utility, which creates an appropriately named directory and moves the log files into it. Next, the data may need to be copied off of the flight computer onto the GSE for analysis and/or archiving. Finally, the extract utility applies extraction programs to the raw data to render it in a form useful for subsequent analysis.

This entire process is automated at a higher level by the reduce utility. When properly configured, a simple "reduce" command will perform all these functions in one step.

3.1: reduce

Reduce is the one-stop-shopping solution to post-run data analysis. It is configurable via the Experiment.config file. It will invoke saverun if you haven't already, or you may specify an existing data directory to re-run extractions and analyses. For flight operations, reduce can also perform filesystem diagnostics prior to moving the data in order to better ensure the data integrity.

3.2: saverun

Saverun is invoked by reduce to create a run directory. You may find a need to run saverun independently, but then again, you may not. Saverun's operation is also configured in Experiment.config.

3.3: extract

Extract is used to replay the instrument's data stream. It takes as arguments the name of the data directory and the names of the programs which will process the data stream. These programs are commonly "extraction" programs which convert the data to useful units and save it in a useful form, such as a SNAFU spreadsheet. Like saverun, extract is run indirectly by reduce, but you may find it useful to run extract directly under certain circumstances.

3.4: Realtime Analysis

Not all data analysis needs to be done after your run is complete. It is possible to perform fairly complex calculations in real time. One of the key goals in the design of the TMC language was to make this sort of analysis possible with a minimum of effort.

In addition, I have developed a slightly higher-level processor known as CYCLE which is particularly suited to averaging data over command cycles. Cycle is capable of everything the SNAFU Solenoid Difference Calculation is and quite a bit more.

4: Software Modifications

Since this is a User's Guide, we won't go into great detail regarding programming. See the ARP Data Acquisition Experiment Developer's Guide for more information.

As a user, you will find most of the instrument programming has been taken care of, but you may find a need to modify an algorithm or a SOLDRV cycle definition or an extraction.

4.1: Source Directory

The first thing you need to know is that the source code for your instrument is not kept in the home directory, but rather in a dedicated source code directory. [Yes, you do see tmc and tma files in the home directory also, but these are copies of the real source, distributed here for archival reasons. (Alright: they are copied into the data directory by saverun so that next month you will be able to figure out what you were doing.)]

In any event, you'll need to know where the source code is kept. It might be in a subdirectory of the home directory named "src", or it might be in a subdirectory of that directory, or it might be located on a different computer.

4.2: make & make distribution

After you've made your changes, there are two more steps before they can be used. First, you must compile your changes, that is, translate them into a format the computer can understand. The single command that will compile all your source code is "make". If make encounters any errors along the way, you'll have to go back and fix the appropriate source code (see Troubleshooting Compilations).

Once you've compiled all your source code, you need to distribute the results to the home directories of the GSE and flight computers. The command for this is:

        make distribution TGTNODE=//n

where n is replaced with the node number of the computer to which you wish to distribute.
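
In a two-computer configuration you will normally distribute twice, once to the GSE and once to the flight computer. The following minimal Python sketch simply spells out the sequence of commands to type. The function name and the node numbers 1 and 2 are made up for illustration; substitute the node numbers of your own computers.

```python
def build_steps(nodes):
    """Return the commands to type, in order: compile once, then
    distribute to each node in turn."""
    steps = ["make"]
    for node in nodes:
        steps.append("make distribution TGTNODE=//%d" % node)
    return steps


# Hypothetical node numbers: 1 for the GSE, 2 for the flight computer.
for command in build_steps([1, 2]):
    print(command)
```

With the node numbers assumed above, this prints "make" followed by one "make distribution" line for each node.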

5: Software Components

The following provides a brief description of some of the files to be found in the home directory.

5.1: Experiment.config

This file defines several key parameters used by many of the programs which make up the data acquisition system. The most important parameters are "Experiment" and "HomeDir" which identify the instrument's name and home directory. All the parameters are defined in the Experiment.config Reference Manual.

5.2: tm.dac

This file is generated during the compilation of the instrument's collection program. It is a small binary file containing a crude specification of the generated telemetry frame. It doesn't contain any of the individual data definitions, just enough information to allow some generic utilities to process the files. The format of the file is specified in the header file /usr/local/include/dbr.h.

5.3: *.pcm

This file provides a much more verbose description of the generated telemetry frame, including the precise location of every datum therein. Of particular interest is the summary information at the top of the file which outlines the basic size and shape of the frame. The "bits/sec" number can be used to precisely calculate how much disk space the instrument will require for a given flight.
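
The disk space arithmetic is short enough to show. This is a minimal Python sketch, assuming the log files hold the telemetry frames and nothing else (any per-file overhead is not counted). The function name and the example figures of 9600 bits/sec and an 8-hour flight are made up for illustration; take the real bits/sec figure from the top of your own .pcm file.

```python
def log_bytes(bits_per_sec, flight_hours):
    """Estimate the bytes of telemetry logged over a flight.

    bits_per_sec comes from the summary at the top of the .pcm file.
    """
    seconds = flight_hours * 3600
    return bits_per_sec * seconds / 8  # 8 bits per byte


# Hypothetical example: 9600 bits/sec for an 8-hour flight.
needed = log_bytes(9600, 8)
print("%.1f million bytes" % (needed / 1e6))  # 34.6 million bytes
```

Compare the result with the free space on the flight computer's disk before the flight, allowing some margin.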

5.4: interact

interact and runfile.* are the names most often given to the shell scripts which start up the data acquisition system on the flight computer. interact specifically implies how the instrument should run when started interactively via a doit script.

5.5: runfile.xxxx

runfile.xxxx is an instrument start-up script that may be enabled or disabled based on the position of configuration switches on the instrument. Since flight systems must ordinarily start taking data on turn-on without operator intervention, this provides a means of disabling automatic start-up in the lab.

5.6: runfile.dflt

runfile.dflt is a start-up script which is run if no appropriate runfile.xxxx script is found. This file cannot be disabled by a switch setting (unless the switches select another existing file).
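
The fallback between runfile.xxxx and runfile.dflt can be pictured as a simple lookup. The Python sketch below is only an illustration of the rule described above, not the actual start-up code. It assumes that the switch positions are turned into a suffix that takes the place of "xxxx"; how that suffix is formed is not covered here, and the function name and the suffixes "0001" and "0010" are made up.

```python
def select_runfile(switch_suffix, files_in_home):
    """Pick the start-up script for a given switch setting.

    switch_suffix is the (assumed) text that replaces 'xxxx'.
    files_in_home is the set of file names in the home directory.
    """
    candidate = "runfile." + switch_suffix
    if candidate in files_in_home:
        return candidate           # the switches select an existing script
    if "runfile.dflt" in files_in_home:
        return "runfile.dflt"      # otherwise fall back to the default
    return None                    # no start-up script at all


# Hypothetical home directory contents
files = {"runfile.0001", "runfile.dflt", "interact"}
print(select_runfile("0001", files))  # runfile.0001
print(select_runfile("0010", files))  # runfile.dflt
```

This also shows why runfile.dflt can only be bypassed by switches that select another existing file: any setting with no matching script lands on the default.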

5.7: doit

doit is the generic name given to GSE start-up scripts. Instruments often have more than one of these scripts, allowing more than one view of the instrument(s). This file is compiled from a source file of type .doit. See the MkDoit2 Manual for details on the format of the source file.

Assuming the flight system is powered up and standing by, the data acquisition system can be started by entering the doit command. Alternately, the doit command can be issued before the instrument is powered on in order to redirect its default behaviour. In this case, the command "doit not" is useful to request that the flight system not start up.

If the flight system is up and running, and you need to shut it down but don't have a GSE screen up, you can send a low-level shutdown command via the command "doit stop". This is particularly useful if you've managed to get your GSE and flight system software out of synch so the flight command server rejects your keyboard client's "quit" command.

6: System Maintenance

Here are a number of utilities we've written over the years to allow users to perform some tasks usually reserved for super users.

6.1: osupdate

This utility allows a user to update the system software on their node. From time to time, we write new programs, fix bugs in old programs or add new features. We have found it inadvisable to distribute these updates without users' knowledge, since any changes may have unforeseen consequences which may arise at very inconvenient times. Instead, we allow the users to decide when they are ready to deal with changes, and install them using this command. Usually such updates are transparent to the user, but if they've done the update themselves and a problem does arise, they are more likely to associate the problem with the update, which will enhance our ability to resolve it.

Whenever an update is performed with osupdate, the results are mailed to the system administrator and the node administrators. This means that you and I both know that a change has occurred, whether you initiated it or I did.

6.2: fixdisk

fixdisk is a thin cover for the QNX chkfsys utility. Ordinarily you need to be root to run chkfsys. fixdisk will let you run it if you are a node administrator.

6.3: settime

Like fixdisk, settime is a thin cover for the QNX date and rtc utilities, allowing node administrators to update their node's system and hardware clocks. See "use settime" for more information.

6.4: flttime

flttime is related to settime, and is used to synchronize the clocks of a flight computer and a GSE computer by setting the flight computer's clock to the current time on the GSE computer. See "use flttime" for more information.

7: Flight Operations

What follows are a few guidelines for users during field deployments. This is partly my opinion and partly what I've learned from watching our best scientists in the field.

7.1: Know Your System

Perhaps the most important thing to understand as a scientist on a field mission is that you are ultimately responsible for the successful operation of your instrument. Yes, you have software support, but if the instrument fails, you are the one who has to explain it to the other experimenters. Your best hope is to cultivate a thorough understanding of your system. This will serve you well when you need to make configuration choices at 0400.

7.2: Question Updates

As a rule, change is a dangerous thing during field operations. If it ain't broke, don't fix it. Resolve stylistic issues before reaching the field, or during test flights.

Structure your code so routine algorithm adjustments are localized. Use revision control to document all changes. Use rcsdiff to double-check the changes you've made against the version from the previous flight.

If the change you wish to make is not routine, question carefully whether it is worth attempting. Ask your software support how big the change is, but reserve judgement for yourself.

Be particularly careful when attempting to identify system failure modes and compensate for them. Make certain you are not introducing failure modes rather than eliminating them. If possible, make sure the action will only take effect when you are otherwise guaranteed to get no useful data.

7.3: Test Using Flight Configuration

Be very careful to ascertain that the system will operate in a flight configuration. Starting via "doit" is not good enough, since flight mode usually uses a different start-up script. Disconnect from the network and power up the system. Does it start up alright? After the system is up and running, you can reconnect to see how it's doing.

7.4: Test New Algorithms

Make sure any changes you've made will work by exercising them in a flight configuration. If the changes are intended to compensate for a possible failure mode, be sure to simulate that failure mode to make sure the results are as desired.

8: Troubleshooting

Copyright 1999 by the President and Fellows of Harvard College