CDF Syntax

Files with a .cdf extension define how experimental data is extracted from the raw data files written by the lgr program into comma-separated value (CSV) text files, which are more suitable for data analysis. The edf2ext compiler converts these definitions into TMC code, which is in turn compiled to C++ and linked into an executable that performs the actual extraction.

A .cdf file defines extraction to one or more output files. Each output file must be identified by its name and its number of columns. An optional keyword, separate, can be added; it is explained under Special Features.

csv <name> <n_columns> [ separate ]

After the output file is identified, the columns to which data is to be extracted are defined, one per line:

<column> <mnemonic> [ <format> ] [ <conversion> ]

<column> is the column number, which must be greater than or equal to zero and less than the number of columns declared for the output file. Column zero always refers to time, but you can use a column zero definition to specify a different name or output format for the time variable.

<mnemonic> is the datum mnemonic which must correspond to a TMC definition (except in the case of column zero).

<format> is an optional C-style output format to determine how the data should be converted to text. If no format is specified, the extraction uses the same text conversion function used for realtime data display. When a format is specified, the value is first converted to a double using whatever TMC conversion was specified, and then the value is converted to text using printf-style formatting. Therefore the format must be appropriate for a variable of type double.
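The requirement that the format suit a double can be illustrated with a minimal sketch. The helper name format_cell is hypothetical and is not part of the generated extraction code; it only shows that the value always reaches the formatter as a double, so formats such as %.2lf or %.0lf are appropriate while an integer format such as %d would not be.

```c
#include <stdio.h>

/* Hypothetical helper (not from the actual extraction code): formats one
   already-converted value using the <format> text from a column definition.
   The value is always passed as a double, so fmt must expect a double. */
static int format_cell(char *buf, size_t size, const char *fmt, double value) {
    return snprintf(buf, size, fmt, value);
}
```

For example, format_cell(buf, sizeof buf, "%.2lf", 21.456) leaves the text 21.46 in buf.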

To avoid generating corrupt CSV files, all output text is passed through a syntax checker. If a text string is determined to be non-numeric, it is replaced with a string designated for non-numbers. The default non-number string is '' (empty string), which Matlab recognizes as a non-number on input. Other programs may look for a different string (e.g. NaN) or a specific large value (e.g. 99999). Either can be specified with the nan-text keyword.

Note that we often use TMC conversions that produce non-numeric output to enhance usability of the data displays. This is a case where specifying an output format is necessary, or all the values will be reported as non-numbers. During the extraction, a warning is issued the first time any column produces a non-numeric value, so reviewing the extract.log file is a good idea at least the first few times you run the extraction.
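The check and replacement described above can be sketched roughly as follows. This is an assumption about the general idea only: the real syntax checker's rules are not documented here, and the names is_numeric_text and checked_cell are hypothetical. The sketch accepts plain decimal numbers with an optional sign, fraction and exponent, and substitutes the non-number string for anything else, such as display text like On or Off.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical check: returns 1 if s looks like a plain decimal number of
   the form [spaces][sign]digits[.digits][e[sign]digits], otherwise 0. */
static int is_numeric_text(const char *s) {
    int digits = 0;
    while (*s == ' ') s++;                    /* allow width padding */
    if (*s == '+' || *s == '-') s++;
    while (isdigit((unsigned char)*s)) { s++; digits++; }
    if (*s == '.') {
        s++;
        while (isdigit((unsigned char)*s)) { s++; digits++; }
    }
    if (digits == 0) return 0;                /* no mantissa digits at all */
    if (*s == 'e' || *s == 'E') {
        int exp_digits = 0;
        s++;
        if (*s == '+' || *s == '-') s++;
        while (isdigit((unsigned char)*s)) { s++; exp_digits++; }
        if (exp_digits == 0) return 0;        /* exponent marker without digits */
    }
    return *s == '\0';                        /* nothing may follow the number */
}

/* Hypothetical replacement step: pass numeric text through unchanged and
   substitute nan_text (the nan-text string, '' by default) otherwise. */
static const char *checked_cell(const char *text, const char *nan_text) {
    return is_numeric_text(text) ? text : nan_text;
}
```

With nan_text set to NaN, checked_cell("21.46", "NaN") yields 21.46 and checked_cell("Off", "NaN") yields NaN.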

<conversion> is an optional parameter which allows you to override the conversion defined for the mnemonic's TMC type. This is useful, for example, if you would like to extract the raw, unconverted data or use a more sophisticated conversion than the one used for realtime display. The conversion value is treated as a function name, but the substitution is very crude. For example, to extract unconverted integer values, you could specify:

7 Air_T %.0lf 1.0*

Here the conversion 'function' is '1.0*', which would translate into something like:

printf("%.0lf", 1.0*(Air_T));
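The crude substitution can be pictured as plain text pasting. The sketch below is an illustration under that assumption, not the compiler's actual code, and emit_extraction is a hypothetical name. It places the conversion text directly in front of the parenthesized mnemonic, which is why both an operator fragment such as 1.0* and an ordinary function name produce valid C.

```c
#include <stdio.h>

/* Hypothetical generator (not the real compiler): pastes the <conversion>
   text directly in front of the parenthesized mnemonic. Nothing is parsed
   or validated, so any text that forms valid C in that position works. */
static int emit_extraction(char *out, size_t size, const char *format,
                           const char *conversion, const char *mnemonic) {
    return snprintf(out, size, "printf(\"%s\", %s(%s));",
                    format, conversion, mnemonic);
}
```

Called with the format %.0lf, the conversion 1.0* and the mnemonic Air_T, this reproduces the printf line shown above. A conversion given as a function name (say, a hypothetical calibrate) would yield calibrate(Air_T) in the same position.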

Special Features

nan-text is used to specify the string used to replace non-numeric text in the output file. The command is placed on a line by itself with the unquoted string following it. For example:

nan-text NaN

At any point in the file, the keyword init_only may be specified to suppress the generation of the code which places each data point into the output file. This is useful when you wish to take more direct control of how data is written to the file or when multiple numbered files are required from a single specification (e.g. one for each scan).

Another method for controlling what data goes into a file is the condition keyword. This may be placed on the line immediately following the csv line and includes a conditional statement which defines which values should be included. For example:

csv foo 5
condition if ( status == 7 )

csv foo 5
condition depending on ( SCANNING )

The optional separate keyword indicates that each column should be inserted when it is received, independently of the rates of the other data in the file. By default, data is inserted into the file at the rate of the slowest channel. When separate is in effect, slower-rate channels will have blank cells in the rows where only higher-rate data is inserted. This is rarely useful. More commonly, data is segregated by rate and then combined as appropriate during subsequent analysis.
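The blank cells produced under separate can be illustrated with a small sketch. The row writer below is hypothetical and makes no claim about how the generated extraction code is organized; it only shows what a row looks like when a slower-rate column has no new value.

```c
#include <stdio.h>

/* Hypothetical row writer: cells[i] holds the formatted text for column i,
   or NULL if that column received no new value for this row. A missing
   value becomes an empty cell between the commas. */
static void write_row(char *out, size_t size, const char *cells[], int n_columns) {
    size_t used = 0;
    out[0] = '\0';
    for (int i = 0; i < n_columns && used < size; i++) {
        int n = snprintf(out + used, size - used, "%s%s",
                         i > 0 ? "," : "", cells[i] ? cells[i] : "");
        if (n < 0) break;                 /* formatting error: stop early */
        used += (size_t)n;
    }
}
```

With three columns where only the time and a fast channel are present, the row reads 10.5,3.2, with an empty third cell. When the slow channel also reports, the row is complete, e.g. 11.0,3.3,7.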


last updated: Wed Jul 18 15:10 2012	webmaster@huarp.harvard.edu
Copyright 1995 by the President and Fellows of Harvard College