Files with a .cdf extension are used to define the extraction of experimental data from the raw data files written by the lgr program to a comma-separated value (CSV) text file format, which is more suitable for data analysis. The edf2ext compiler converts these definitions into TMC code, which is in turn compiled to C++ and linked into an executable which performs the actual extraction.
A .cdf file defines extraction to one or more output files. Each output file must be identified by name and the number of columns. An optional keyword, separate, can be added; it is explained at the end of this page.
csv <name> <n_columns> [ separate ]
After the spreadsheet is identified, the columns to which data is to be extracted are defined:
<column> <mnemonic> [ <format> ] [ <conversion> ]
<column> is the column number, which must be greater than or equal to zero and less than the number of columns in the spreadsheet. Column zero always refers to time, but you can use a column zero definition to specify a different name or output format for the time variable.
<mnemonic> is the datum mnemonic which must correspond to a TMC definition (except in the case of column zero).
<format> is an optional C-style output format to determine how the data should be converted to text. If no format is specified, the extraction uses the same text conversion function used for realtime data display. When a format is specified, the value is first converted to a double using whatever TMC conversion was specified, and then the value is converted to text using printf-style formatting. Therefore the format must be appropriate for a variable of type double.
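As a rough illustration of the formatting step, the C sketch below applies a user-supplied format to a value that has already been converted to double. It is not the code the compiler generates; the function name format_value is hypothetical and only shows why the format must suit a double (for example %.2lf rather than %d).

```c
#include <stdio.h>

/* Hypothetical helper: format an already-converted value as text.
 * 'format' must be a printf format that consumes exactly one double,
 * e.g. "%.2lf" or "%8.3e". An integer format such as "%d" would be
 * undefined behavior here, because the argument is always a double. */
int format_value(char *buf, size_t size, const char *format, double value) {
  return snprintf(buf, size, format, value);
}
```

Calling format_value with the format "%.2lf" and the value 23.456 leaves the text 23.46 in the buffer.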
In order to avoid generating corrupt csv files, all output text is passed through a syntax checker. If the text string is determined to be non-numeric, it is replaced with a string designated for non-numbers. The default non-number string is '' (empty string), which Matlab recognizes as a non-number on input. Other programs may look for a different string (e.g. NaN) or a specific large value (e.g. 99999). These can be specified with the nan-text keyword.
Note that we often use TMC conversions that produce non-numeric output to enhance usability of the data displays. This is a case where specifying an output format is necessary, or all the values will be reported as non-numbers. During the extraction, a warning is issued the first time any column produces a non-numeric value, so reviewing the extract.log file is a good idea at least the first few times you run the extraction.
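The check-and-replace step can be pictured with the short C sketch below. The rules the real syntax checker applies are not given in this manual, so the sketch assumes that "numeric" means an optional sign, digits with an optional decimal point, and an optional exponent. The names is_numeric_text and csv_cell are hypothetical.

```c
#include <ctype.h>

/* Hypothetical check: returns 1 if 'text' looks like a plain decimal
 * number, 0 otherwise. The accepted syntax is an assumption; the real
 * checker may be stricter or more lenient. */
int is_numeric_text(const char *text) {
  const char *p = text;
  int digits = 0;

  while (*p == ' ') ++p;                 /* allow padding from a field width */
  if (*p == '+' || *p == '-') ++p;       /* optional sign */
  while (isdigit((unsigned char)*p)) { ++p; ++digits; }
  if (*p == '.') {                       /* optional fraction */
    ++p;
    while (isdigit((unsigned char)*p)) { ++p; ++digits; }
  }
  if (digits == 0) return 0;             /* no mantissa digits at all */
  if (*p == 'e' || *p == 'E') {          /* optional exponent */
    int exp_digits = 0;
    ++p;
    if (*p == '+' || *p == '-') ++p;
    while (isdigit((unsigned char)*p)) { ++p; ++exp_digits; }
    if (exp_digits == 0) return 0;
  }
  return *p == '\0';                     /* reject trailing characters */
}

/* Hypothetical helper: choose what is written to the CSV cell.
 * 'nan_text' plays the role of the string set with the nan-text keyword. */
const char *csv_cell(const char *text, const char *nan_text) {
  return is_numeric_text(text) ? text : nan_text;
}
```

Under these assumptions a display string such as OPEN is replaced by the nan-text string, while 23.5 and -1.2e-3 pass through unchanged.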
<conversion> is an optional parameter which allows you to override the conversion defined for the mnemonic's TMC type. This is useful, for example, if you would like to extract the raw, unconverted data or use a more sophisticated conversion than the one used for realtime display. The conversion value is treated as a function name, but the substitution is very crude. For example, to extract unconverted integer values, you could specify:
7 Air_T %.0lf 1.0*
Here the conversion 'function' is '1.0*', which would translate into something like:
printf("%.0lf", 1.0*(Air_T));
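The substitution appears to amount to pasting the conversion text in front of the parenthesized mnemonic. The C sketch below mimics that behavior under this assumption; it is not the compiler's actual code, and the name format_extract_call is hypothetical.

```c
#include <stdio.h>

/* Hypothetical illustration of the textual substitution: build the output
 * statement for one column. 'conversion' is pasted verbatim in front of
 * "(mnemonic)", so a function name and a prefix such as "1.0*" are
 * handled the same way. Returns what snprintf returns. */
int format_extract_call(char *buf, size_t size, const char *format,
                        const char *conversion, const char *mnemonic) {
  return snprintf(buf, size, "printf(\"%s\", %s(%s));",
                  format, conversion, mnemonic);
}
```

With the format %.0lf, the conversion 1.0* and the mnemonic Air_T, the buffer holds the same statement as in the example above.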
nan-text is used to specify the string used to replace non-numeric text in the output file. The command is placed on a line by itself with the unquoted string following it. For example:
nan-text NaN
At any point in the file, the keyword init_only may be specified to suppress the generation of the code which places each data point into the output file. This is useful when you wish to take more direct control of how data is written to the file or when multiple numbered files are required from a single specification (e.g. one for each scan).
Another method for controlling what data goes into a file is the condition keyword. This may be placed on the line immediately following the csv line, and it takes a conditional statement which determines which values are included. For example:
csv foo 5
condition if ( status == 7 )
or
csv foo 5
condition depending on ( SCANNING )
The optional separate keyword indicates that each column should be inserted when it is received independent of the rates of the other data in the file. By default, data is inserted into the file at the rate of the slowest channel. When separate is in effect, slower-rate channels will have blank cells when the higher-rate data is inserted. This is rarely useful. More commonly, data is segregated by rate and then combined as appropriate during subsequent analysis.
last updated: Wed Jul 18 15:10 2012 | webmaster@huarp.harvard.edu |
Copyright 1995 by the President and Fellows of Harvard College |