ICOSfit Manual

Synopsis

    icosfit configfile

Introduction

ICOSfit is a non-linear least-squares fit algorithm originally developed for processing ICOS data from the Anderson Group's ICOS instrument.

Availability

The icosfit source code is maintained in a Subversion repository on Harvard's ABCD Forge server. A binary distribution is available for use under Cygwin, and a source tarball is available for other platforms.

Invocation

icosfit is invoked from the command line with the name of a configuration file. All options are specified within that configuration file.

Options

All options are specified within the configuration file.

Input/Output Options

The input/output options define where the input files are to be found and where the output files should be placed.

PTEFile = Path;
PTFile = Path; Deprecated
PandTFile = Path; Deprecated
These three options are alternative ways of specifying where the pressure and temperature data are found. 'PTFile' and 'PandTFile' refer to deprecated input formats. 'PTEFile' refers to a file that includes etalon fit data. The default is 'PTEFile = PTE.txt;', but you will almost always want to specify this parameter explicitly.
ICOSdir = Path;
Specifies the base directory under which the scan directory structure is stored. The default is 'Scans', but you will usually need to specify this explicitly.
Binary;
ASCII;
These define whether the ICOS input files were written in binary or ASCII format. The default is binary.
BaselineFile = Path [ + Input ];
File containing the baseline basis vectors. The format is defined by the writeskewbase Matlab routine.
OutputDir = Path;
Specifies the directory into which all the output files will be written. The default is ICOSout.
OutputFile = Path;
Specifies the filename for the main fit results file. The default is ICOSsum.dat. If the specified filename does not include an absolute path, it will be located in the OutputDir.
LogFile = Path;
Specifies the filename for the fit log file. The default is ICOSfit.log. As with OutputFile, the LogFile will be located in the OutputDir unless an absolute path is specified.
MFile = Path;
Specifies the name of the MATLAB .m file which summarizes some of the configuration parameters for use in analyzing the results. The default name is ICOSconfig.m, and it will also be located in the OutputDir unless an absolute path is specified.
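
The rule for locating output files can be illustrated with a short Python sketch. It is not part of icosfit: the helper name resolve_outputs and the dictionary representation of the configuration are hypothetical, and only the default names are taken from the descriptions above.

```python
import posixpath

# Default names quoted from the option descriptions above.
DEFAULTS = {
    "OutputDir": "ICOSout",
    "OutputFile": "ICOSsum.dat",
    "LogFile": "ICOSfit.log",
    "MFile": "ICOSconfig.m",
}


def resolve_outputs(config):
    """Return the effective path of each output file (hypothetical helper).

    `config` maps option names to the values given in a configuration file;
    options that are not present fall back to the documented defaults.
    """
    settings = {**DEFAULTS, **config}
    out_dir = settings["OutputDir"]
    resolved = {}
    for key in ("OutputFile", "LogFile", "MFile"):
        name = settings[key]
        # An absolute path is used as given; any other name goes under OutputDir.
        if posixpath.isabs(name):
            resolved[key] = name
        else:
            resolved[key] = posixpath.join(out_dir, name)
    return resolved
```

With no options set, every file lands in ICOSout. Setting LogFile to an absolute path moves only the log file.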

Line Definitions

Since the primary purpose of icosfit is to fit absorption lines, defining what lines we are trying to fit is an important part of the configuration. Each configuration file should have a line definition block containing zero or more line definitions:

Lines = {
 1 4  1332.015310  1.464e-23 0.0918   221.9461 0.78 -0.002000 344, Position=1518;
 6 1  1332.085300  5.760e-20 0.0641   104.7800 0.75 -0.003000 577, Position=1453;
 1 2  1332.191200  3.239e-25 0.0865   839.5495 0.70 -0.000260 344, Position=1353;
};

Each line definition consists of nine numbers followed by zero or more optional elements. The nine numbers are taken almost directly from the HITRAN database. Details of the application of the line parameters can be found in the Appendix to the 1998 JQSRT Paper. They are listed below with the column number from the HITRAN output listed in parentheses:

  1. HITRAN Molecule Number (col 1)
  2. HITRAN Isotopomer Number (col 1)
  3. nu: Wavenumber of the spectral line transition (col 2)
  4. S: Spectral line intensity at 296K. (col 3)
  5. Gair: Air-broadened halfwidth at half maximum [cm-1/atm] at Tref=296K and reference pressure Pref=1atm (col 5)
  6. E: The lower state energy of the transition [cm-1] (col 7)
  7. n: coefficient of temperature dependence of the air-broadened halfwidth. (col 8)
  8. delta: air-broadened pressure shift [cm-1/atm] at Tref=296K, Pref=1atm. (col 9)
  9. ierr: Accuracy indices for frequency, intensity and Gair. (col 14)

The optional elements are added after these nine numbers, each preceded by a comma. They are:

Position = SampleNumber;
Specifies the initial position of the line in sample space.
Threshold = value;
Specifies the line strength threshold value. If the fitted line strength falls below this threshold, the position and widths of the line will be made fixed parameters until the line strength rises above the threshold. By line strength here we mean the product of all the values that are multiplied by the Voigt line shape. The default threshold is 1e-4, which is fairly conservative. Under good conditions, the threshold can be lowered considerably below that. Lines which are located on top of other lines may need to have their thresholds raised considerably in order to keep their fits from wandering.
Fix Doppler;
Fix Lorentz;
Fix FinePosition;
Float Doppler;
Float Lorentz;
Float FinePosition;
These options specify whether the Doppler or Lorentz widths or the FinePosition for this line should be fixed or fit. The default is to float both widths and fix the FinePosition.

Except for Position, these options can also be specified globally and then modified for individual lines.
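
As an illustration of the line definition layout described above, the following Python sketch splits one definition into its nine numbers and its optional elements. It is not the parser used by icosfit: the function name parse_line_def and the field names in FIELDS are hypothetical labels chosen for this example.

```python
# Hypothetical field names for the nine HITRAN-derived numbers, in order.
FIELDS = ("molecule", "isotopomer", "nu", "S", "G_air", "E", "n", "delta", "ierr")
INTEGER_FIELDS = {"molecule", "isotopomer", "ierr"}


def parse_line_def(text):
    """Parse one line definition such as
    '1 4 1332.015310 1.464e-23 0.0918 221.9461 0.78 -0.002000 344, Position=1518;'
    """
    body = text.strip().rstrip(";")
    # The nine numbers come first; each optional element follows a comma.
    numbers, *options = [part.strip() for part in body.split(",")]
    values = numbers.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} numbers, found {len(values)}")
    line = {}
    for name, value in zip(FIELDS, values):
        line[name] = int(value) if name in INTEGER_FIELDS else float(value)
    parsed_options = {}
    for option in options:
        if "=" in option:
            # Position = SampleNumber or Threshold = value
            key, value = (item.strip() for item in option.split("=", 1))
            parsed_options[key] = int(value) if key == "Position" else float(value)
        else:
            # "Fix Doppler", "Float FinePosition", and so on
            action, parameter = option.split()
            parsed_options[parameter] = action
    line["options"] = parsed_options
    return line
```

Applied to the first line of the example block, this yields molecule 1, isotopomer 4 and a Position option of 1518.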

Fitting Options

The fitting options are loosely grouped into the following sub-categories:

Fitting Range

BackgroundRegion = IntegerPair;
Specifies the sample range over which the ICOS signal should be averaged to determine the zero offset. Defaults to [10,200].
SignalRegion = IntegerPair;
Specifies the limits of the sample range over which the fit will be performed. Defaults to [350,1750].
ScanNumRange = IntegerPair;
Specifies the range of scans to be fit. Defaults to all scans, but you will frequently need to specify a smaller range. The backwards compatible keyword 'CPCI14Range' can also be used.
QCLI_Wave = Integer;
Specifies the waveform number (defined implicitly in the qclicomp input file, explicitly in the qclicomp .cmd output file). All scans for a particular run of icosfit must be of the same waveform.
Restart at Integer;
A shortcut method for re-running a fit when the fit parameters have not changed and don't need to be reinitialized. Restart reads the existing output file and uses it to initialize the fit parameters. This is useful, for example, if the fit went bad at a particular scan, and you want to retry from a few scans earlier after raising the threshold for a particular line. The output file is truncated at the restart scan number and subsequent fits are appended to the file.
Restart at Integer PreserveOutput;
This is identical to the previous format, but the new fits do not overwrite the old fit values. This is useful, for example, when you simply want to increase the verbosity of a fit for some subset of scans.
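
The roles of BackgroundRegion and SignalRegion can be sketched in Python as follows. This is an illustration under stated assumptions, not the icosfit implementation: it assumes that sample numbers are 1-based, that both ends of each range are inclusive, and that the zero offset is subtracted from the signal before fitting. The function names are hypothetical.

```python
def zero_offset(signal, background_region=(10, 200)):
    """Average the signal over BackgroundRegion (1-based, assumed inclusive)."""
    first, last = background_region
    window = signal[first - 1:last]
    return sum(window) / len(window)


def fit_window(signal, background_region=(10, 200), signal_region=(350, 1750)):
    """Return the SignalRegion samples with the zero offset removed.

    Subtracting the offset here is an assumption made for illustration.
    """
    offset = zero_offset(signal, background_region)
    first, last = signal_region
    return [value - offset for value in signal[first - 1:last]]
```

With the default regions, a 2000-sample scan yields a fit window of 1401 samples.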

Hardware Configuration

MirrorLoss = FloatVal ppm;
Specifies the mirror loss determined by ringdown.
CavityLength = FloatVal cm;
Length of the reflecting portion of the cavity in cm. Specifying the units is optional, but 'cm' is the only unit supported. For Herriott cell configurations, this value is multiplied by N_Passes and added to CavityFixedLength to determine the full absorption path length.
CavityFixedLength = FloatVal;
(Added in V2.9) For Herriott cell configurations, the portion of the absorption path length outside the cell. This parameter is not used for ICOS configurations. Defaults to 0.
SampleRate = FloatVal Frequency;
If specified, SampleRate activates the skew fit algorithm. Defaults to 0 Hz.
SkewTolerance = FloatVal ppm;
SkewTolerance is the amount of error that is tolerable during the skew fit. (See the full skew fit document for more details.) Defaults to 10 ppm.
N_Passes = Integer;
Values greater than zero select the Herriott cell model instead of the ICOS model, in which case the values of MirrorLoss, SampleRate and SkewTolerance are ignored. Defaults to 0.
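
The Herriott cell path length described under CavityLength can be written out as a small Python sketch. The function name is hypothetical; the arithmetic follows the description above (CavityLength multiplied by N_Passes, plus CavityFixedLength).

```python
def herriott_path_length(cavity_length_cm, n_passes, cavity_fixed_length_cm=0.0):
    """Full absorption path length in cm for a Herriott cell configuration."""
    if n_passes <= 0:
        # N_Passes of zero selects the ICOS model, where this formula does not apply.
        raise ValueError("N_Passes must be greater than zero for a Herriott cell")
    return cavity_length_cm * n_passes + cavity_fixed_length_cm
```

For example, CavityLength = 50 cm, N_Passes = 20 and CavityFixedLength = 10 give a path length of 50 * 20 + 10 = 1010 cm.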

Tuning Rate

EtalonFSR = FloatVal cm-1;
Specifies the free spectral range of the diagnostic etalon. Defaults to 0.019805. Reported concentrations scale linearly with EtalonFSR, so errors in EtalonFSR will propagate. It should be possible to diagnose such errors if the reported positions of free lines show a systematic error which varies linearly with wavenumber. Such a diagnostic is likely to make sense only when skew is taken into account.
MinimumFringeSpacing = FloatVal;
Specifies the number of samples between diagnostic etalon fringes at the start of the SignalRegion. This is used to initialize the filter used to pick out the etalon peaks. Defaults to 12.
DSFRLimits = [ FloatVal , FloatVal ];
DSFR is the ratio of the number of samples between two fringes to the number of samples between the previous two fringes of the diagnostic etalon. DSFRLimits defines the range of acceptable values for this ratio, defaulting to [.95,1.21]. DSFR values near .5 or 2 are an indication that a spurious fringe was identified or a fringe was missed, and therefore that the determined tuning rate is seriously suspect and that the current scan should be discarded.
TuningRate = FloatVal cm-1/sample;
Specifying tuning rate as a constant disables the use of the diagnostic etalon. This is useful when testing fits of synthetic spectra or when working with a system that has a linear tuning rate and no diagnostic etalon.
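
The DSFR check described under DSFRLimits can be sketched in Python. This is only an illustration of the ratio as defined above: the function names are hypothetical, and the fringe positions are assumed to be sample numbers already picked out of the diagnostic etalon signal.

```python
def dsfr_values(fringe_positions):
    """Ratio of each fringe spacing to the previous fringe spacing."""
    spacings = [b - a for a, b in zip(fringe_positions, fringe_positions[1:])]
    return [current / previous for previous, current in zip(spacings, spacings[1:])]


def scan_is_acceptable(fringe_positions, limits=(0.95, 1.21)):
    """True if every DSFR value lies within DSFRLimits."""
    low, high = limits
    return all(low <= ratio <= high for ratio in dsfr_values(fringe_positions))
```

Fringes at samples 0, 12, 25 and 39 have spacings 12, 13 and 14, so both ratios are close to 1.08 and the scan passes. If the last fringe were instead found at sample 53, the final spacing of 28 would give a ratio above 2, suggesting a missed fringe.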

Fit Controls

TolerableDrift = FloatVal cm-1;
The amount of drift allowed to a line before increasing its threshold to fix its position. Defaults to .01 cm-1.
LineMargin = FloatVal cm-1;
LeftLineMargin = FloatVal cm-1; (v 2.12)
RightLineMargin = FloatVal cm-1; (v 2.12)
The extent of the sample region beyond LineMarginMultiplier half-widths that should be included in or excluded from the fit. Defaults to .05 cm-1.
LineMarginMultiplier = FloatVal;
LeftLineMarginMultiplier = FloatVal; (v 2.12)
RightLineMarginMultiplier = FloatVal; (v 2.12)
(Added in V2.9) Specifies the value by which the line width is multiplied to determine whether or not to include the line in the fit. If the scaled width is entirely within the signal region, then all the line parameters can float. If the scaled width is only partially within the signal region, the width and fine position parameters are fixed, but number density is still allowed to float. If the scaled width is entirely outside the signal region, the line is turned off completely and excluded from the fit.
If a line is included in the fit, then we try to extend the sample region to include the scaled width plus the LineMargin. If a line is excluded from the fit, then we try to reduce the sample region as necessary to exclude the scaled width plus the LineMargin. Since we cannot always do both, when there is a conflict, the mandate to include a region takes priority over the mandate to exclude one.
LineMarginMultiplier defaults to 8.
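
The three cases described for LineMarginMultiplier can be sketched in Python. This is a simplified illustration, not the icosfit logic: it assumes the scaled width is a symmetric interval around the line center, that the center, half-width and signal region are all expressed on the same axis, and it ignores the separate left and right multipliers. The function name and return labels are hypothetical.

```python
def classify_line(center, half_width, signal_region, multiplier=8.0):
    """Classify a line by how its scaled width overlaps the signal region.

    Returns "float" (all parameters may float), "fixed" (width and fine
    position fixed, number density floats) or "off" (excluded from the fit).
    """
    low, high = signal_region
    left = center - multiplier * half_width
    right = center + multiplier * half_width
    if left >= low and right <= high:
        return "float"   # scaled width entirely within the signal region
    if right < low or left > high:
        return "off"     # scaled width entirely outside the signal region
    return "fixed"       # scaled width only partially within the signal region
```

With a signal region of [350,1750] and a half-width of 10, a line centered at 1000 floats, a line centered at 380 is fixed, and a line centered at 100 is turned off.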
Sigma = FloatVal;
The uncertainty of the raw measurement. This is used in the calculation of chi-squared, but does not otherwise affect the fit.
FitFunction = Ident;
This option allows the selection of a few alternate fitting functions. Check with Norton to see what fitting functions are available. At the time of this writing, specifying SampleRate uses the best fit.
ConvergenceStep = FloatVal; (v 2.15)
The relative reduction in χ2 in a single step that indicates the function is converging. The default value is 1e-3, which was the value used in all versions before v 2.15.
ConvergenceCount = Integer; (v 2.15)
The fit will terminate when there are ConvergenceCount iterations where the relative reduction in χ2 is less than ConvergenceStep. The default value is 4, as it was in all versions prior to v 2.15. Numerical Recipes suggests that 1 or 2 might be reasonable values.
MaxIterations = Integer; (v 2.15)
Defaults to 500, which was the value used prior to v 2.15. The fit will terminate after MaxIterations, if not before, and continue to the next scan with a warning in the log. In a future version, the number of iterations executed will be reported in the standard output file.
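
The interaction of ConvergenceStep, ConvergenceCount and MaxIterations can be sketched in Python. This is a schematic of the termination rule only, not the icosfit fitting loop. The callable `step` is a hypothetical stand-in for one fit iteration that returns the new χ2, and the sketch assumes that small-step iterations are counted cumulatively. Whether icosfit requires them to be consecutive is not stated here.

```python
def run_fit(step, chi2_initial, convergence_step=1e-3,
            convergence_count=4, max_iterations=500):
    """Iterate until converged or until max_iterations is reached.

    Returns (final chi2, iterations executed, converged flag).
    """
    chi2 = chi2_initial
    small_steps = 0
    for iteration in range(1, max_iterations + 1):
        new_chi2 = step(chi2)
        # Relative reduction in chi-squared achieved by this iteration.
        reduction = (chi2 - new_chi2) / chi2 if chi2 > 0 else 0.0
        chi2 = new_chi2
        if reduction < convergence_step:
            small_steps += 1
            if small_steps >= convergence_count:
                return chi2, iteration, True
    # Not converged: the caller would log a warning and move to the next scan.
    return chi2, max_iterations, False
```

A step function that halves χ2 from 16 down to 1 and then stalls converges after 8 iterations with the default settings: four productive iterations followed by four with no reduction.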

Configuration File Syntax

The following grammar is specified in the syntax of the Extended Backus Naur Form or EBNF. This shows you what you can legally specify in a configuration file. Discussion of what these things actually do will follow.

ConfigFile : ConfigLine+ .
ConfigLine : ConfigCmd ';' .
ConfigCmd :
  Linedef /
  BackgroundRegion '=' IntegerPair /
  SignalRegion '=' IntegerPair /
  ScanNumRange '=' IntegerPair /
  Restart at Integer opt_preserve /
  FitFunction '=' Ident /
  MirrorLoss '=' FloatVal opt_ppm /
  EtalonFSR '=' FloatVal opt_cm /
  MinimumFringeSpacing '=' FloatVal /
  DSFRLimits '=' '[' FloatVal ',' FloatVal ']' /
  TolerableDrift '=' FloatVal opt_cm /
  LineMargin '=' FloatVal opt_cm /
  LeftLineMargin '=' FloatVal opt_cm /
  RightLineMargin '=' FloatVal opt_cm /
  LineMarginMultiplier '=' FloatVal /
  LeftLineMarginMultiplier '=' FloatVal /
  RightLineMarginMultiplier '=' FloatVal /
  CavityLength '=' FloatVal /
  Sigma '=' FloatVal /
  TuningRate '=' FloatVal opt_tunerate /
  QCLI_Wave '=' Integer /
  SampleRate '=' FloatVal Frequency /
  SkewTolerance '=' FloatVal opt_ppm /
  ConvergenceStep '=' FloatVal /
  ConvergenceCount '=' Integer /
  MaxIterations '=' Integer /
  Threshold /
  FixLineParam /
  Binary / ASCII /
  ICOSdir '=' Path /
  PTEFile '=' Path [ '+' 'nu_F0' ] [ '+' 'PowerParams' ] [ '+' 'MirrorLoss' ] /
  PandTFile '=' Path / Deprecated
  PTFile '=' Path / Deprecated
  BaselineFile '=' Path [ '+' 'Input' ] /
  OutputDir '=' Path /
  OutputFile '=' Path /
  LogFile '=' Path /
  MFile '=' Path /
  Verbosity '=' Integer .
Path : PathStr / Ident .
IntegerPair : '[' Integer ',' Integer ']' .
opt_ppm : / '%' / ppm .
opt_cm : / 'cm-1' .
opt_preserve : / PreserveOutput .
opt_tunerate : / 'cm-1' '/' sample .
Frequency : / 'Hz' / 'kHz' / 'MHz' .
Threshold : Threshold '=' FloatVal .
Position : Position '=' Integer .
Linedef : Lines '=' '{' Line* '}' .
Line : Integer Integer FloatVal FloatVal FloatVal FloatVal
  FloatVal FloatVal Integer LineOpts ';' .
LineOpts : ( ',' LineOpt )* .
LineOpt : Threshold / Position / FixLineParam .
FixLineParam : FixFloat LineParam .
FixFloat : Fix / Float .
LineParam : Lorentz / Doppler / FinePosition .
FloatVal : Float / Integer / '-' FloatVal .

Terminals

Float : A floating point constant as defined in C
Integer : A string of decimal digits
Ident : A string of letters, digits and underscores
PathStr : A filename path

File Formats

ICOS Files

ICOS files are located in the directory hierarchy specified under the ICOSdir statement. They are written into multiple directories according to the conventions of the multi-level file subroutines of nortlib. Binary and ASCII formats are supported, as specified by the Binary and ASCII statements.

The ASCII format consists of two columns, space-delimited. The first column is the ICOS data, and the second column is the diagnostic etalon data. The sample number is implicit, starting with 1.

The binary format is preferred, since it takes up less disk space and is now generated by default.

Format   Values    Name      Description
uint32   1         nsamples  The number of rows in the file.
uint32   1         ncolumns  The number of columns in the file.
float32  nsamples  Col1      The data for column 1
float32  nsamples  Col2      The data for column 2
...
float32  nsamples  Coln      The data for column ncolumns

At the moment, the number of columns is 2, with the first column being the ICOS data and the second column being the diagnostic etalon data. As with the ASCII format, the sample number is implicit, starting with 1. uint32 is an unsigned 32-bit integer in little-endian format. float32 is a 32-bit floating point value in little-endian IEEE format.
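
The binary layout above can be read with a few lines of Python. This is a minimal sketch based only on the table and the byte-order statement above; the function name read_icos_binary is hypothetical and it is not one of the tools distributed with icosfit.

```python
import struct


def read_icos_binary(stream):
    """Read an ICOS binary file from a binary stream.

    Returns a list of columns, each a list of nsamples floats. The data are
    stored column by column after the two little-endian uint32 header words.
    """
    header = stream.read(8)
    if len(header) != 8:
        raise ValueError("file too short for the ICOS header")
    nsamples, ncolumns = struct.unpack("<II", header)
    columns = []
    for _ in range(ncolumns):
        raw = stream.read(4 * nsamples)
        if len(raw) != 4 * nsamples:
            raise ValueError("file too short for the declared data size")
        # Each value is a little-endian IEEE float32.
        columns.append(list(struct.unpack(f"<{nsamples}f", raw)))
    return columns
```

For a scan file opened with open(path, "rb"), the first returned column is the ICOS data and the second is the diagnostic etalon data.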

SSP File Format

Newer versions of the SSP driver include the full SSP scan header in each file. This format can be automatically distinguished from the ICOS File Format, so no distinction is required in the configuration files.

Format   Values      Name       Description
uint32   0x00010006  Format     Identifies this format
uint8    1           NChannels  The number of columns in the file minus one.
uint8    1           NF         Sample Clock Divider
uint16   1           NSamples   The number of rows in the file.
uint16   1           NCoadd     The number of scans averaged.
uint16   1           NAvg       The number of samples averaged.
uint16   1           NSkL       The number of scans dropped before final coadd
uint16   1           NSkP       The number of scans dropped before initial coadd
uint32   1           SerialNum  The serial number of the final scan in coadd
uint16   1           T_HtSink   Heat Sink Temperature
uint16   1           T_FPGA     FPGA Temperature
uint32   1           SSPStatus  SSP Status Flags
float32  nsamples    Col1       The data for column 1
float32  nsamples    Col2       The data for column 2
...
float32  nsamples    Coln       The data for column ncolumns
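
The SSP scan header can be decoded with Python's struct module. This sketch rests on two assumptions that are not stated above: that the header fields are little-endian, as in the ICOS binary format, and that they are packed without padding in the order listed. The names SSP_HEADER, SSP_FIELDS and parse_ssp_header are hypothetical.

```python
import struct

SSP_FORMAT_WORD = 0x00010006

# Assumed little-endian and unpadded: uint32, 2 x uint8, 5 x uint16,
# uint32, 2 x uint16, uint32 (28 bytes in total).
SSP_HEADER = struct.Struct("<IBBHHHHHIHHI")
SSP_FIELDS = (
    "Format", "NChannels", "NF", "NSamples", "NCoadd", "NAvg",
    "NSkL", "NSkP", "SerialNum", "T_HtSink", "T_FPGA", "SSPStatus",
)


def parse_ssp_header(data):
    """Decode the scan header from the first bytes of an SSP file."""
    if len(data) < SSP_HEADER.size:
        raise ValueError("file too short for the SSP header")
    header = dict(zip(SSP_FIELDS, SSP_HEADER.unpack_from(data)))
    if header["Format"] != SSP_FORMAT_WORD:
        raise ValueError("not an SSP file: unexpected format word")
    return header
```

Checking the first 32-bit word against 0x00010006 is one plausible way to tell the two formats apart, since an ICOS binary file holds its sample count in that position.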

PTE File Format

PTEFile = Path [ '+' 'nu_F0' ] [ '+' 'PowerParams' ] [ '+' 'MirrorLoss' ] ;

The PTEFile statement specifies the file containing per-scan input parameters. The primary data is stored in the first 11 columns:

Column  Name     Description           Units
0       ScanNum  Scan File Index
1       CellP    Cell Pressure         Torr
2       Tavg     Cell Temperature      K
3       Etln[0]  Sample Number Offset  Samples
4       Etln[1]  Constant Coef.        fringes
5       Etln[2]  Linear Coef.          fringes/Ksamples
6       Etln[3]  Quadratic Coef.       fringes/Ksamples^2
7       Etln[4]  1st exp amp.          fringes
8       Etln[5]  1st exp tau           Ksamples
9       Etln[6]  2nd exp amp.          fringes
10      Etln[7]  2nd exp tau           Ksamples

[Introduced in V2.20]: The PTEFile may also contain additional parameters in additional columns. Three documented parameters are currently supported: nu_F0, PowerParams and MirrorLoss. These parameters must be listed in the PTEFile statement in the same order as their columns appear in the file; any ordering is supported as long as the two agree. Additional columns will be ignored.

[Introduced in V2.20]: '+ nu_F0' specifies one additional column which will contain a fixed value for the nu_F0 parameter. This is useful when fitting weak lines through periods of very low concentration where the line positions are changing gradually. At low concentrations, the nu_F0 parameter is usually fixed, which can allow the line position to drift away. After fitting, possibly in separate regions, it is possible to interpolate reasonable values for nu_F0, write them into the PTEFile, add the + nu_F0 option, and icosfit will use the specified values for nu_F0 without floating the parameter.

Column  Name  Description                Units
N+1     νF0   Wavenumber of fringe zero  cm-1

[Introduced in V2.20]: '+ PowerParams' specifies seven additional columns which will contain additional parameters from the etalon fit. These values are not used by icosfit, but it is convenient to have etln_fit write them into the PTEFile for other related analyses. Specifying etln_fit( ..., 'SAVEALL', 1) produces these additional columns. It also includes a column of zeros as a placeholder for + nu_F0, so + PowerParams should always be preceded by + nu_F0.

The columns introduced by + PowerParams are:

Column  Name      Description                                Units
N+1     Etln[8]   Etalon Power Cubic Coef.                   Power Ksamples^-3
N+2     Etln[9]   Etalon Power Quadratic Coef.               Power Ksamples^-2
N+3     Etln[10]  Etalon Power Linear Coef.                  Power Ksamples^-1
N+4     Etln[11]  Etalon Power Constant Coef.                Power
N+5     Etln[12]  Etalon Finesse
N+6     N_Passes  Number of passes required by fit
N+7     Quality   std(residual)/(max(etln)-min(etln))

[Introduced in V2.20]: '+ MirrorLoss' specifies one additional column containing the MirrorLoss value for each scan. This overrides the MirrorLoss parameter in the configuration file and can be used when the mirror loss can be derived from the end of each ICOS scan. The MirrorLoss column is unitless, so it should include the relevant scaling factor (e.g. ppm).

Column  Name  Description  Units
N+1     L     Mirror Loss  unitless
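
The column layout of a PTE file, including the optional extensions, can be illustrated with a Python sketch. It is not part of icosfit or etln_fit: the function name parse_pte_row is hypothetical, the columns are assumed to be whitespace-delimited, and the `extras` argument stands for the '+' options in the order they are given in the PTEFile statement.

```python
# The 11 primary columns, in file order.
BASE_COLUMNS = ["ScanNum", "CellP", "Tavg"] + [f"Etln[{i}]" for i in range(8)]

# Columns contributed by each optional '+' extension.
EXTRA_COLUMNS = {
    "nu_F0": ["nu_F0"],
    "PowerParams": [f"Etln[{i}]" for i in range(8, 13)] + ["N_Passes", "Quality"],
    "MirrorLoss": ["MirrorLoss"],
}


def parse_pte_row(line, extras=()):
    """Map one row of a PTE file to named values.

    `extras` lists the '+' options in statement order, for example
    ("nu_F0", "PowerParams", "MirrorLoss"). Trailing columns beyond those
    named are ignored.
    """
    names = list(BASE_COLUMNS)
    for extra in extras:
        names.extend(EXTRA_COLUMNS[extra])
    values = line.split()
    if len(values) < len(names):
        raise ValueError(f"expected at least {len(names)} columns, found {len(values)}")
    row = {name: float(value) for name, value in zip(names, values)}
    row["ScanNum"] = int(row["ScanNum"])
    return row
```

With all three extensions, a row is read as 11 + 1 + 7 + 1 = 20 named columns.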

PT File Format

PTFile = Path;

The PTFile keyword and file format are deprecated, and support will be removed in a future version.

The PTFile statement identifies the file which contains the pressure and temperature data for the scans. This data is in a text format, and is generated via a TMC extraction (PText) and an SNAFU text output script (totext.snf). The contents are:

Column  Name     Description                        Units
0       Time     Seconds since 00:00:00 1/1/70 UTC  sec
1       CellP    Cell Pressure                      Torr
2       Tavg     Cell Temperature                   K
3       ScanNum  Scan Number
4       Cal_F    Calibration Flow                   SCCM
5       InltF    Inlet Flow                         SCCM
6       RORIS    1=>Ringdown, 0=>ICOS
7       RateS    1=>50 Hz, 0=>10Hz

As of icosfit V1.9 and CR VERSION 1.3, RORIS has been replaced with QCLI_Wave, an integer value indicating which waveform is currently being sampled. This is backwards compatible, since the default value of zero is consistent with the acceptable value for RORIS.

PandT File Format

PandTFile = Path;

The PandTFile keyword and file format are deprecated, and support will be removed in a future version.

The PandTFile statement specifies the use of an earlier file format for pressure and temperature. The PT format is preferred.

Column  Name    Description                        Units
0       Time    Seconds since 00:00:00 1/1/70 UTC  sec
1       CellP   Cell Pressure                      Torr
2       Gas1T   Cell Temperature                   Celsius
3       Gas2T   Cell Temperature                   Celsius
4       Gas3T   Cell Temperature                   Celsius
5       Gas4T   Cell Temperature                   Celsius
6       CPCI14  Scan Number
7       CPCI16  Scan Number (non-existent)
8       Cal_F   Calibration Flow                   SCCM
9       InltF   Inlet Flow                         SCCM
10      RORIS   1=>Ringdown, 0=>ICOS
11      RateS   1=>50 Hz, 0=>10Hz

Baseline File Formats

BaselineFile = Path [ + Input ];

There are currently two baseline fitting functions, each providing considerable flexibility in how the baseline is approximated. Selection between the two functions is determined by the format of the specified BaselineFile. If the optional '+ Input' is included, the third channel in each input file is treated as a baseline input vector with a free scaling parameter.

func_base_svdx
This function is indicated by specifying a BaselineFile in standard ICOS binary format. Each column is taken as a baseline basis vector over the raw sample space. The derived baseline is a linear combination of these basis vectors. The generic files sbase.linear and sbase.cubic are useful starting points, providing linear and cubic approximations respectively.

As the name implies, you can perform singular value decomposition on baseline data in the absence of absorption to generate appropriate baseline vectors. The Matlab functions writebase and writeskewbase are used to output the vectors in the correct format.
func_base_ptbnu
This function is indicated by specifying a file output by the Matlab function writeetlnbase. The file format supports basis vectors that are a function of wavenumber rather than sample number, plus a polynomial that is a function of either wavenumber or sample number.

Output Files

ICOSfit produces several output files:

ICOSconfig.m

ICOSconfig.m is a human-readable Matlab script that defines a number of variables relevant to the fit. If any format change is made to ICOSsum.dat, a corresponding change should be made to ICOSconfig.m to guarantee that the support utilities will be able to correctly interpret data both before and after the change.

ICOSconfig.m is generated in the source file ICOSmain.c and it is processed by ICOSsetup.m and ICOS_setup.m.

ICOSsum.dat

ICOSsum.dat contains the results of the fit, with one line per scan. It is generated in the function fitdata::lwrite() in the source file fitfunc.c and it is processed by ICOSsetup.m and/or ICOS_setup.m. In the absence of more documentation, refer to those sources to see how the file is produced and processed.

Author

Norton Allen




last updated: Sun Feb 1 2015 10:15 EST webmaster@huarp.harvard.edu
© 2015 by the President and Fellows of Harvard College