
Microsat



Microsat is no longer supported or under development, and is provided here merely as a convenience.

We cannot answer any questions regarding its use, nor provide support of any kind.


Brief instructions for using microsat, the microsatellite distance program.

Instructions for obtaining microsat

Microsat source code and executables are available here.

Microsat data is available here.

The current version includes the following new features:

Some helpful researchers have volunteered executable versions of microsat, compiled for various hardware/OS combinations. [Disclaimer: this does not constitute a promise on their part or on mine to support these executables or to fix problems; they are provided at this site only to make distribution simpler and to make use of the program easier for those who have no way of compiling the program source]:
Researcher                   Affiliation                       Contact                  Version  Platform (Compiler)
Eric Minch                                                                              1.5d     DOS (djcpp)
John Taylor                  Simon Fraser University, Canada   jtaylora@sfu.ca          1.5b     Macintosh LC630
Doug Call and James Hallett  Washington State University, USA  Eric Minch               1.5b     DOS
Monique Fountain                                                                        1.4d     Macintosh Quadra 700
Alex Parker                  University of Maine, USA          APARKER@MAINE.maine.edu  1.4d     DOS (huge memory model)
Nicole Perna                 University of New Hampshire, USA  PERNA@colsa.unh.edu      1.4d     Macintosh Quadra 950
Christian Schloetterer       University of Vienna, Austria     schlotc@hp01.boku.ac.at  1.4d     Macintosh Powerbook 145
You can also get sample data and output files to test your version.
Note 1: The algorithm used for standardized Rst is the unbiased standardized estimate from Simon Goodman, whose RstCALC program is available from a website. Goodman's method will be included in the distance page once the paper describing its derivation has been accepted.

Instructions for running microsat

The microsat program is written in ANSI C and is self-contained: it requires only the standard library routines declared in stdio, stdlib, math, string, and ctype. Program options are selected through simple tty menus. The format of the input data is
<taxon> <locus> <repeatlength> <frequency>
(where repeatlength can be number of repeats or nucleotide length, and frequency is the number of occurrences, i.e., absolute frequency) or
<taxon> <locus> <repeatlength>
(with implied absolute frequency=1) for m loci and n taxa. A "taxon" can be either a population or an individual. The program ignores blank lines and comment lines (any that start with '%'). Example data and output files are available. The following options are currently supported:
  1. Calculations can take various aspects of the allele sizes into account:
  2. Input files can be checked for missing data (taxon-locus combinations with no data), anomalous frequencies (taxon-locus combinations with odd-numbered frequencies), incommensurable taxa (taxon pairs with no loci in common), or taxon-specific alleles (alleles occurring in only one taxon, or in only one taxon of a taxon pair).
  3. Outliers may be detected and eliminated by either of two methods; each has a default value for the multiplicative coefficient, which may be overridden.
  4. The distance measure to be calculated may be any one of the following:
  5. If distance measures are being calculated, they can be performed over multiple bootstraps. In this case, the distance matrix reported will be the average over all bootstraps, and it will be followed by a matrix of the standard errors of the distances.
  6. Distance measures can be calculated separately for each locus.
  7. Estimates of duration of linearity can be calculated for each locus, and averaged over all loci. If linearity is being estimated:
  8. Values for Fst (by variance and by heterozygosity methods), standardized Rst (see note 1), and average and total values for heterozygosity, variance, number of alleles, allele size range, maximum allele size, and entropy of allele size distribution can be calculated by locus or by taxon.
  9. A crosstabulation can be produced showing the number of data values found in each taxon-locus combination.
  10. A frequency distribution can be reported by locus or by taxon: the mean, median, mode, minimum, and maximum allele size, and the frequency of each allele.
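The input checks in option 2 can be pictured as simple scans over the taxon-by-locus crosstabulation from option 9. The following is only a minimal sketch, not microsat's actual code: the array sizes, the function names, and the reading of "odd-numbered frequencies" as an odd total count per taxon-locus cell are all assumptions made for illustration.

```c
#define N_TAXA 3   /* hypothetical table sizes for this sketch */
#define N_LOCI 2

/* counts[t][l] is assumed to hold the total absolute frequency
 * recorded for taxon t at locus l (the crosstabulation). */

/* Missing data: number of taxon-locus cells with no data at all. */
int count_missing(int counts[N_TAXA][N_LOCI])
{
    int t, l, n = 0;
    for (t = 0; t < N_TAXA; t++)
        for (l = 0; l < N_LOCI; l++)
            if (counts[t][l] == 0)
                n++;
    return n;
}

/* Anomalous frequencies: number of cells whose total count is odd. */
int count_odd(int counts[N_TAXA][N_LOCI])
{
    int t, l, n = 0;
    for (t = 0; t < N_TAXA; t++)
        for (l = 0; l < N_LOCI; l++)
            if (counts[t][l] % 2 != 0)
                n++;
    return n;
}

/* Incommensurable taxa: returns 1 if taxa a and b have data for at
 * least one locus in common, 0 if they share none. */
int have_common_locus(int counts[N_TAXA][N_LOCI], int a, int b)
{
    int l;
    for (l = 0; l < N_LOCI; l++)
        if (counts[a][l] > 0 && counts[b][l] > 0)
            return 1;
    return 0;
}
```

With a table such as { {4, 0}, {3, 2}, {0, 6} }, this sketch would report two missing cells, one odd cell, and one taxon pair (the first and third) with no loci in common.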


If you wish to test your compiled version of the program, the two sample data files test.dat1 and test.dat2 yield the results shown in test.out. The two sample data files contain exactly the same data; the first is in the first input format, i.e., with explicit absolute frequencies, while the second is in the second input format, i.e., with no explicit frequencies.
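To make the two input formats concrete, here is a minimal sketch of how a single data line might be read in C. It is not taken from the microsat source: the Record type, the parse_line name, the 31-character limit on names, whitespace-separated fields, and the tolerance for leading whitespace before '%' are all assumptions of the sketch.

```c
#include <stdio.h>
#include <ctype.h>

#define NAME_LEN 32   /* hypothetical limit on taxon and locus names */

/* One observation: an allele size seen `frequency` times
 * in a given taxon at a given locus. */
typedef struct {
    char taxon[NAME_LEN];
    char locus[NAME_LEN];
    int  repeatlength;
    int  frequency;
} Record;

/* Parse one input line.
 * Returns 1 if a record was read, 0 for a blank or comment line,
 * and -1 if the line matches neither input format. */
int parse_line(const char *line, Record *rec)
{
    const char *p = line;
    int nfields;

    /* Skip leading whitespace, then ignore blank and '%' lines. */
    while (isspace((unsigned char)*p))
        p++;
    if (*p == '\0' || *p == '%')
        return 0;

    nfields = sscanf(p, "%31s %31s %d %d",
                     rec->taxon, rec->locus,
                     &rec->repeatlength, &rec->frequency);
    if (nfields == 3) {
        /* Second format: no explicit frequency, so it is 1. */
        rec->frequency = 1;
        return 1;
    }
    return (nfields == 4) ? 1 : -1;
}
```

Called on each line returned by fgets, a routine like this would read "popA locus1 12 4" as one record with frequency 4, and "popA locus1 12" as one record with frequency 1, so a file in the second format needs one line per observed allele copy to carry the same data as a file in the first.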