CladoCode
Automated homogeneous subset analysis for cladistics
CladoCode is an online implementation of homogeneous subset coding,
as described by Simon (1983). The method provides a repeatable,
non-arbitrary way of coding metric data for phylogenetic analysis
(Rae, 1998).
The program tests for homogeneous variances between samples, runs the
appropriate multiple comparisons test, creates homogeneous subsets,
then assigns an alphanumeric code to each taxon for each variable. The
coded data matrix and the accompanying statistical results are
displayed on a webpage that can be saved to your computer. If
this is your first time using CladoCode, please read the file format
information below before proceeding.
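The last two steps of that pipeline (forming homogeneous subsets, then coding
taxa) can be sketched in Python as follows. This is a minimal illustration
under stated assumptions, not CladoCode's actual code: the function names are
hypothetical, the variance check is omitted, and the multiple comparisons test
is replaced by a caller-supplied predicate (here a toy fixed threshold on the
difference of means). The labelling rule, one code per distinct pattern of
subset membership, is also an assumption made for illustration.

```python
import string
from statistics import mean

# Code symbols; this sketch supports at most 36 distinct codes per variable.
SYMBOLS = string.digits + string.ascii_uppercase

def homogeneous_subsets(samples, differ):
    """Group taxa, ordered by mean, into maximal runs that do not differ.

    samples: dict mapping taxon name -> list of measurements for one variable.
    differ:  function(xs, ys) -> True if the two samples differ significantly.
             Assumed to be monotone along the ordering of means, so only the
             first taxon of a run is compared with each candidate.
    """
    order = sorted(samples, key=lambda taxon: mean(samples[taxon]))
    subsets = []
    last_end = -1
    for start in range(len(order)):
        end = start
        while (end + 1 < len(order)
               and not differ(samples[order[start]], samples[order[end + 1]])):
            end += 1
        if end > last_end:  # keep only runs not contained in an earlier run
            subsets.append(order[start:end + 1])
            last_end = end
    return order, subsets

def code_taxa(order, subsets):
    """Give each taxon a symbol determined by which subsets it belongs to."""
    patterns = {}
    codes = {}
    for taxon in order:
        pattern = tuple(taxon in subset for subset in subsets)
        if pattern not in patterns:
            patterns[pattern] = SYMBOLS[len(patterns)]
        codes[taxon] = patterns[pattern]
    return codes

def toy_differ(xs, ys, threshold=1.0):
    """Stand-in for a real multiple comparisons test (assumption)."""
    return abs(mean(xs) - mean(ys)) > threshold

# Hypothetical measurements for four taxa and one variable.
demo = {
    "T1": [1.0, 1.2],
    "T2": [1.5, 1.7],
    "T3": [2.4, 2.6],
    "T4": [5.0, 5.2],
}
order, subsets = homogeneous_subsets(demo, toy_differ)
print(subsets)
print(code_taxa(order, subsets))
```

With these made-up values, T2 falls into two overlapping subsets (one with T1,
one with T3) and therefore receives a code between those of its neighbours,
while T4 forms a subset of its own.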
File format
To use the utility, you must provide a plain-text data file with
space-separated or tab-separated columns. Most spreadsheet and database
programs provide this as an option under File > Save as....
An example file is linked here.
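If your program can only export comma-separated values, a short script can
produce the tab-separated form instead. The following is a minimal sketch
using only the Python standard library; the function name is hypothetical and
it is not part of the CladoCode package. It replaces whitespace inside a field
with underscores, since whitespace is treated as a column delimiter, and
rejects empty fields, since missing data are not allowed.

```python
import csv
import io

def csv_to_tab_separated(csv_text):
    """Convert CSV text to tab-separated text suitable for upload."""
    out_lines = []
    for line_number, record in enumerate(csv.reader(io.StringIO(csv_text)), start=1):
        if not record:
            continue  # skip completely empty lines
        # Collapse any whitespace inside a field into single underscores.
        cleaned = ["_".join(field.split()) for field in record]
        if any(field == "" for field in cleaned):
            raise ValueError(f"line {line_number}: empty field (missing data)")
        out_lines.append("\t".join(cleaned))
    # Terminate every line, including the last one.
    return "\n".join(out_lines) + "\n"

# Hypothetical two-line export from a spreadsheet.
print(csv_to_tab_separated('Taxa,femur length\n"Homo sapiens",45.1\n'))
```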
The input file format is crucial: the program returns a blank screen
if the file is not formatted correctly. After export, check that your text file
meets the following requirements before you attempt to upload it:
- The first column must contain the names of the taxa. Any block of whitespace
is treated as a column delimiter, so avoid spaces in the names: use underscores
or CamelCase instead. Also avoid qualifiers such as "Taxon_cf." or "Group_spp.";
these are not interpreted as references to an existing taxon, but as defining
a new taxon with that exact name.
- The first row must contain a descriptor for the first column (e.g., "Taxa",
"Species", or "Populations"), followed by descriptors for the traits (e.g.,
"Trait1", "femur_length", or "BMI"), each separated by any non-zero amount of
whitespace.
- All further rows represent individuals, with the taxon name and the data
points separated by whitespace. You can use multiple spaces or tabs to align
columns visually if you like.
- Every line must end with a carriage return.
- There must be no missing data. Extra whitespace does not indicate a missing
column, and any row without enough entries will result in a blank screen. If
data are missing for any individuals, you will need to split your data set
into multiple input files.
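Because a malformed file produces only a blank screen, it can help to check a
file locally before uploading it. The sketch below applies the rules listed
above using only the Python standard library. The function name and the sample
data are hypothetical, and the checks reflect this page's description of the
format rather than CladoCode's own parser. It also assumes that every data
point is numeric.

```python
def parse_cladocode_input(text):
    """Parse whitespace-delimited text into (header, rows).

    Raises ValueError if a row has the wrong number of entries or a
    measurement is not numeric.
    """
    lines = text.splitlines()
    if len(lines) < 2:
        raise ValueError("need a header row and at least one data row")
    header = lines[0].split()  # any block of whitespace is one delimiter
    if len(header) < 2:
        raise ValueError("header needs a taxon column and at least one trait")
    rows = []
    for line_number, line in enumerate(lines[1:], start=2):
        fields = line.split()
        if len(fields) != len(header):
            raise ValueError(
                f"line {line_number}: expected {len(header)} entries, "
                f"found {len(fields)}"
            )
        try:
            values = [float(field) for field in fields[1:]]
        except ValueError:
            raise ValueError(f"line {line_number}: non-numeric data point") from None
        rows.append((fields[0], values))
    return header, rows

# Hypothetical data: tabs and multiple spaces are both accepted as delimiters.
sample = (
    "Taxa\tfemur_length\tBMI\n"
    "Homo_sapiens\t45.1\t22.0\n"
    "Pan_troglodytes  30.2   19.5\n"
)
header, rows = parse_cladocode_input(sample)
print(header)
print(rows)
```

A taxon name containing a space, such as "Homo sapiens", is split into two
fields, so the row then has one entry too many and is rejected in the same way
as a row with missing data.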
It is assumed that the data are properly scaled, so as to reduce the influence
of size (Rae, 2002).
Code
The code for this utility is available as a Python
module, with scripts for command-line and web-based use:
download package as zip file.
Citation
If you use the utility, please cite the following:
References
- Rae, T., 1998. The logical basis for the use of continuous characters
in phylogenetic systematics. Cladistics 14(3), 221-228.
- Rae, T., 2002. Scaling, polymorphism, and cladistic analysis.
In: Forey, P., MacLeod, N. (Eds.), Morphology, Shape & Phylogenetics.
Taylor & Francis, London, pp. 45-52.
- Simon, C., 1983. A new coding procedure for morphometric data with an
example from periodical cicada wing veins. In: Felsenstein, J. (Ed.),
Numerical Taxonomy. Springer-Verlag, Berlin, pp. 378-382.
© Andy Buckley and Todd C. Rae, all rights reserved.