Automated homogeneous subset analysis for cladistics

CladoCode is an online implementation of homogeneous subset coding as described by Simon (1983). The method provides a repeatable, non-arbitrary way of coding metric data for phylogenetic analysis (Rae, 1998).

The program tests for homogeneity of variances among samples, runs the appropriate multiple comparisons test, creates homogeneous subsets, then assigns an alphanumeric code to each taxon for each variable. The coded data matrix and the accompanying statistical results are displayed on a webpage that can be saved to your local computer. If this is your first time using CladoCode, please read the file format information below before proceeding.
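The last two steps (building homogeneous subsets and turning them into codes) can be sketched in Python as follows. This is a minimal illustration and not CladoCode's own code. The function names are hypothetical, the taxon genus3 and its value are invented, and the multiple comparisons test is replaced by a fixed threshold on the difference between means purely to keep the example self-contained. The coding rule shown, in which each distinct pattern of subset membership receives its own code, is one reading of the method and is an assumption here.

```python
import string
from statistics import mean

# Hypothetical stand-in for the multiple comparisons test: two taxa count as
# "not significantly different" when their means differ by at most `threshold`.
def similar(a, b, threshold=8.0):
    return abs(a - b) <= threshold

def homogeneous_subsets(means, is_similar=similar):
    """Group taxa, ordered by mean, into maximal runs whose extremes are similar."""
    ordered = sorted(means, key=means.get)
    subsets = []
    for i, first in enumerate(ordered):
        j = i
        # Extend the run while the next taxon is still similar to the first one.
        while j + 1 < len(ordered) and is_similar(means[first], means[ordered[j + 1]]):
            j += 1
        run = ordered[i:j + 1]
        # Keep only maximal runs: skip a run already contained in the previous one.
        if not subsets or not set(run) <= set(subsets[-1]):
            subsets.append(run)
    return subsets

def code_taxa(means, subsets):
    """Give each distinct pattern of subset membership its own code (assumed rule)."""
    symbols = string.digits + string.ascii_uppercase  # supports up to 36 states
    patterns = []
    codes = {}
    for taxon in sorted(means, key=means.get):
        pattern = tuple(i for i, subset in enumerate(subsets) if taxon in subset)
        if pattern not in patterns:
            patterns.append(pattern)
        codes[taxon] = symbols[patterns.index(pattern)]
    return codes

# Illustrative measurements for a single trait (genus3 is invented).
samples = {"genus1": [50, 55], "genus2": [45], "genus3": [60]}
means = {taxon: mean(values) for taxon, values in samples.items()}
subsets = homogeneous_subsets(means)
codes = code_taxa(means, subsets)
print(subsets)
print(codes)
```

In this example genus1 falls into two overlapping subsets and therefore receives a code intermediate between those of genus2 and genus3. With a real significance test in place of the threshold, the subsets would depend on sample sizes and variances as well as on the means.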

File format

To use the utility, you must provide a tab-delimited plain text file. Most spreadsheet and database programs offer this format as an option under File > Save As...

The names of the taxa must be in the first column. Taxa with different names are treated as discrete, so avoid qualifiers such as cf. or Genus spp. The first row must contain the names of the traits to be coded.

Below is an example of the format:

Taxon     Trait 1   Trait 2   Trait 3
genus1    50        75        25
genus1    55        70        26
genus2    45        60        125
...etc
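As a rough illustration of how a file in this format can be read, the sketch below parses tab-delimited text with Python's standard csv module and groups the measurements by trait and taxon. The function name read_matrix and the nested-dictionary layout are assumptions made for this example; they are not taken from the CladoCode package.

```python
import csv
import io
from collections import defaultdict

def read_matrix(handle):
    """Read a tab-delimited matrix into {trait: {taxon: [values, ...]}}."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    traits = header[1:]  # first column holds taxon names, the rest are traits
    data = {trait: defaultdict(list) for trait in traits}
    for row in reader:
        if not row or not row[0].strip():
            continue  # skip blank lines
        taxon = row[0].strip()
        for trait, cell in zip(traits, row[1:]):
            if cell.strip():  # ignore empty cells
                data[trait][taxon].append(float(cell))
    return data

# The example rows from above, with columns separated by tab characters.
sample = (
    "Taxon\tTrait 1\tTrait 2\tTrait 3\n"
    "genus1\t50\t75\t25\n"
    "genus1\t55\t70\t26\n"
    "genus2\t45\t60\t125\n"
)
matrix = read_matrix(io.StringIO(sample))
print(matrix["Trait 1"]["genus1"])
```

Because rows are grouped by the exact text in the first column, a row labelled "genus1 cf." would end up as a separate taxon from "genus1", which is why consistent naming matters.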

It is assumed that the data are properly scaled, so as to reduce the influence of size (Rae, 2002).


The code for this utility is available as a Python module with scripts for command-line and web-based use: download package as zip file.


If you use the utility, please cite the following:

Rae, T., Buckley, A., CladoCode, 2009, arXiv:1411.4550 [stat.AP]


© Andy Buckley and Todd C. Rae, all rights reserved.