Conceived and designed the experiments: TW. Performed the experiments: KL. Analyzed the data: TW KL CRL. Contributed reagents/materials/analysis tools: KL. Wrote the paper: KL TW. Suggested statistical tests: CRL.
The authors have declared that no competing interests exist.
Statistical methods for phylogeny estimation, especially maximum likelihood (ML), offer high accuracy with excellent theoretical properties. However, RAxML, the current leading method for large-scale ML estimation, can require weeks or longer when used on datasets with thousands of molecular sequences. Faster methods for ML estimation, among them FastTree, have also been developed, but their performance relative to RAxML is not yet fully understood. In this study, we explore the performance of FastTree and RAxML with respect to ML score, running time, and topological accuracy on thousands of alignments (based on both simulated and biological nucleotide datasets) with up to 27,643 sequences. We find that when RAxML and FastTree are constrained to the same running time, FastTree produces topologically much more accurate trees in almost all cases. We also find that when RAxML is allowed to run to completion, it provides an advantage over FastTree in terms of the ML score, but does not produce substantially more accurate tree topologies. Interestingly, the relative accuracy of trees computed using FastTree and RAxML depends in part on alignment accuracy and dataset size, so that FastTree can be more accurate than RAxML on large datasets with relatively inaccurate alignments. Finally, the running times of RAxML and FastTree are dramatically different: when run to completion, RAxML can take several orders of magnitude longer than FastTree. Thus, our study shows that very large phylogenies can be estimated very quickly using FastTree, with little (and in some cases no) degradation in tree accuracy, as compared to RAxML.
Phylogeny estimation is an important part of much biological research. Methods (either Bayesian or maximum likelihood) based upon stochastic models of sequence evolution have many desirable statistical properties, but are also computationally the most challenging. Bayesian MCMC methods (e.g., MrBayes
Although continuous enhancements are being added to RAxML, its computational requirements can still be prohibitive for alignments with more than a few thousand sequences and sites (e.g., a RAxML analysis of several alignments of a 16S dataset with almost 28,000 sequences required approximately a month of CPU time
In this paper, we compare RAxML and FastTree on nucleotide datasets for which alignments must be estimated. We explore performance with respect to running time, ML score, and topological accuracy, using both biological and simulated datasets and estimating alignments using several different methods. Because running time is a crucial issue for large datasets, we include a variant of RAxML (which we call “RAxML-Limited”), in which we constrain RAxML's running time so that it is not substantially longer than FastTree's.
Our study shows that in many cases, phylogenetic analyses of very large nucleotide alignments can be performed using FastTree without a substantial difference in tree accuracy, and in a small fraction of the time needed by RAxML. Thus, FastTree represents an important advance in the state of the art for ML tree estimation on nucleotide sequence alignments.
We compared RAxML, RAxML-Limited, and FastTree on 1800 alignments of 1000 taxa each, previously studied in
Throughout these experiments we observed the following. First, for almost all model conditions and alignment methods, RAxML-Limited produces the worst ML scores and the least accurate tree topologies of the three methods, with differences that are generally statistically significant (
The 1000-taxon model conditions are arranged along each x-axis from left to right in order of increasing difficulty. Standard error bars are shown.
A comparison between RAxML and FastTree (
However, even on the harder model conditions, the differences were small, and not all of these conditions showed statistically significant differences. On the more accurate alignments (i.e., the true, SATé, and MAFFT alignments), all the model conditions showing statistically significant differences between RAxML and FastTree favored RAxML. The average improvement on these model conditions was 0.5% on the true alignment, 0.5% on the SATé alignment, and 1.1% on the MAFFT alignment. Thus, although there were statistically significant differences, their magnitudes were small.
On the less accurate alignments, we see some interesting differences. On the ClustalW alignments, nine of the ten harder model conditions showed statistically significant differences between RAxML and FastTree: two showed RAxML having an advantage over FastTree (but with the average improvement only 0.4%), and seven showed FastTree having an advantage over RAxML (average improvement 2.1%). On the Quicktree alignments, eight of the ten model conditions showed statistically significant differences, with three in favor of FastTree (average improvement 1.2%) and five in favor of RAxML (average improvement 1.1%). Finally, on the PartTree alignments, eight of the ten harder model conditions showed statistically significant improvements, all in favor of FastTree (average improvement 0.6%).
Thus, with respect to tree topology accuracy, the relative performance of RAxML and FastTree depended upon both the model parameters and alignment accuracy, with RAxML tending to have an advantage on highly accurate alignments (easy model conditions, or very good alignments on harder model conditions) and FastTree tending to have an advantage otherwise. Furthermore, although many of the differences were statistically significant (see
A comparison of running times shows dramatic differences among these three methods (
Runtimes of ML methods on other alignments are similar to runtimes on the ClustalW alignment (data not shown). The 1000-taxon model conditions are arranged along each x-axis from left to right in order of increasing difficulty. Standard error bars are shown.
We studied performance on ten ribosomal RNA datasets with 117 to 27,643 sequences from CRW (the Comparative Ribosomal Website produced by Robin Gutell
For each of the datasets, we computed several alignments: Quicktree and PartTree alignments on all datasets, and MAFFT, ClustalW, and SATé alignments on all datasets except the largest, 16S.B.ALL (MAFFT also could not be run on 16S.3). We ran RAxML, RAxML-Limited, and FastTree on these alignments, and compared the resultant trees to the reference tree for each dataset.
With respect to ML score optimization, RAxML produced the best ML scores for all the alignment/dataset combinations (
[Table: ML scores of RAxML, FastTree, and RAxML-Limited on the 16S.B.ALL, 16S.T, and 16S.3 datasets for the TrueAln, SATé, MAFFT, PartTree, ClustalW, and Quicktree alignments.]
Some alignments were missing
[Table: ML scores of RAxML, FastTree, and RAxML-Limited on the 16S.M.aa_ag, 16S.M, 23S.M, 23S.M.aa_ag, 23S.E.aa_ag, and 23S.E datasets for the Reference, SATé, MAFFT, PartTree, ClustalW, and Quicktree alignments.]
ML scores given as log likelihoods;
Because the relative performance with respect to tree error depended in part on dataset size, we first discuss results on the three largest datasets and then turn to the smaller ones. Since the reference tree for every biological dataset is the tree RAxML computes on the curated alignment, we expect RAxML to have lower tree error on the reference alignment than RAxML-Limited and FastTree.
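Tree error on these datasets is reported as the missing branch rate. Assuming the standard definition (the fraction of internal edges of the reference tree, i.e. its non-trivial bipartitions, that do not appear in the estimated tree), the metric can be sketched as follows. This is an illustration, not the evaluation code used in this study: trees are hypothetical nested Python tuples rather than Newick files, and all helper names are our own.

```python
from typing import FrozenSet, Set, Tuple, Union

# Hypothetical representation: a leaf is a taxon name, an internal node is a
# tuple of children. The outermost tuple is an arbitrary root of an unrooted tree.
Tree = Union[str, Tuple["Tree", ...]]


def taxa_of(tree: Tree) -> FrozenSet[str]:
    """Return the set of leaf labels in this (sub)tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(taxa_of(child) for child in tree))


def bipartitions(tree: Tree) -> Set[FrozenSet[str]]:
    """Non-trivial bipartitions (internal edges) of the unrooted tree.

    Each split is stored as the side that does NOT contain a fixed anchor
    taxon, so the same split is recognised however the tuple is rooted.
    """
    taxa = taxa_of(tree)
    anchor = min(taxa)
    splits: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        # Skip splits with a single leaf on one side, and the root (all taxa).
        if 2 <= len(below) <= len(taxa) - 2:
            splits.add(taxa - below if anchor in below else below)
        return below

    walk(tree)
    return splits


def missing_branch_rate(reference: Tree, estimate: Tree) -> float:
    """Fraction of reference-tree internal edges absent from the estimate."""
    if taxa_of(reference) != taxa_of(estimate):
        raise ValueError("trees must be on the same taxon set")
    ref_splits = bipartitions(reference)
    if not ref_splits:
        return 0.0
    return len(ref_splits - bipartitions(estimate)) / len(ref_splits)


reference = ((("A", "B"), "C"), ("D", ("E", "F")))
estimate = ((("A", "C"), "B"), ("D", ("E", "F")))
print(missing_branch_rate(reference, estimate))  # one of three edges missing
```

Because an unresolved node in the estimated tree contributes fewer bipartitions, polytomies are counted as missing branches under this definition.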
On the largest dataset, 16S.B.ALL (27,643 sequences), only the Quicktree and PartTree alignment methods could be run
GTRGAMMA ML scores, missing branch rates, runtimes in hours, and alignment SP-FN errors are shown.
The next two largest datasets, 16S.3 and 16S.T, have 6323 and 7350 sequences, respectively, and represent comparable challenges. For these datasets we were able to obtain alignments from all five alignment methods, the sole exception being MAFFT on the 16S.3 dataset, which failed due to memory requirements on a machine with 32 GB of main memory.
RAxML produced more accurate trees than RAxML-Limited on all alignments, with differences ranging from small (about 1.5%) to large (about 6%) (
Missing branch rate (%)
Alignment | ML Method | 16S.B.ALL | 16S.T | 16S.3 | Average |
TrueAln | RAxML | 0.0 | 0.0 | 0.0 | 0.0 |
TrueAln | FastTree | 3.9 | 2.8 | 3.2 | 3.3 |
TrueAln | RAxML-Limited | 13.8 | 5.5 | 6.1 | 8.4 |
SATé | RAxML | n.d. | 7.5 | 6.8 | n.a. |
SATé | FastTree | n.d. | 8.2 | 7.7 | n.a. |
SATé | RAxML-Limited | n.d. | 11.0 | 8.4 | n.a. |
MAFFT | RAxML | n.d. | 7.3 | n.d. | n.a. |
MAFFT | FastTree | n.d. | 8.2 | n.d. | n.a. |
MAFFT | RAxML-Limited | n.d. | 8.9 | n.d. | n.a. |
PartTree | RAxML | 31.8 | 17.1 | 12.0 | 20.3 |
PartTree | FastTree | 29.1 | 16.3 | 12.5 | 19.3 |
PartTree | RAxML-Limited | 38.4 | 18.6 | 15.4 | 24.1 |
ClustalW | RAxML | n.d. | 9.7 | 9.9 | n.a. |
ClustalW | FastTree | n.d. | 10.5 | 10.4 | n.a. |
ClustalW | RAxML-Limited | n.d. | 12.9 | 13.3 | n.a. |
Quicktree | RAxML | 13.2 | 33.9 | 31.8 | 26.3 |
Quicktree | FastTree | 13.5 | 33.9 | 32.5 | 26.6 |
Quicktree | RAxML-Limited | 21.8 | 35.0 | 35.6 | 30.8 |
Alignment SP-FN error (%)
Alignment | 16S.B.ALL | 16S.T | 16S.3 | Average |
SATé | n.d. | 37.0 | 24.9 | 30.9 |
MAFFT | n.d. | 31.0 | n.d. | 31.0 |
Quicktree | 54.4 | 63.0 | 52.8 | 56.7 |
ClustalW | n.d. | 56.3 | 52.0 | 54.2 |
PartTree | 41.7 | 34.3 | 22.6 | 32.9 |
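Alignment accuracy is given as SP-FN error. A minimal sketch, assuming SP-FN is the fraction of homologies in the reference alignment (pairs of residues placed in the same column) that the estimated alignment fails to recover, and assuming "-" is the only gap character. The function names are hypothetical; this is not the scoring tool used in the study.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Set, Tuple

GAP = "-"  # assumption: gaps are written as "-"
Residue = Tuple[str, int]  # (sequence name, position in the ungapped sequence)


def homology_pairs(alignment: Dict[str, str]) -> Set[FrozenSet[Residue]]:
    """All pairs of residues that the alignment places in the same column."""
    widths = {len(row) for row in alignment.values()}
    if len(widths) != 1:
        raise ValueError("all aligned rows must have the same length")
    width = widths.pop()
    next_pos = {name: 0 for name in alignment}
    pairs: Set[FrozenSet[Residue]] = set()
    for col in range(width):
        column = []
        for name, row in alignment.items():
            if row[col] != GAP:
                column.append((name, next_pos[name]))
                next_pos[name] += 1
        # Every pair of non-gap residues in a column is one asserted homology.
        for a, b in combinations(column, 2):
            pairs.add(frozenset((a, b)))
    return pairs


def sp_fn_error(reference: Dict[str, str], estimate: Dict[str, str]) -> float:
    """Fraction of reference homology pairs missing from the estimate."""
    if set(reference) != set(estimate):
        raise ValueError("alignments must contain the same sequences")
    for name in reference:
        if reference[name].replace(GAP, "") != estimate[name].replace(GAP, ""):
            raise ValueError(f"sequence {name!r} differs between alignments")
    ref_pairs = homology_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs - homology_pairs(estimate)) / len(ref_pairs)


reference = {"s1": "AC-T", "s2": "A-GT"}
estimate = {"s1": "ACT-", "s2": "-AGT"}
print(sp_fn_error(reference, estimate))  # both reference homologies are lost
```

This enumerates every pair of residues in every column, so its cost grows quadratically with the number of sequences; for alignments the size of 16S.B.ALL, a more economical counting scheme would be needed.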
Runtime (hours)
Alignment | ML Method | 16S.B.ALL | 16S.T | 16S.3 | Average |
TrueAln | RAxML | 647.3 | 305.3 | 322.1 | 424.9 |
TrueAln | FastTree | 5.2 | 1.0 | 1.1 | 2.4 |
TrueAln | RAxML-Limited | 10.3 | 3.7 | 3.1 | 5.7 |
SATé | RAxML | n.d. | 123.6 | 123.1 | n.a. |
SATé | FastTree | n.d. | 1.7 | 0.8 | n.a. |
SATé | RAxML-Limited | n.d. | 2.7 | 1.0 | n.a. |
MAFFT | RAxML | n.d. | 188.3 | n.d. | n.a. |
MAFFT | FastTree | n.d. | 1.3 | n.d. | n.a. |
MAFFT | RAxML-Limited | n.d. | 1.4 | n.d. | n.a. |
PartTree | RAxML | 1418.1 | 176.2 | 118.3 | 570.9 |
PartTree | FastTree | 6.3 | 4.1 | 2.4 | 4.3 |
PartTree | RAxML-Limited | 50.3 | 5.1 | 2.9 | 19.4 |
ClustalW | RAxML | n.d. | 73.0 | 64.3 | n.a. |
ClustalW | FastTree | n.d. | 0.7 | 0.6 | n.a. |
ClustalW | RAxML-Limited | n.d. | 1.0 | 0.8 | n.a. |
Quicktree | RAxML | 2149.9 | 247.3 | 120.7 | 839.3 |
Quicktree | FastTree | 2.1 | 0.9 | 0.7 | 1.2 |
Quicktree | RAxML-Limited | 33.3 | 1.6 | 1.3 | 12.1 |
“Average” refers to the average across datasets.
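The "Average" columns and the running-time gaps discussed in the text can be recomputed from the per-dataset values. The sketch below uses numbers copied from the tables above; the helper name is hypothetical. It reproduces two averaging conventions visible in those tables: the missing-branch and runtime tables report "n.a." whenever any dataset is "n.d.", whereas the SP-FN table averages over the datasets that are available.

```python
import math
from typing import Optional, Sequence

ND = None  # stands for "n.d." (not done) in the tables


def average(values: Sequence[Optional[float]], require_all: bool = True) -> Optional[float]:
    """Average across datasets.

    require_all=True mirrors the missing-branch and runtime tables (n.a. if
    any dataset is n.d.); require_all=False mirrors the SP-FN table, which
    averages over whichever datasets are available.
    """
    available = [v for v in values if v is not None]
    if not available or (require_all and len(available) < len(values)):
        return None
    return sum(available) / len(available)


# Values for 16S.B.ALL, 16S.T and 16S.3, copied from the tables above.
parttree_raxml_hours = [1418.1, 176.2, 118.3]
sate_raxml_hours = [ND, 123.6, 123.1]
sate_spfn = [ND, 37.0, 24.9]

print(round(average(parttree_raxml_hours), 1))    # matches the 570.9 in the table
print(average(sate_raxml_hours))                  # None, reported as n.a.
print(average(sate_spfn, require_all=False))      # ~30.95 (the table reports 30.9)

# Speed-up of FastTree over RAxML on the Quicktree alignment of 16S.B.ALL.
speedup = 2149.9 / 2.1
print(round(speedup), round(math.log10(speedup), 2))  # about three orders of magnitude
```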
As before, RAxML produced more accurate trees than RAxML-Limited, and RAxML-Limited produced the least accurate trees (
Missing branch rate (%)
Alignment | ML Method | 16S.M.aa_ag | 16S.M | 23S.M | 23S.M.aa_ag | 23S.E.aa_ag | 23S.E | Average |
TrueAln | RAxML | 0.7 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.1 |
TrueAln | FastTree | 1.6 | 0.7 | 5.4 | 3.2 | 3.3 | 9.3 | 3.9 |
TrueAln | RAxML-Limited | 3.7 | 3.8 | 11.9 | 5.1 | 12.1 | 10.7 | 7.9 |
SATé | RAxML | 4.8 | 6.9 | 10.7 | 11.5 | 9.9 | 4.0 | 8.0 |
SATé | FastTree | 6.7 | 6.2 | 10.7 | 12.8 | 9.9 | 10.7 | 9.5 |
SATé | RAxML-Limited | 9.0 | 9.3 | 14.3 | 15.4 | 12.1 | 13.3 | 12.2 |
MAFFT | RAxML | 4.4 | 5.5 | 11.3 | 10.3 | 8.8 | 8.0 | 8.0 |
MAFFT | FastTree | 7.4 | 5.7 | 10.7 | 9.6 | 13.2 | 12.0 | 9.8 |
MAFFT | RAxML-Limited | 8.8 | 6.2 | 16.7 | 13.5 | 13.2 | 20.0 | 13.0 |
PartTree | RAxML | 12.7 | 8.8 | 22.0 | 19.9 | 14.3 | 6.7 | 14.1 |
PartTree | FastTree | 12.9 | 8.6 | 22.6 | 18.6 | 17.6 | 12.0 | 15.4 |
PartTree | RAxML-Limited | 12.9 | 10.0 | 22.6 | 19.9 | 19.8 | 17.3 | 17.1 |
ClustalW | RAxML | 13.9 | 11.2 | 16.7 | 16.0 | 22.0 | 17.3 | 16.2 |
ClustalW | FastTree | 12.7 | 8.6 | 16.7 | 14.7 | 23.1 | 20.0 | 16.0 |
ClustalW | RAxML-Limited | 11.3 | 11.4 | 16.7 | 15.4 | 23.1 | 22.7 | 16.8 |
Quicktree | RAxML | 10.4 | 10.2 | 20.8 | 19.9 | 28.6 | 24.0 | 19.0 |
Quicktree | FastTree | 11.5 | 9.3 | 19.6 | 17.9 | 18.7 | 20.0 | 16.2 |
Quicktree | RAxML-Limited | 10.6 | 10.0 | 20.2 | 17.9 | 25.3 | 20.0 | 17.3 |
Alignment SP-FN error (%)
Alignment | 16S.M.aa_ag | 16S.M | 23S.M | 23S.M.aa_ag | 23S.E.aa_ag | 23S.E | Average |
SATé | 22.7 | 22.0 | 29.3 | 28.4 | 22.2 | 21.2 | 24.3 |
MAFFT | 22.6 | 21.8 | 28.6 | 28.3 | 19.5 | 18.5 | 23.2 |
Quicktree | 37.5 | 41.0 | 48.4 | 43.8 | 26.6 | 28.1 | 37.6 |
ClustalW | 38.2 | 42.6 | 46.2 | 47.6 | 30.0 | 38.5 | 40.5 |
PartTree | 23.1 | 27.5 | 32.1 | 33.8 | 20.7 | 19.7 | 26.1 |
“Average” refers to the average across the six datasets.
Runtime
Alignment | ML Method | 16S.M.aa_ag | 16S.M | 23S.M | 23S.M.aa_ag | 23S.E.aa_ag | 23S.E | Average |
Reference | RAxML | 7.54 | 5.90 | 2.33 | 2.25 | 0.99 | 0.78 | 3.30 |
Reference | FastTree | 0.08 | 0.06 | 0.04 | 0.03 | 0.02 | 0.02 | 0.04 |
Reference | RAxML-Limited | 0.24 | 0.22 | 0.10 | 0.10 | 0.04 | 0.03 | 0.12 |
SATé | RAxML | 6.26 | 4.51 | 1.57 | 1.37 | 0.67 | 0.62 | 2.50 |
SATé | FastTree | 0.14 | 0.06 | 0.03 | 0.03 | 0.02 | 0.02 | 0.05 |
SATé | RAxML-Limited | 0.21 | 0.18 | 0.07 | 0.07 | 0.03 | 0.03 | 0.10 |
MAFFT | RAxML | 4.27 | 3.99 | 1.56 | 2.28 | 0.88 | 0.53 | 2.25 |
MAFFT | FastTree | 0.07 | 0.06 | 0.03 | 0.03 | 0.02 | 0.01 | 0.04 |
MAFFT | RAxML-Limited | 0.16 | 0.16 | 0.08 | 0.29 | 0.03 | 0.03 | 0.12 |
PartTree | RAxML | 10.05 | 7.19 | 2.71 | 1.86 | 0.85 | 0.76 | 3.90 |
PartTree | FastTree | 0.16 | 0.11 | 0.05 | 0.08 | 0.02 | 0.02 | 0.07 |
PartTree | RAxML-Limited | 0.33 | 0.28 | 0.09 | 0.08 | 0.03 | 0.03 | 0.14 |
ClustalW | RAxML | 5.99 | 3.73 | 1.64 | 2.10 | 0.66 | 0.51 | 2.44 |
ClustalW | FastTree | 0.05 | 0.04 | 0.02 | 0.02 | 0.01 | 0.01 | 0.03 |
ClustalW | RAxML-Limited | 0.14 | 0.11 | 0.06 | 0.06 | 0.02 | 0.02 | 0.07 |
Quicktree | RAxML | 5.46 | 3.37 | 1.48 | 1.38 | 0.90 | 0.58 | 2.19 |
Quicktree | FastTree | 0.05 | 0.04 | 0.02 | 0.02 | 0.01 | 0.01 | 0.03 |
Quicktree | RAxML-Limited | 0.19 | 0.12 | 0.05 | 0.05 | 0.03 | 0.02 | 0.08 |
“Average” refers to the average across the datasets.
The study showed that RAxML produced better ML scores than both FastTree and RAxML-Limited, and topologically more accurate trees than RAxML-Limited, in almost all cases. However, the relative performance of FastTree and RAxML depended upon the alignment and dataset, so that RAxML typically produced slightly more accurate trees than FastTree on the large datasets.
We now compare our study to Price
Our results are in agreement with
Our study examined the relative performance of two variants of RAxML and FastTree on nucleotide datasets, including several very large biological datasets (one with almost 28,000 sequences) and simulated datasets with 1000 sequences. The results of our study establish the following. First, RAxML clearly produces better ML scores than RAxML-Limited and FastTree, and topologically more accurate trees than RAxML-Limited, in almost all cases. When used with highly accurate alignments, RAxML also tends to produce topologically more accurate trees than FastTree, but the differences tend to be small on large datasets. When used with less accurate alignments (such as might be estimated on very large datasets, on which the most accurate alignment methods cannot run
Our study is limited to nucleotide sequences, so the relative performance of RAxML and FastTree could differ on amino-acid sequences. Furthermore, phylogenetic ML methods also estimate branch lengths and other numeric model parameters, and the better ML scores obtained by RAxML may reflect improved estimates of these parameters. For applications in which these other parameters matter, such as detecting selection, the improved parameter estimates that RAxML may provide could be necessary.
Future studies should investigate whether RAxML produces improved estimates of other model parameters, and the impact of these improved estimates. It would also be beneficial to evaluate whether RAxML and FastTree differ when amino-acid alignments must be estimated, since Price
All datasets used in this study are previously published, and are available (along with the reference tree and alignment) at
Multiple sequence alignments were produced using MAFFT (using its L-INS-i and PartTree algorithms) version 6.240
MAFFT L-INS-i default:
mafft --localpair --maxiterate 1000 --quiet <input> > <output>
MAFFT PartTree:
mafft --parttree --retree 2 --partsize 1000 <input> > <output>
ClustalW default:
clustalw2 -align -infile=<input> -outfile=<output> -output=fasta
ClustalW Quicktree:
clustalw2 -align -infile=<input> -outfile=<output> -output=fasta -quicktree
SATé:
./sate_basic.pl -r <name of run> -w <empty temporary work directory with full path> -d <input unaligned sequences file with full path> -l 1 -s 1 -a 5
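If the alignment runs are scripted rather than typed at a shell, the commands above can be assembled as argument lists. The following is a hypothetical Python wrapper (the function names are ours); the flags are copied from the commands listed above. MAFFT writes its alignment to standard output, which the shell commands redirect to a file, whereas ClustalW writes the file named by -outfile. SATé is omitted from this sketch.

```python
import shlex
import subprocess
from typing import List, Optional


def mafft_linsi_argv(unaligned: str) -> List[str]:
    # MAFFT prints the alignment on stdout; the caller redirects it to a file.
    return ["mafft", "--localpair", "--maxiterate", "1000", "--quiet", unaligned]


def mafft_parttree_argv(unaligned: str) -> List[str]:
    return ["mafft", "--parttree", "--retree", "2", "--partsize", "1000", unaligned]


def clustalw_argv(unaligned: str, aligned: str, quicktree: bool = False) -> List[str]:
    # ClustalW writes its own output file via -outfile, so no redirection.
    argv = ["clustalw2", "-align", f"-infile={unaligned}",
            f"-outfile={aligned}", "-output=fasta"]
    if quicktree:
        argv.append("-quicktree")
    return argv


def run(argv: List[str], stdout_path: Optional[str] = None) -> None:
    """Run a command, optionally sending stdout to a file (like `> file`)."""
    if stdout_path is None:
        subprocess.run(argv, check=True)
    else:
        with open(stdout_path, "w") as out:
            subprocess.run(argv, stdout=out, check=True)


print(shlex.join(mafft_parttree_argv("input.fasta")))
```

For example, `run(mafft_linsi_argv("in.fasta"), "out.fasta")` would mirror the MAFFT L-INS-i command, and `run(clustalw_argv("in.fasta", "out.fasta", quicktree=True))` the ClustalW Quicktree command, assuming both programs are on the PATH.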
To perform ML analyses, we used RAxML version 7.2.6 and FastTree version 2.1.3. The following commands were used to run these programs:
RAxML (and RAxML-Limited):
raxmlHPC -m GTRCAT -w <work dir> -n <identifying suffix> -s <input> -j
FastTree:
FastTree -nt -gtr -nosupport -log <log file> <input alignment> > <output tree>
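Similarly, the tree searches can be launched from a script that also records running time in hours. This is a hedged sketch: the flags are those listed above, the helper names are hypothetical, and the timer measures wall-clock time, which corresponds to serial running time only when the program runs single-threaded. It does not reproduce the time limit used for RAxML-Limited.

```python
import os
import shlex
import subprocess
import time
from typing import List, Optional


def raxml_argv(work_dir: str, suffix: str, alignment: str) -> List[str]:
    # Same flags as above; the working directory is passed as an absolute path.
    return ["raxmlHPC", "-m", "GTRCAT", "-w", os.path.abspath(work_dir),
            "-n", suffix, "-s", alignment, "-j"]


def fasttree_argv(log_file: str, alignment: str) -> List[str]:
    # FastTree prints the Newick tree on stdout; redirect it to keep the tree.
    return ["FastTree", "-nt", "-gtr", "-nosupport", "-log", log_file, alignment]


def timed_run_hours(argv: List[str], stdout_path: Optional[str] = None) -> float:
    """Run a command to completion and return elapsed wall-clock hours."""
    start = time.perf_counter()
    if stdout_path is None:
        subprocess.run(argv, check=True)
    else:
        with open(stdout_path, "w") as out:
            subprocess.run(argv, stdout=out, check=True)
    return (time.perf_counter() - start) / 3600.0


print(shlex.join(fasttree_argv("ft.log", "aln.fasta")) + " > tree.nwk")
```

A call such as `timed_run_hours(fasttree_argv("ft.log", "aln.fasta"), "tree.nwk")` would return a figure comparable in unit to the runtime table for the largest datasets.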
Where necessary, we parallelized the RAxML analyses either by recompiling with PTHREADS and using the flag -T <number of threads> or by recompiling with MPI; the parallelization did not otherwise affect the RAxML commands. RAxML's outputs were unaffected by parallelization, and all reported runtimes are for serialized execution.
For all datasets except the three largest biological datasets, SATé and two-phase analyses were performed using a heterogeneous Condor
To run SATé and the two-phase methods on the 16S.T and 16S.3 datasets, we used a 64-bit computing cluster at the University of Texas at Austin, consisting of machines with 8-core 2.83 GHz Intel Xeon 64-bit CPUs with 32 GB main memory per CPU. Due to the memory requirements of SATé and the two-phase methods on the 16S.B.ALL dataset, we used two machines with very large shared memory, each having a 16-core 64-bit AMD Opteron CPU running at 2.5 GHz, and with either 128 GB or 256 GB main memory.
The authors acknowledge the Texas Advanced Computing Center (TACC) at The University of Texas at Austin for providing HPC resources that have contributed to the research results reported within this paper.