Efficient calculation of pairwise phylogenetic distance matrices.
The easiest installation method is through Conda. If you choose to install via PyPI, ensure that you have a Rust compiler installed.
pip install phylodm
conda install -c bioconda phylodm
A phylogenetic distance matrix (`PhyloDM`) object can be created from a Newick file:
```python
from phylodm import PhyloDM

# Create a small test tree in Newick format
with open('/tmp/newick.tree', 'w') as fh:
    fh.write('(A:4,(B:3,C:4):1);')

# Load the tree from the Newick file
pdm = PhyloDM.load_from_newick_path('/tmp/newick.tree')
```
The `dm` method generates a symmetrical NumPy distance matrix, and the `taxa` method returns a tuple of keys in the matrix row/column order:
```python
from phylodm import PhyloDM
import dendropy

# Create a small test tree in Newick format
with open('/tmp/newick.tree', 'w') as fh:
    fh.write('(A:4,(B:3,C:4):1);')

# Option 1: load directly from the Newick file
pdm = PhyloDM.load_from_newick_path('/tmp/newick.tree')

# Option 2: load from a DendroPy tree
tree = dendropy.Tree.get_from_path('/tmp/newick.tree', schema='newick')
pdm = PhyloDM.load_from_dendropy(tree)

# Compute the unnormalised distance matrix and the matching labels
dm = pdm.dm(norm=False)
labels = pdm.taxa()

# /------------[4]------------ A
# +
# |          /---------[3]--------- B
# \---[1]---+
#            \------------[4]------------- C
#
# labels = ('A', 'B', 'C')
# dm = [[0. 8. 9.]
#       [8. 0. 7.]
#       [9. 7. 0.]]
```
If `norm` is `True`, the distances are normalised by the sum of all edge lengths in the tree.
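To make the matrix values and the effect of normalisation concrete, the following is a minimal pure-Python sketch that recomputes the distances for the example tree `(A:4,(B:3,C:4):1);` by hand. It is not how PhyloDM is implemented; the `PARENT` map, the internal node names `root` and `n1`, and the helper functions are all hypothetical and exist only for illustration.

```python
from itertools import combinations

# Example tree (A:4,(B:3,C:4):1); written as child -> (parent, edge length).
# "root" and "n1" are hypothetical names for the unlabelled internal nodes.
PARENT = {
    "A": ("root", 4.0),
    "n1": ("root", 1.0),
    "B": ("n1", 3.0),
    "C": ("n1", 4.0),
}
LABELS = ("A", "B", "C")


def path_to_root(node):
    """Return {ancestor: distance from node} for the node and all its ancestors."""
    dists = {node: 0.0}
    total = 0.0
    while node in PARENT:
        node, length = PARENT[node]
        total += length
        dists[node] = total
    return dists


def distance_matrix(norm=False):
    """Pairwise leaf-to-leaf path lengths, optionally divided by the total edge length."""
    n = len(LABELS)
    matrix = [[0.0] * n for _ in range(n)]
    paths = {leaf: path_to_root(leaf) for leaf in LABELS}
    for i, j in combinations(range(n), 2):
        a, b = paths[LABELS[i]], paths[LABELS[j]]
        # The lowest common ancestor minimises the summed distance from both leaves.
        dist = min(a[node] + b[node] for node in a if node in b)
        matrix[i][j] = matrix[j][i] = dist
    if norm:
        # Sum of all edge lengths in the tree: 4 + 1 + 3 + 4 = 12
        total = sum(length for _, length in PARENT.values())
        matrix = [[value / total for value in row] for row in matrix]
    return matrix


raw = distance_matrix()
normalised = distance_matrix(norm=True)
```

Here `raw` reproduces the matrix shown above, and `normalised` holds the same distances divided by 12, the total edge length of the tree, so the A to C distance becomes 9 / 12 = 0.75.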
Tests were executed using the `scripts/phylodm_perf.py` script with 10 trials.
These tests demonstrate that PhyloDM is more efficient than DendroPy's phylogenetic distance matrix when there are over 500 taxa in the tree. If there are fewer than 500 taxa, then use DendroPy for all of the great features it provides.
With 10,000 taxa in the tree, each program uses approximately:
* PhyloDM = 4 seconds / 2 GB memory
* DendroPy = 17 minutes / 90 GB memory
2.0.0
- Re-write in Rust (2x faster)
1.3.1
- Use OpenMP to parallelize PDM methods.
1.3.0
- Removed tqdm.
- get_matrix() is now 3x faster.
1.2.0
- Added the remove_keys command.
1.1.0
- Significant improvement in PDM construction time using C.
1.0.0
- Initial release.
Please cite this software if you use it in your work.