Review: Consider adding attached phylogenetic algorithms

description

This contribution (attached as stand-alone projects) adds a few simple things to the Bio.Phylogenetics namespace:
  • one simple data structure – DistanceMatrix
  • three classes of algorithms:
    • ISubstitutionModel, for computing a DistanceMatrix from a set of aligned sequences
    • IHierarchicalClusteringAlgorithm, for inferring a phylogenetic tree from a distance matrix
    • ITreeEvaluator, for calculating a parsimony score for a tree given a set of aligned sequences
  • the three corresponding standard implementations of those algorithms: JukesCantorSubstitutionModel, UpgmaClusteringAlgorithm, FitchTreeEvaluator
Also included is a simple command-line sample that uses these.
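The three algorithms in the list above are standard textbook methods, so their core logic can be sketched independently of the attached C# projects, which are not reproduced here. The following is a minimal Python sketch, not the contributed code or the MBF API. The function names `jukes_cantor_matrix`, `upgma`, and `fitch_score`, and the representation of a distance matrix as a dict keyed by unordered name pairs, are hypothetical. It assumes DNA sequences of equal length, skips gapped columns pairwise when computing Jukes-Cantor distances, returns only the UPGMA topology (no branch lengths), and treats a gap as an ordinary state when scoring with Fitch's algorithm.

```python
from itertools import combinations
from math import log

# Hypothetical DistanceMatrix representation: {frozenset({name_a, name_b}): distance}.


def jukes_cantor_matrix(alignment):
    """Pairwise Jukes-Cantor distances, d = -3/4 * ln(1 - 4p/3), for aligned DNA sequences."""
    matrix = {}
    for (a, seq_a), (b, seq_b) in combinations(alignment.items(), 2):
        # Assumption: columns where either sequence has a gap ('-') are skipped.
        sites = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
        p = sum(x != y for x, y in sites) / len(sites)  # fraction of mismatched sites
        if p >= 0.75:
            raise ValueError(f"{a} and {b} are too divergent for Jukes-Cantor")
        matrix[frozenset((a, b))] = -0.75 * log(1 - 4 * p / 3)
    return matrix


def upgma(matrix, names):
    """UPGMA: repeatedly merge the two closest clusters; returns the topology as nested tuples."""
    sizes = {name: 1 for name in names}  # cluster (leaf name or nested tuple) -> leaf count
    dist = dict(matrix)
    while len(sizes) > 1:
        a, b = min(combinations(sizes, 2), key=lambda pair: dist[frozenset(pair)])
        merged = (a, b)
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        for c in sizes:
            # Distance to the new cluster is the leaf-count-weighted mean of the old distances.
            dist[frozenset((merged, c))] = (
                size_a * dist[frozenset((a, c))] + size_b * dist[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


def _fitch_site(node, alignment, i):
    """Fitch's bottom-up pass for column i: (candidate state set, substitutions counted)."""
    if isinstance(node, str):
        # Leaf: its observed character (a gap '-' counts as just another state here).
        return {alignment[node][i]}, 0
    (left, left_cost), (right, right_cost) = (_fitch_site(child, alignment, i) for child in node)
    common = left & right
    if common:
        return common, left_cost + right_cost
    return left | right, left_cost + right_cost + 1  # disjoint sets force one substitution


def fitch_score(tree, alignment):
    """Parsimony score of a rooted binary tree: Fitch substitution counts summed over columns."""
    length = len(next(iter(alignment.values())))
    return sum(_fitch_site(tree, alignment, i)[1] for i in range(length))


alignment = {
    "A": "ACGTACGTAC",
    "B": "ACGTACGTAA",
    "C": "ACGAACGAAC",
    "D": "TCGAACGATC",
}
matrix = jukes_cantor_matrix(alignment)
tree = upgma(matrix, list(alignment))
print(tree)                          # (('A', 'B'), ('C', 'D'))
print(fitch_score(tree, alignment))  # 5
```

In this toy alignment, UPGMA joins A and B first (p = 0.1), then C and D (p = 0.2). The resulting topology needs five substitutions under Fitch's algorithm, while the alternative grouping ((A, C), (B, D)) would need seven, so the parsimony score can be used to compare candidate trees.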
     
I’ve tried to modify my code to conform to your conventions, and I’ve included extensive comments (with references to the original papers, etc.). If you’d like to include this in MBF, there are a few more things I should probably do:
  • Port my unit tests to your test framework for inclusion
    • I’m curious, by the way, why you use NUnit instead of Visual Studio unit testing. I’ll have to learn NUnit to port my tests from VS, but that’s no big deal; I’ve already started looking at it and it’s similar.
  • Localize exception messages (for now they are hard-coded in English, although I see MBF isn’t completely consistent about localizing them)
  • Perhaps move a couple of utilities into existing MBF classes (e.g., see TreeUtils.cs)
     
See the e-mail thread with Michael Zyskowski for more details.

comments

wrote Mar 23, 2010 at 9:09 PM

We'd like to move forward with this for M9. We need to assign someone from the MBF team to assist with code reviews and integration of this external contribution. Submitted via Rick Byers.

zyskowski wrote May 17, 2010 at 5:50 PM

We have decided to defer the inclusion of this work until just after the v1 release. This is primarily for stability reasons; we are very much in favor of including this work in the library.