Hi,
The NUCmer aligner in MBF uses different names for some of the MUMmer/NUCmer parameters. Why didn't you use the same parameter names in MBF as in NUCmer/MUMmer?
In addition, some of these parameters don't seem to be exposed as public properties at all, or at least not yet.
For example, at the link below there's an excellent set of instructions from the MUMmer developers for using NUCmer to align short DNA sequence reads to a genome:
How to use MUMmer to align short reads to the human genome:
http://www.cbcb.umd.edu/research/mummer_reads.shtml
Unfortunately, MBF's NUCmer does not let you set the following two parameters, which those instructions list as important for aligning short DNA sequence reads to a genome:
--maxmatch
Necessary; otherwise legitimate hits may be missed
--nooptimize
This encourages nucmer to extend the alignment all the way to the end of the read; without this, nucmer may fail to include ends where a substitution occurs close to an end
Looking at the MBF source code, it seems straightforward to hard code the "--maxmatch" parameter, i.e., in your own custom version of "Bio.dll". In the "NUCmer.cs" source code file you just need to change:
isUniqueInReference = true
to
isUniqueInReference = false
at two points in the source code (see below)
***********************************************************************
NUCmer.cs (excerpts from the original, in build 78593)
Lines 211-215:
/// <param name="isUniqueInReference">flag to indicate that the matches should be unique in reference.</param>
/// <returns>Returns clusters.</returns>
public IList<Cluster> GetClusters(
ISequence querySequence,
bool isUniqueInReference = true)
Lines 277-281:
/// <param name="isUniqueInReference">Whether MUMs are unique in query or not.</param>
/// <returns>List of enumerable of delta alignments.</returns>
public IEnumerable<DeltaAlignment> GetDeltaAlignments(
ISequence querySequence,
bool isUniqueInReference = true)
***********************************************************************
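As an aside, both signatures above declare "isUniqueInReference" as an optional parameter, so code that calls these methods directly could presumably pass "false" at the call site instead of editing the default. The sketch below only illustrates that C# mechanism: "NucmerStandIn" and its string return value are hypothetical stand-ins, not MBF types, and I am assuming nothing about how the real NUCmer class is constructed. It would not help where MBF calls these methods internally with the default value.

```csharp
using System;

// Hypothetical stand-in for MBF's NUCmer class. Only the shape of the
// optional parameter is copied from the NUCmer.cs excerpt; the body and
// the string return type are invented for illustration.
public class NucmerStandIn
{
    // Mirrors "bool isUniqueInReference = true" from the excerpt.
    public string GetClusters(string querySequence, bool isUniqueInReference = true)
    {
        return isUniqueInReference
            ? "matches must be unique in reference"
            : "uniqueness in reference not required";
    }
}

public static class Demo
{
    public static void Main()
    {
        var aligner = new NucmerStandIn();

        // Argument omitted: the declared default (true) is used.
        Console.WriteLine(aligner.GetClusters("ACGT"));

        // Named argument overrides the default without touching the library source.
        Console.WriteLine(aligner.GetClusters("ACGT", isUniqueInReference: false));
    }
}
```

If this also applies to the real class, only the internal callers that rely on the default would need a source change.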
However, I'm not exactly sure how to set the "--nooptimize" flag in the source code (I don't mind compiling my own derivative of Bio.dll with the "--maxmatch" and "--nooptimize" flags set).
At NUCmer.cs lines 715, 781, 790, 803, 898 and 915 there is the following statement (the tooltip message is shown as a trailing comment):
methodName |= ModifiedSmithWaterman.OptimalFlag; // Maximise the alignment score
My guess is that to set the "--nooptimize" flag, at one or more of these lines the value used should be as follows:
methodName |= ModifiedSmithWaterman.SeqendFlag; // Align till end of shortest sequence
But I'm not sure which lines I should change.
So, could you please describe exactly which changes are needed to set the "--nooptimize" flag in the source code, so that a developer can compile their own derivative of Bio.dll with both the "--maxmatch" and "--nooptimize" flags set for NUCmer?
Thanks in advance for your help!
Robert