NUCmer parameters "--maxmatch" and "--nooptimize"

Jun 23, 2011 at 6:58 PM

Hi,

The NUCmer aligner in MBF uses different names for some of the MUMmer/NUCmer parameters.  Why didn't you use the same parameter names in MBF as in NUCmer/MUMmer?

In addition, some of these parameters don't seem to be exposed as public properties at all, or at least not yet.

For example, at the link below there's an excellent set of instructions from the MUMmer developers for using NUCmer to align short DNA sequence reads to a genome:

How to use MUMmer to align short reads to the human genome: http://www.cbcb.umd.edu/research/mummer_reads.shtml

Unfortunately, you can't set the following two parameters in MBF's NUCmer. As the instructions linked above indicate, both are important for aligning short DNA sequence reads to a genome:

--maxmatch
 Necessary; otherwise legitimate hits may be missed
--nooptimize
 This encourages nucmer to extend the alignment all the way to the end of the read; without this, nucmer may fail to include ends where a substitution occurs close to an end

Looking at the MBF source code, it seems straightforward to hard-code the "--maxmatch" behavior in your own custom version of "Bio.dll".  In the "NUCmer.cs" source code file you just need to change the default value:

isUniqueInReference = true

to

isUniqueInReference = false

at two points in the source code (see the excerpts below).

***********************************************************************

NUCmer.cs (excerpts from the original, in build 78593)

ln 211-215:

/// <param name="isUniqueInReference">flag to indicate that the matches should be unique in reference.</param>
/// <returns>Returns clusters.</returns>
public IList<Cluster> GetClusters(
 ISequence querySequence,
 bool isUniqueInReference = true)

ln 277-281:

/// <param name="isUniqueInReference">Whether MUMs are unique in query or not.</param>
/// <returns>List of enumerable of delta alignments.</returns>
public IEnumerable<DeltaAlignment> GetDeltaAlignments(
 ISequence querySequence,
 bool isUniqueInReference = true)

***********************************************************************

However, I'm not exactly sure how to set the "--nooptimize" flag in the source code (I don't mind compiling my own derivative of Bio.dll with the "--maxmatch" and "--nooptimize" flags set).

At the following NUCmer.cs lines:

715
781
790
803
898
915

there is the following statement (the tooltip message is shown as a trailing comment):

methodName |= ModifiedSmithWaterman.OptimalFlag; // Maximise the alignment score

My guess is that, to get the "--nooptimize" behavior, one or more of these lines should instead read:

methodName |= ModifiedSmithWaterman.SeqendFlag;  // Align till end of shortest sequence

But I'm not sure which lines I should change.
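
To make the guess concrete, here is a minimal sketch of what such a switch might look like. It does not use the real Bio.dll types: the class name, the method `BuildMethod`, the `noOptimize` parameter and the bit values are all hypothetical, chosen only to show how `|=` sets one flag or the other on top of an existing value.

```csharp
using System;

static class FlagSketch
{
    // Hypothetical bit values for illustration only; the real constants
    // are defined in ModifiedSmithWaterman and may differ.
    const int OptimalFlag = 0x8;   // maximise the alignment score
    const int SeqendFlag  = 0x10;  // align till end of shortest sequence

    // Adds exactly one of the two target flags to an existing method value.
    public static int BuildMethod(int baseMethod, bool noOptimize)
    {
        int methodName = baseMethod;
        methodName |= noOptimize ? SeqendFlag : OptimalFlag;
        return methodName;
    }

    static void Main()
    {
        // 0x1 stands in for whatever direction/search bits are already set.
        Console.WriteLine("0x{0:X}", BuildMethod(0x1, noOptimize: false)); // prints 0x9
        Console.WriteLine("0x{0:X}", BuildMethod(0x1, noOptimize: true));  // prints 0x11
    }
}
```

If the real code follows this pattern, a single boolean passed down to each of the listed lines would avoid having to hard-code the change, but that is an assumption about the structure of NUCmer.cs, not something I have verified.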

So, if a developer needs to set the "--nooptimize" flag in the source code and compile their own derivative of Bio.dll with the "--maxmatch" and "--nooptimize" flags set for NUCmer, could you please describe exactly how that should be accomplished?

Thanks in advance for your help!

Robert

Jun 24, 2011 at 4:47 PM

Sending for Bob, who is on vacation:

 

Hi Robert,

You should be able to pass the -maxmatch flag to nucmerutil.exe.  (I think the short form is -x.)  You can look at the code in nucmerutil if you want to see how it gets the library to do what you want.  I found that the use of a default parameter set to true can be a bit confusing, so be careful there.
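
As a rough illustration of the default-parameter point, here is a sketch using a stand-in class rather than the actual Bio.dll code. `NucmerStandIn` and its return strings are made up; only the `isUniqueInReference = true` default mirrors the NUCmer.cs excerpt quoted above. The idea is that a caller can override the default with a named argument instead of editing the library source.

```csharp
using System;

// Hypothetical stand-in for the library class; not the real Bio.dll NUCmer type.
class NucmerStandIn
{
    // Same parameter shape as the quoted excerpt: the flag defaults to true.
    public string GetClusters(string querySequence, bool isUniqueInReference = true)
    {
        return isUniqueInReference
            ? "unique-in-reference MUMs"
            : "all maximal matches";
    }
}

class Program
{
    static void Main()
    {
        var nucmer = new NucmerStandIn();

        // Omitting the argument silently picks up the default (true).
        Console.WriteLine(nucmer.GetClusters("ACGT"));

        // Passing the argument by name makes the maxmatch-style choice explicit.
        Console.WriteLine(nucmer.GetClusters("ACGT", isUniqueInReference: false));
    }
}
```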

 

I don’t know the situation with --nooptimize off the top of my head, and I am on vacation right now, but I’ll see what I can dig up when I get back if it has not been addressed before then.

 

-bobd-