Getting original sequence names and IDs in aligned sequences

description

Greetings.
I have a few issues I would like to discuss. I am not sure whether these are bugs or unfinished items, so a roadmap of your progress in this regard would be useful. Sorry for the long description.
 
FIRST: I noticed that in both SequenceAssembler and Bio.dll, which I am using to run the aligners, the original sequence names and IDs are missing from the aligned output. Consequently, I cannot tell which original sequence corresponds to which aligned sequence. An example should make the issue clear (I am using the NUCmer aligner here):
 
Original sequences named: Seq-1, Seq-2, Seq-3, Seq-4
The result is:
Alignments:
 Alignment1
      Sequence1
      Sequence2
 Alignment2
      Sequence1
      Sequence2
 Alignment3
      Sequence1
      Sequence2
 
How can I tell for sure, without looking at the actual aligned sequences, that Sequence1 in the result corresponds to Seq-1 in the original sequence list? Or that Sequence2 in the aligned sequences corresponds to Seq-2, Seq-3, or Seq-4 of the input sequences, in that order? Is it guaranteed that Sequence1 corresponds to Seq-1, and that Sequence2 corresponds to Seq-2, Seq-3, and Seq-4 respectively in the aligned sequences? Is the input order preserved and guaranteed in the returned aligned sequences?
 
If I use PAMSAM, the original sequence names (Seq-1, Seq-2, Seq-3, Seq-4) are again translated, this time to Sequences1, Sequences2, Sequences3, and Sequences4. Is the order preserved in the aligned sequences?
Why not set the ID of each aligned sequence to the original ID?
 
SECOND: The Metadata object of the aligned sequences is not being updated, at least for PAMSAM. Consequently, there are no statistics available from PAMSAM. Is there a way to obtain these statistics? An example using AlignedSequences would be helpful.
 
THIRD: I need consensus sequences for PAMSAM alignment data. Is there a way to get them? I noticed that the Metadata object is not populated. An example using AlignedSequences would be helpful. For the other aligners, I get the consensus like this:
MUMmer: Encoding.ASCII.GetString(alignedList[0].PairwiseAlignedSequences[0].Consensus.ToArray()); <--- from Consensus property
NUCmer: Encoding.ASCII.GetString(((Bio.Sequence)(alseq.Metadata["Consensus"])).ToArray()); <--- from Consensus item of Metadata dictionary
 
Your help in this regard will be highly appreciated.
KPOnCodeplex2011

comments

FadiF wrote May 19, 2011 at 4:59 PM

Hi KPonCodeplex2011

Thanks for reporting this, Aldo, one of our developers had a look at this, and this is his report back:

I see two issues here:
  1. NUCmer, MUMmer, and PAMSAM are not setting the sequence ID on the output sequences.
    It looks like we missed updating the ID on the output sequences when we ported Bio to the new OM. In the current OM, the ID has to be set explicitly; earlier this was not the case.
  2. Sequence Assembler is not showing the sequence ID in the tree view; instead it shows Seq1, Seq2, etc.
    This was designed and implemented this way from day 1, but showing the ID seems more relevant than showing just sequential numbers.
He has a fix for both of these, once the fix passes code review, I will update you with the information.
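
Until a build with that fix is available, one possible workaround is to recover the mapping from the residues themselves. The following is only a minimal sketch and not MBF code: it works on plain strings rather than Bio.dll types, the names `AlignedIdRecovery`, `StripGaps`, and `CandidateIds` are hypothetical, and it assumes `-` is the gap character. It strips the gaps from an aligned row and reports every input ID whose sequence contains the remaining residues.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Bio.dll: works on plain strings only.
public static class AlignedIdRecovery
{
    // Removes gap characters so an aligned row can be compared
    // with the ungapped input sequences. Assumes '-' marks a gap.
    public static string StripGaps(string aligned, char gap = '-')
    {
        return new string(aligned.Where(c => c != gap).ToArray());
    }

    // Returns the IDs of all inputs that contain the ungapped aligned row.
    // A substring test is used because a local aligner may return only
    // a fragment of the input. More than one ID can come back if the
    // fragment occurs in several inputs.
    public static List<string> CandidateIds(
        string alignedRow, IDictionary<string, string> inputsById)
    {
        string core = StripGaps(alignedRow);
        return inputsById
            .Where(kv => kv.Value.IndexOf(core, StringComparison.OrdinalIgnoreCase) >= 0)
            .Select(kv => kv.Key)
            .ToList();
    }
}
```

If `CandidateIds` returns exactly one ID, the aligned row can be attributed to that input; an empty or multi-element result means the residues alone are not enough to decide, which is why setting the ID explicitly in the aligner output is the more reliable fix.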

Thanks again for reporting this, and hope you are enjoying using MBF.

Fadi Fakhouri, on behalf of The MBF Team

KPonCodeplex2011 wrote May 19, 2011 at 10:53 PM

Hi Fadi,
Thanks very much. Yes, I am enjoying using MBF library.
Your prompt response is greatly appreciated. Your changes will be really useful for us. Also, please check for the same problem in the Smith-Waterman and Needleman-Wunsch aligners. Will you be able to include the fix in the 2.0 release?

Any update on the SECOND and THIRD issues above?
KPOnCodeplex2011

FadiF wrote May 20, 2011 at 9:24 AM

We are still looking into this; I will update you once we have an answer.
If the fix for the above passes code review, then yes, I think it will be in the released bits of 2.0 and in the DevelopmentBranch of TFS. I will let you know as soon as I hear back from our developers.

Fadi Fakhouri, on behalf of the MBF Team

FadiF wrote May 24, 2011 at 9:51 PM

Hi KPonCodeplex2011
Our developer confirmed that items 1 and 2 you reported are fixed.
Regarding issue 3 that you reported above:
"PAMSAM is not giving consensus": PAMSAM’s output is a SequenceAlignment, whereas for MUMmer and NUCmer it is a PairwiseSequenceAlignment. A SequenceAlignment contains only the aligned sequences and not the consensus. This is currently a limitation of PAMSAM.

The fixes will be part of the next release; if you are a committer on TFS, you can sync to the latest and verify.

Please let me know if you need anything further, and thank you for using MBF!

Fadi Fakhouri on behalf of the MBF Team

FadiF wrote May 24, 2011 at 9:57 PM

Hi KPOnCodeplex2011

We confirm that issues 1 and 2 are fixed, so if you have an enlistment on TFS, please update to the latest and verify.

Regarding the third issue, "PAMSAM is not giving consensus":
PAMSAM’s output is a SequenceAlignment, whereas for MUMmer and NUCmer it is a PairwiseSequenceAlignment. A SequenceAlignment contains only the aligned sequences and not the consensus. This is currently a limitation of PAMSAM.
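
Given that limitation, a consensus can be derived on the caller's side from the aligned rows. The following is a rough sketch under stated assumptions, not an MBF API: it takes the aligned sequences as plain, equal-length strings (for example, after converting each aligned sequence to text), treats `-` as the gap character, and uses a simple per-column majority vote. The names `ColumnConsensus` and `MajorityConsensus` are hypothetical, and the tie-breaking rule (smallest character wins) is an arbitrary choice made only to keep the result deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Bio.dll: majority-vote consensus over plain strings.
public static class ColumnConsensus
{
    public static string MajorityConsensus(IList<string> alignedRows, char gap = '-')
    {
        if (alignedRows == null || alignedRows.Count == 0)
            throw new ArgumentException("At least one aligned row is required.");

        int length = alignedRows[0].Length;
        if (alignedRows.Any(r => r.Length != length))
            throw new ArgumentException("All aligned rows must have the same length.");

        var consensus = new char[length];
        for (int col = 0; col < length; col++)
        {
            // Count the non-gap symbols in this column, ignoring case.
            var counts = new Dictionary<char, int>();
            foreach (string row in alignedRows)
            {
                char c = char.ToUpperInvariant(row[col]);
                if (c == gap) continue;
                int n;
                counts.TryGetValue(c, out n);
                counts[c] = n + 1;
            }

            // A gap-only column stays a gap; otherwise take the most frequent
            // symbol, breaking ties by the smallest character.
            consensus[col] = counts.Count == 0
                ? gap
                : counts.OrderByDescending(kv => kv.Value)
                        .ThenBy(kv => kv.Key)
                        .First().Key;
        }
        return new string(consensus);
    }
}
```

For example, `ColumnConsensus.MajorityConsensus(new[] { "AC-GT", "ACTGT", "AG-GA" })` yields `"ACTGT"`. A majority vote ignores ambiguity codes and thresholds, so the result may differ from the consensus that MUMmer or NUCmer report for the same data.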

I hope that resolves your questions... and thanks for using MBF.

Fadi Fakhouri, on behalf of the MBF Team
