GenBank Parser: Incorrect parse when a qualifier value contains an embedded '/' at the start of a continuation line.


A self-contained archive containing a solution and data to reproduce this bug can be downloaded at (5.5 MB)
When parsing a "/note" qualifier whose text spans multiple lines, a line break sometimes falls so that a "/" that is part of the note text begins a continuation line. The parser then treats the rest of that line as a new qualifier name and any following text lines as values for that qualifier. This happened in MBF v1 and still happens in v2.
An example of this is in
     misc_feature    948729..949922
                     /note="2-polyprenyl-6-methoxyphenol hydroxylase and
                     related FAD-dependent oxidoreductases [Coenzyme metabolism
                     / Energy production and conversion]; Region: UbiH;
The parser identifies the following qualifiers for this feature. Observe that the /note has been truncated and a spurious qualifier has been introduced based on the tail of the /note.
Qualifier: { Energy production and conversion]; Region: UbiH;}
Value : { COG0654"}
Qualifier: {db_xref}
Value : {"CDD:30999"}
Qualifier: {gene}
Value : {"kmo"}
Qualifier: {locus_tag}
Value : {"AM1_0975"}
Qualifier: {note}
Value : {"2-polyprenyl-6-methoxyphenol hydroxylase and\nrelated FAD-dependent oxidoreductases [Coenzyme metabolism}


FadiF wrote May 23, 2011 at 5:49 PM

Hi lbuckingham,

Thanks for this. I have forwarded this information to our testers so they can try to reproduce it, and we will get back to you shortly.

Fadi Fakhouri
on behalf of the MBF team

FadiF wrote May 23, 2011 at 6:38 PM

Hi lbuckingham,
We can confirm that we reproduce this. We will look at possible solutions. It is currently being tracked in our TFS (if you are a committer, please see TFS work item 31432).
I will respond to you once we have a resolution or have concluded our investigation.
Fadi Fakhouri
On behalf of the MBF team

