Building MBF v2.0 Preview

Jan 26, 2011 at 1:33 PM

Greetings,

I downloaded the 2.0 preview source code, compiled it, and have a few questions:

1.  When I set the SequenceAssembler as the startup project and hit F5 to debug I get the following error: XamlParseException was unhandled.  'The invocation of the constructor on type 'SequenceAssembler.ConsensusCustomView' that matches the specified binding constraints threw an exception.' Line number '259' and line position '95'.  This is in SequenceAssembly.xaml.

2.  The directory SourceSamples\ReadGenerator is empty.

3.  Would it be possible to get a description of what is new in 2.0?  The algorithms that changed (if any), the projects that were added?

4.  Would it be possible to get a description of what the samples are supposed to do?  Likely I am challenged by not being a biologist, maybe someone with more background in that field would know exactly what these are.

5.  It looks like there is very little activity on this board, what is the current direction of the MBF team?  Is there a roadmap?  I watched the 2010 video tutorial where the gentleman at the end commented about the difficult road Microsoft faces in developing this toolset (and getting the community to follow) and I was curious to hear what steps Microsoft is taking, or will take, to help build the community.

Thanks so much for your help and please let me know if there is additional information I can provide.

Jan 27, 2011 at 4:19 AM

I was able to get #1 working.  There was a problem reading the app.config file for Color data.  I'm still not entirely sure what the real problem was.  There are articles reporting a bug in .NET 4.0 as of August 2010, but if that were the case I wonder how this application ever worked?

And one more question...anyone know why the application is named SilverMap (Actually it is SequenceAssembler but for some reason I thought I saw silvermap and assumed silverlight)?  Or is the silvermap the name of the map after performing a BLAST?

Regardless, the quick workaround is to read the contents of the app.config using this code in ConsensusCustomView.xaml.cs::ReadColorScheme (it needs the System.Configuration and System.Xml namespaces):

    // Open the application's own config file and fetch the custom "Colors" section.
    var config = ConfigurationManager.OpenExeConfiguration(ConfigurationUserLevel.None);
    ConfigurationSection handler = config.GetSection("Colors");

    // Parse the section's raw XML and select the color scheme nodes.
    XmlDocument reader = new XmlDocument();
    reader.LoadXml(handler.SectionInformation.GetRawXml());
    XmlNodeList colorSchemeNodes = reader.SelectNodes("//Colors/ColorScheme");

Then use the code found in ColorSchemeConfigHandler.cs to do the rest of the work.

Jan 27, 2011 at 6:46 AM

As far as I know, SilverMap is a Silverlight control to visualize BLAST results. Find more details here: http://www.mquter.qut.edu.au/bio/Videos.aspx

Jan 27, 2011 at 7:34 AM
Edited Jan 27, 2011 at 7:35 AM

Hi Dourada,

Bitdisaster is right, the SilverMap control is a Silverlight visualization of BLAST results, and is built into the Sequence Assembler demo app.

I'll ask someone to look into why the ReadGenerator directory is empty. There should be a demo application in there that takes a DNA sequence and chops it up to generate a set of fragments similar to those that would be expected as input into a sequence assembly. Basically, it is a way of generating test datasets for PaDeNa, Smith-Waterman, etc.
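To make the idea of a read generator concrete, here is a minimal C# sketch of chopping a sequence into fragments. It is not the missing MBF sample: the class and method names (ReadGeneratorSketch, GenerateReads) are hypothetical, it works on plain strings rather than MBF sequence types, and it assumes fixed-length reads drawn from uniformly random start positions with no simulated sequencing errors.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; not part of MBF.
public static class ReadGeneratorSketch
{
    // Returns `count` fragments of `readLength` bases, each taken from a
    // random position in `sequence`. A fixed seed makes the output repeatable.
    public static List<string> GenerateReads(string sequence, int readLength, int count, int seed)
    {
        if (sequence == null) throw new ArgumentNullException("sequence");
        if (readLength <= 0 || readLength > sequence.Length)
            throw new ArgumentOutOfRangeException("readLength");
        if (count < 0) throw new ArgumentOutOfRangeException("count");

        var random = new Random(seed);
        var reads = new List<string>(count);
        for (int i = 0; i < count; i++)
        {
            // Random.Next(max) excludes max, so the +1 allows the last valid start.
            int start = random.Next(sequence.Length - readLength + 1);
            reads.Add(sequence.Substring(start, readLength));
        }
        return reads;
    }
}
```

Calling ReadGeneratorSketch.GenerateReads("ACGTACGTTTGACCA", 5, 20, 42) would give 20 overlapping 5-base fragments that an assembler could then try to stitch back together.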

As for what is new in 2.0 - currently most of the changes have concerned the object model and optimizations to performance - we still have a long way to go before we are ready for the full 2.0 release, which is scheduled for the summer. A list of planned features should be in the release notes - I'll check, and post something here if it is missing.

I'll write more later - but I would be interested in more background on you, Dourada. You say you are not a biologist, can I ask you what your interest is in MBF?

Thanks,

Simon

 

Jan 27, 2011 at 1:53 PM

Sure thing Simon,

The company I work for is interested in merging some bioinformatic tools/analysis into their product.  I am a computer science grad that worked in the software industry for the last 15 years and at the start of my career I became "assimilated", so to speak, with Microsoft technologies.  There has been the odd venture into linux/unix systems but the bulk of my work has consisted of developing with the full Microsoft stack.  I have a personal interest in biology/bioinformatics but I am certainly not a trained biologist.  The more I read the more I understand I have no clue what I am doing but I enjoy it nonetheless.

That said, I read about MBF last summer, downloaded it, but was not able to apply it to anything.  Now fortunately, that has changed.

My first visit, when researching a way to implement this type of analysis, led me to:  http://seqanswers.com/.  If you have been thinking of developing a community you have no doubt visited the site.  It seems to be a somewhat thriving community.  However, it kills me the amount of time they spend downloading/installing/compiling/configuring tools in a linux/unix environment.  Granted some might find it enjoyable to work at this level, and that's totally cool, but if I were a biologist I would simply expect this toolset to be available and functional.  In my opinion that is what MBF offers, it really is a beautiful thing, in one package, that I can immediately integrate into .net code.  It's the ultimate in leverage from a person that utilizes Microsoft development tools.

I spent a couple of weeks attempting to download/install/compile/configure the tools necessary to perform some bioinformatic analysis, bwa, maq, samtools, etc.   It's certainly doable but a waste of time, does not integrate well into a .net application infrastructure, and if a researcher is doing it that's an additional waste (in my opinion) as they should be focused on their research not on learning how to manage files with a perl script (unless it provides personal satisfaction to work at this level).

I encourage you to read through this thread if you have not already:  http://seqanswers.com/forums/showthread.php?t=4589.  My favorite part is when they start discussing better ways to create a directory structure with perl.  It reminds me of software developers talking about their favorite keyboard shortcuts.  Again, these are not disparaging comments, I do not know any of the posters to this thread or belittle their effort, in fact it is a great tutorial for people starting out and I appreciated reading it very much....but seriously.....it seems like an incredible amount of work when "all" the gentleman wanted to do was analyze some sequence data.

Anyway, so that's where we are at.  I'm not sure if we will use MBF in the future or not but I can definitely see how it is a useful product.  I do not know how you get more .net developers into academic labs or how you change the behavior of those that enjoy linux/unix machines.  I can tell you that if a .net developer needs to integrate bioinformatic tools in their application MBF is a no brainer.

I am happy to help and see a great future but as you said in the video (I think it was you) if you cannot create a community around this there is no reason to move forward with the project.  It would be interesting to know how long it typically takes for a community to form around a product.  Maybe that time period has not elapsed?

Please let me know if you would like more information.

Thanks,

--dan

 

 

Jan 27, 2011 at 1:59 PM

Oh, thanks also for your responses, I really appreciate it.

One more quick question.....if silvermap is a silverlight control does that mean we can compile MBF into a silverlight application now?  Or does it still require the full .net 4 framework?  I suspect it is the latter and that silvermap does not make use of MBF functionality but thought I would ask.

Jan 27, 2011 at 5:51 PM

Unfortunately you can't compile MBF for Silverlight. Too many missing APIs. But if you are doing Silverlight and want to talk to a remote database, you have to go the WCF route anyway. So for Silverlight apps, MBF is server side only.

Dan, your story is pretty interesting because it's the same as mine. I'm a computer science grad as well and got stuck with bioinformatics when I wrote my diploma thesis about machine learning for diagnostic systems. The frontend was an ASP.NET 1.1 site with SQL Server 2000 as the data store. I was the first one who brought the "enemy" to the bioinformatics group, but over the years I could win more fellows for the .NET world. In general, though, one is kind of lonely as a Microsoft guy in the bioinformatics world.  However, I was happy when I heard about MBF and I hope the project keeps moving forward.  Maybe we could chat? Ping me at janDOThannemannAThotmailDOTde

jan

Coordinator
Jan 27, 2011 at 11:17 PM
This discussion has been copied to a work item. Click here to go to the work item and continue the discussion.
Coordinator
Jan 27, 2011 at 11:18 PM
This discussion has been copied to a work item. Click here to go to the work item and continue the discussion.
Coordinator
Jan 27, 2011 at 11:44 PM

Hey dourada -

Thanks so much for your input!  I've taken the liberty of creating two new work items based on the two bugs you found (#1 and #2). 

For #1, it would be great if you would be willing to provide a patch with the fix you already found - thereby becoming a part of the MBF dev community right off the bat! :)

For #2, we'll have to deal with this one ourselves, as somehow this code must have been inadvertently missed during our contribution.

For #3, there is a high-level description of changes regarding the DevPreview for V2 on the download page itself.  For the rest of the v2 effort, I'll defer to Beatriz and Simon for clarity on specifics.  In general, these are our top areas of investment for V2:

- Large Genome Assembly:  in V1 MBF was intentionally capacity constrained to onboard memory.  Though we had made some significant progress in the direction of de novo assembly, pairwise and multiple alignment techniques, data virtualization was implemented in only a few cases.  Our primary goal at this time is to ensure that genomes on the order of human size can be assembled on a machine of reasonable configuration. 

- Genome Visualization:  We are investing in a toolset which leverages both the MBF library as well as other Windows-based technologies to help in the area of genome visualization.  A prototype of our efforts can be found here:  http://131.107.151.123/#/GenoZoom_v1_1  You can view the details of how this prototype was created in this video:  http://research.microsoft.com/apps/video/default.aspx?id=137042

- Analysis and Annotation:  We would like to add a few important capabilities that we were not able to get to in the V1 effort.  Namely, we want to ensure that editing and annotation of sequence data is supported, as well as support for paired-end reads, custom extensions to the BAM format, and optimizations to the sequence data representation that will allow for tools related to statistical analysis of gene mutations.

- Developer Support:  We strive to make the library easier to understand, easier to debug, easier to deploy and easier to consume.  We are evaluating a whole host of things that could be done here, one of which is to support Silverlight development.

We can't make any guarantees at this point, but hopefully this will give you a good idea of our general direction.  We are completely open to community contributions, and would encourage anyone interested in promoting their own ideas to provide the patches/changesets necessary to improve the platform.

For #4, there is some very thorough documentation regarding some of the sample apps (sequence assembler and the Biology Extension for Excel); the others are relatively small and contain a readme to help the user get started.  The best way to get your head wrapped around this stuff is to go ahead and install the V2 Dev Preview and execute the tools.  Some, like the Workflow samples, will require other tools to use.  If you have specific questions we can try to provide more detail where needed.

For #5, I'm sure Simon and/or Beatriz will address your question here (which is a good one!)

Hope this helps!

Mike

Jan 28, 2011 at 3:02 AM

Yes Mike, that is a wonderful help.  I appreciate all of the information.  GenoZoom is very cool.  I do not see the source code in the MBF 2.0 release.  Will you be releasing the code under the MBF?

Jan 28, 2011 at 8:11 AM

We are looking at releasing the GenoZoom source, but there are some administrative hurdles we have to clear first - shouldn't be long now, though.

Regarding the building of community around MBF, this is certainly our plan - Microsoft is not full of biologists as you might imagine, and so we need community participation to make sure the project remains valuable to the biological research community. While academia is naturally a target, the license terms of MBF are such that it is also freely available to the commercial world - and as you know, the Microsoft stack is much more prevalent there.

We certainly haven't solved the issue of community-building, and while we have taken some steps in that direction, we are still learning which approaches work most effectively. Some examples:

  • We have appointed a Technical Advisory Board for MBF, comprising academic and commercial adopters (Illumina, J&J Pharmaceutical R&D, Aditi technologies, Cornell University and Queensland University of Technology currently). We have suggestions for features of course, but this TAB collectively sign off on the final plans and will steer the direction of the project in the future.
  • We are exploring transfer of ownership of the MBF project to a non-profit foundation - we will remain as significant contributors to the project, but this has always been intended to be community-owned and community-curated, and this move will further demonstrate our commitment to that goal.
  • Many of us in the team have been presenting MBF at scientific meetings and to companies for months now, and that schedule of presentations will continue whenever we have something new to say (for example our de Bruijn graph-based de Novo assembly algorithm was just presented at the 2nd International Conference on Bioinformatics Models, Methods and Algorithms in Rome a couple of days ago).
  • We have also been building a web presence - aside from Codeplex you can take a look at http://research.microsoft.com/bio, where you will find further discussion forums, more downloads, videos, tutorials, etc. For reasons of CodePlex policy, some of our downloads have to live on that site instead.
  • The first of our training courses will run in Redmond on March 11th (the course is free and you would be welcome to attend) and we hope to run at least another three this year.

Overall though, I would characterize uptake as a promising start and no more; we are trying to evaluate the impact of our community-building efforts and trying new things all the time.

Finally - you speculate on how long it takes to build a community. I am sure it depends on many things, but it is my expectation that in the case of MBF we cannot expect overnight success and we will certainly be taking a leading role in driving this forward for the foreseeable future. Community is a virtuous cycle - the more participants we have, the broader and more relevant the MBF feature set - which in turn attracts more community participants. I think v2 will certainly boost usage once it is released in July.

 

Jan 31, 2011 at 12:17 AM

Thanks Simon, that's great information, I really appreciate it.

Mike, I would be happy to make the code change to fix the problem I saw, but I suspect it is not a bug but a configuration problem on my side.  Unless for some odd reason the code was compiled against .NET 3.5 and was moved to .NET 4.0 recently, but that would be strange as well because MBF requires .NET 4.0.  Have you been able to reproduce the problem?  If so please let me know and I'll submit the fix.

Coordinator
Feb 1, 2011 at 3:12 AM
dourada wrote:

Yes Mike, that is a wonderful help.  I appreciate all of the information.  GenoZoom is very cool.  I do not see the source code in the MBF 2.0 release.  Will you be releasing the code under the MBF?

TBD on source code release for GenoZoom.  We are evaluating options here, but for now it's a demo prototype only.

Coordinator
Feb 1, 2011 at 3:17 AM
dourada wrote:

Thanks Simon, that's great information, I really appreciate it.

Mike, I would be happy to make the code change to fix the problem I saw, but I suspect it is not a bug but a configuration problem on my side.  Unless for some odd reason the code was compiled against .NET 3.5 and was moved to .NET 4.0 recently, but that would be strange as well because MBF requires .NET 4.0.  Have you been able to reproduce the problem?  If so please let me know and I'll submit the fix.

OK - we'll look into this over the next available Sprint cycle.  If you can provide the details of your configuration, including details about the machine and the software dependency versions loaded, it would help greatly in understanding your experience.  Feel free to update the bug entered on this with the details so that it gets tracked appropriately.  If we are able to find a solution the bug should be trackable by you as well.

Developer
Feb 1, 2011 at 11:03 PM
dourada wrote:

if silvermap is a silverlight control does that mean we can compile MBF into a silverlight application now?  Or does it still require the full .net 4 framework? 

Our friends at Queensland University of Technology ported a subset of MBF to compile against the Silverlight runtime. Look at 38:30 in this presentation from James Hogan: http://research.microsoft.com/apps/video/default.aspx?id=142236 As he explains, the parts that were left out were mostly those that need the parallel extensions in 4.0. If you must have this for your project, I can put you in touch with the QUT folks.

Developer
Feb 1, 2011 at 11:06 PM

Dourada and Bitdisaster, you may be interested in attending our upcoming one-day MBF workshop on March 11 in Redmond, WA. This workshop will include a quick introduction to Visual Studio 2010, the .NET Framework, C#, and the MBF object model. Attendees will participate in hands-on labs and write a sample application that employs the file parsers, algorithms, and web connectors in MBF. The workshop is open to everyone and registration is free of charge. For complete details and the registration link please visit: http://research.microsoft.com/en-us/events/mbf2011/. If you have any questions don't hesitate to also contact me.

Developer
Feb 1, 2011 at 11:10 PM
dourada wrote:

" It looks like there is very little activity on this board"

Last reply of the day, I promise, hehe. For a while the MBF project was moving away from Codeplex, at which point we founded a new community forum at http://www.getsatisfaction.com/mbi. We are now accepting Patches via Codeplex, and therefore open to questions, suggestions, etc in both forums. Cheers! Bea. 

 

Apr 30, 2011 at 9:05 PM

Greetings,

After downloading the latest MBF 2.0 beta source code (mbf-76446) on a different computer and witnessing the same problem with another developer, we set out to fix it.  Turns out it's not a bug at all.  The cause was downloading the .zip file on Windows 7 (it will probably happen on other OSs too?) and extracting the contents, which left the files marked with: 'This file came from another computer and might be blocked to help protect this computer'.  As a result, the Sequence Assembler was not able to read from the app.config file.

After unblocking, everything worked as expected.  Not knowing if this was going to occur in other places, I did a quick search, which shows how to unblock multiple files with a tool from the TechNet site:  http://technet.microsoft.com/en-us/sysinternals/bb897440.aspx.  Run streams.exe on the source code folder before you compile and I think that will prevent the problem from happening.  If not, do a build, open a command prompt, change directory to the binaries\debug (or release) folder and run this:  streams.exe -d *.*.  That seems to work like a charm.

I do not really know what streams.exe does or if there is a better way to solve this problem but I'm doing this on a development machine.  If you are doing this on something more important you might want to exercise a little more caution?

Anyway, hope this helps someone.

Thanks!

 

 

Apr 30, 2011 at 9:30 PM

You can unblock the zip file before extracting and then all extracted files become unblocked as well. Also extracting with WinRar should produce unblocked files since WinRar doesn't care about this attribute on the archive.
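
For anyone wondering what the unblock step actually does: the "blocked" flag is kept in an NTFS alternate data stream named Zone.Identifier that Windows attaches to downloaded files, and streams.exe -d deletes alternate data streams, that one included. A minimal C# sketch of the same idea follows. The names ZoneUnblocker and UnblockTree are made up for illustration and are not part of MBF. The sketch calls the Win32 DeleteFile function through P/Invoke and only makes sense on Windows with NTFS. Unlike streams.exe -d, it removes only the Zone.Identifier stream and leaves any other streams alone.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration; Windows/NTFS only.
public static class ZoneUnblocker
{
    // Win32 DeleteFile. Given "path:StreamName" it deletes just that stream,
    // leaving the file's main contents untouched.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool DeleteFile(string fileName);

    // Removes the Zone.Identifier stream from every file under `root`.
    // Returns the number of files that actually had the stream removed.
    public static int UnblockTree(string root)
    {
        int unblocked = 0;
        foreach (string file in Directory.GetFiles(root, "*", SearchOption.AllDirectories))
        {
            // DeleteFile returns false when the file has no such stream; that is fine.
            if (DeleteFile(file + ":Zone.Identifier"))
            {
                unblocked++;
            }
        }
        return unblocked;
    }
}
```

Usage would be something like ZoneUnblocker.UnblockTree(@"C:\src\mbf") before building, where the path is only an example. As with streams.exe, run it only on folders whose contents you trust.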