Venn Diagram based on Genomic Intervals issue

Jul 17, 2011 at 4:10 AM

Hello Everyone,

I have results from two ChipSeq experiments.  

Experiment A has 1900 peaks and Experiment B has 4000 peaks.

When I create a Venn diagram from the interval data, the total number of resulting regions is almost 10,000.  This is much higher than the sum of the two groups.

This occurs because certain intervals from the two experiments are being split.

So if interval 1 is 10-100 and interval 2 is 50-150, this results in 3 separate genomic regions: 10-50 (A only), 50-100 (A and B), and 100-150 (B only).
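
To make the splitting concrete, here is a minimal Python sketch of how two overlapping intervals break into pieces. It is only an illustration, not the add-in's actual code: the function name `split_into_pieces` is hypothetical, and half-open (start, end) coordinates are assumed.

```python
def split_into_pieces(a, b):
    """Split two intervals (start, end) into labelled, non-overlapping pieces.

    Hypothetical helper for illustration; assumes half-open coordinates.
    """
    (a_start, a_end), (b_start, b_end) = a, b
    # Every interval boundary is a potential cut point.
    points = sorted({a_start, a_end, b_start, b_end})
    pieces = []
    for left, right in zip(points, points[1:]):
        in_a = a_start <= left and right <= a_end
        in_b = b_start <= left and right <= b_end
        if in_a and in_b:
            pieces.append((left, right, "A and B"))
        elif in_a:
            pieces.append((left, right, "A only"))
        elif in_b:
            pieces.append((left, right, "B only"))
        # Gaps covered by neither interval are skipped.
    return pieces


for piece in split_into_pieces((10, 100), (50, 150)):
    print(piece)
```

Two input peaks become three output regions, which is why the region count can exceed the total peak count.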

Is there a way to control this behavior? For instance, I do not want these intervals to be split into three, but treated as one.  So even if there is only a one base pair overlap, it should be called A and B.
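
The behavior asked for here can be sketched as follows: overlapping peaks are merged into a single region, and each region is labelled by which experiments contributed to it. This is a minimal sketch under stated assumptions, not the library's method. The name `merge_into_regions` is hypothetical, all peaks are assumed to lie on one chromosome, and coordinates are assumed half-open, so peaks that merely touch do not count as overlapping.

```python
def merge_into_regions(peaks_a, peaks_b):
    """Merge overlapping peaks from two sets into single labelled regions.

    Hypothetical helper; assumes one chromosome and half-open (start, end).
    """
    tagged = [(s, e, "A") for s, e in peaks_a] + [(s, e, "B") for s, e in peaks_b]
    tagged.sort()
    regions = []  # each entry: [start, end, set of contributing experiments]
    for start, end, source in tagged:
        # A shared base pair is enough to extend the current region.
        if regions and start < regions[-1][1]:
            regions[-1][1] = max(regions[-1][1], end)
            regions[-1][2].add(source)
        else:
            regions.append([start, end, {source}])
    labelled = []
    for start, end, sources in regions:
        label = "A and B" if len(sources) == 2 else f"{next(iter(sources))} only"
        labelled.append((start, end, label))
    return labelled


peaks_a = [(10, 100), (300, 400)]
peaks_b = [(50, 150), (500, 600)]
for region in merge_into_regions(peaks_a, peaks_b):
    print(region)
```

With this approach the number of regions can never exceed the number of input peaks. Note that chains of overlapping peaks collapse into one region, so two peaks from the same experiment that overlap each other are also merged.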

Otherwise, 5900 peaks end up in 10,000 genomic regions, which is hard to explain to biologists.

Any help is appreciated on this matter.

Vivek

Coordinator
Jul 18, 2011 at 9:52 AM

Hi Vivek,

It has been a while since I looked at the interval or VennDiagram code, so I am not sure exactly what you are trying to do and the steps you are using to do it. 
It sounds like you want to join the two intervals A and B together to create the 'A and B' interval. That would mean using IntersectOutputType::OverlappingIntervals rather than the OverlappingPiecesOfIntervals modifier for the SequenceRangeGrouping in CreateSequenceRangeGroupingForVennDiagram in VennToNodeXL.cs.

If you are writing C# and using the library to produce the Venn diagram, you might be able to clone a bit of the code into your source and set the IntersectOutputType for what you want to do.  If you are doing it in Excel, the plug-in and/or the library will have to be modified.

If you have a set of small sample files and can describe the steps you are taking, we could arrange to discuss what you want to do and how to do it.

-bobd-

Jul 18, 2011 at 5:55 PM
Thank you for the reply.

I am trying to do the following.

I have two ChipSeq results. These consist of interval data: chromosome, start, end, peak location, and peak height.

I want to see the number of peaks that overlap in these two datasets and the number that is unique to each set.
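
A per-peak count like this could be sketched as below. This is an illustration only: the function name `count_overlaps` and the (chromosome, start, end) tuple layout are assumptions, coordinates are assumed half-open, and the simple pairwise comparison is quadratic, which may be slow for very large peak lists.

```python
from collections import defaultdict


def count_overlaps(peaks_a, peaks_b):
    """Count peaks in each set that overlap (>= 1 bp) a peak in the other set.

    Hypothetical helper; peaks are (chromosome, start, end), half-open.
    """
    def by_chrom(peaks):
        index = defaultdict(list)
        for chrom, start, end in peaks:
            index[chrom].append((start, end))
        return index

    def hits(query, index):
        # A query peak is a hit if any peak on the same chromosome overlaps it.
        return sum(
            any(s < end and start < e for s, e in index.get(chrom, []))
            for chrom, start, end in query
        )

    a_hits = hits(peaks_a, by_chrom(peaks_b))
    b_hits = hits(peaks_b, by_chrom(peaks_a))
    return {
        "A overlapping B": a_hits,
        "A only": len(peaks_a) - a_hits,
        "B overlapping A": b_hits,
        "B only": len(peaks_b) - b_hits,
    }


peaks_a = [("chr1", 10, 100), ("chr1", 300, 400), ("chr2", 10, 100)]
peaks_b = [("chr1", 50, 150), ("chr2", 500, 600)]
print(count_overlaps(peaks_a, peaks_b))
```

Because every peak is counted exactly once within its own set, the totals always add up to the original peak counts. The two overlap counts can differ from each other when one peak in one set overlaps several peaks in the other.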

Exp A has 2000 peaks and Exp B has 5000 peaks.

The Excel add-in for Venn diagrams is creating more than 9000 'regions' from these two datasets. This is incorrect for my purposes (because I only have 7000 total peaks).

I can send you some of my data as well as the analysis I have done so far.

Please tell me how to send the data.

Best,
Vivek