I have wanted to analyze ChIP-seq data from multiple experiments for a long time, but my poor understanding of statistics kept getting in the way. I recently took several classes that vastly improved my practical understanding of the subject: the Data Analysis class, the Computing for Data Analysis class, and the Experimental Genome Science class, all on Coursera! So far the data analysis approach has been invaluable, and I used it to look at how well NormDiff scores perform for comparing experiments (Zheng et al. 2010).
For this preliminary analysis I compared two biological replicates. I started from the wiggle files produced by MACS, which give the number of reads overlapping each genome position. Then I calculated NormDiff with a short R script to get background-subtracted, normalized scores.
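To make the idea of a background-subtracted, normalized score concrete, here is a minimal Python sketch (my actual script is in R, and this is not a copy of it). The details are assumptions for illustration only: the scaling factor `c` is taken as the median ChIP/control ratio, the variance is treated as Poisson-like, and the names `sliding_mean`, `normdiff_sketch`, and `half_width` are made up here. Check Zheng et al. 2010 for the exact definition.

```python
import math
from statistics import median


def sliding_mean(values, half_width):
    """Mean of each position's neighbourhood, truncated at the track ends."""
    n = len(values)
    out = []
    for i in range(n):
        window = values[max(0, i - half_width):min(n, i + half_width + 1)]
        out.append(sum(window) / len(window))
    return out


def normdiff_sketch(chip, control, half_width=None):
    """Score each position as (chip - control / c) / sigma.

    Assumptions (not taken from the post):
    - c is the median of the per-position chip/control ratios;
    - sigma**2 = mean(chip) + mean(control) / c**2, with the means taken
      over a sliding window when half_width is given (local variance),
      or over the whole track when it is None (global variance).
    """
    c = median(a / b for a, b in zip(chip, control) if b > 0)
    if half_width is None:
        mean_a = [sum(chip) / len(chip)] * len(chip)
        mean_b = [sum(control) / len(control)] * len(control)
    else:
        mean_a = sliding_mean(chip, half_width)
        mean_b = sliding_mean(control, half_width)
    scores = []
    for a, b, ma, mb in zip(chip, control, mean_a, mean_b):
        sigma = math.sqrt(ma + mb / c ** 2)
        scores.append((a - b / c) / sigma if sigma > 0 else 0.0)
    return scores


# Toy read counts per position: one enriched position in the ChIP track.
chip = [10, 12, 50, 11, 9]
control = [10, 10, 10, 10, 10]
local_scores = normdiff_sketch(chip, control, half_width=1)
global_scores = normdiff_sketch(chip, control)
```

With `half_width=None` the same variance is used everywhere, while a window lets the variance follow the local read depth.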
After this, I checked how well the normalization worked by creating an “MA plot”. An MA plot compares two experiments by graphing the log ratio (M) against the mean log intensity (A, which is half the log of the product), or alternatively the difference of the two experiments against their added effect (which is what I used). The idea is that if the normalization is good, the differences or log ratios should not be biased, so they should show no trend across the range of signal.
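The two flavours of MA coordinates described above can be sketched in a few lines of Python (again only an illustration; the function names `ma_additive` and `ma_log` are made up for this post). The additive version is the one that matches what I plotted, since NormDiff scores can be negative and cannot be log-transformed directly.

```python
import math


def ma_additive(rep1, rep2):
    """Additive MA coordinates: A is the sum, M is the difference."""
    a = [x + y for x, y in zip(rep1, rep2)]
    m = [x - y for x, y in zip(rep1, rep2)]
    return a, m


def ma_log(rep1, rep2):
    """Classic log-scale MA coordinates; needs strictly positive values.

    M is the log2 ratio and A is the mean log2 intensity,
    i.e. half the log2 of the product.
    """
    a = [0.5 * math.log2(x * y) for x, y in zip(rep1, rep2)]
    m = [math.log2(x / y) for x, y in zip(rep1, rep2)]
    return a, m


# Toy scores for two replicates at the same genome positions.
a_vals, m_vals = ma_additive([1.0, 2.0], [0.5, 2.5])
```

Plotting `m_vals` against `a_vals` gives the MA plot; a well-normalized pair of replicates should scatter around M = 0 at every value of A.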
As it turns out, the normalization is very good and the best-fit line is almost flat (slope 0.05, 95% confidence interval ± 0.001). One of the cool things about the NormDiff algorithm is that it estimates the variance from a sliding window over the data, and this appears to stabilize the normalization very effectively (see Figure 1).
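For anyone wondering where a slope and its confidence interval come from, here is a rough Python sketch of an ordinary least-squares fit of M on A. It is an assumption about the method rather than a copy of my R code: it uses the normal approximation (z = 1.96) for the 95% interval, which is reasonable only when there are many points, and `slope_with_ci` is a made-up name.

```python
import math


def slope_with_ci(a, m, z=1.96):
    """Least-squares slope of m on a, plus the half-width of an approximate CI.

    The half-width is z times the standard error of the slope; z = 1.96
    gives a 95% interval under a normal approximation.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_m = sum(m) / n
    sxx = sum((x - mean_a) ** 2 for x in a)
    sxy = sum((x - mean_a) * (y - mean_m) for x, y in zip(a, m))
    slope = sxy / sxx
    intercept = mean_m - slope * mean_a
    # Residual sum of squares around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(a, m))
    std_err = math.sqrt(sse / (n - 2) / sxx)
    return slope, z * std_err


# Toy MA values with no trend: the slope is zero, the interval is not.
slope, half_width = slope_with_ci([0, 1, 2, 3, 4], [0, 1, 0, 1, 0])
```

A slope near zero with a narrow interval is what "unbiased" looks like on the MA plot: the difference between replicates does not depend on how strong the signal is.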
Figure 1. MA plots showing line of best fit for S96 replicates. (a) NormDiff with local variance parameter (b) NormDiff with global variance parameter
I am fairly pleased with the results, so I am making the data and the R code available so you can reproduce them if you want! Next, I want to look at clustering and peak finding, and possibly evaluate other normalization models.