
MeDIP-seq data analysis – diffReps

MeDIP-seq (and this should be the same for methylCap-seq/MBD-seq, which I’m told is better) is advertised as a reasonably cost-effective method to find differentially methylated regions (DMRs). 40 million reads, or 1/5th of a lane on the HiSeq, is an adequate amount of sequencing depth. I was even told by the same guy as above that he found pretty much the same set of DMRs using methylCap-seq as he did with methylC-seq.

About two years ago now, I did some meDIP-seq on some cancer cell lines. The data looked good. The one locus (KRT7 promoter/CpG island) we thought had a high chance of being a DMR between our two groups of cell lines was clearly differentially methylated. CpG islands that you would assume would be unmethylated (e.g. GAPDH) had no signal while loci that should be methylated showed high signal. But the problem was finding DMRs between our two classes of cell lines. I tried MEDIPS and MACS but they do not handle replicates natively and there was a lot of inter-condition variation, which just resulted in a lot of noise. I also saw that some minor changes in peak shapes could result in DMRs being called. Then I tried using HTseq-count to map reads to CpG islands, promoters, H3K4me1 peaks, etc. and then using DESeq to do the statistics. This worked a little better in that it did find a couple DMRs and it did return the KRT7 CpG island as a DMR and a couple others but it was a targeted approach to a small fraction of the genome.
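The targeted approach boils down to counting reads per region of interest and handing the count table to a count-based test such as DESeq. Here is a minimal Python sketch of the counting step only. It assumes the reads on one chromosome have already been reduced to midpoint positions and that regions are half-open intervals. The function name and the coordinates are made up for illustration, and HTSeq-count itself works from alignments and an annotation file rather than like this.

```python
from bisect import bisect_left


def count_reads_in_regions(read_midpoints, regions):
    """Count read midpoints falling inside each half-open region [start, end).

    read_midpoints: iterable of integer positions on a single chromosome.
    regions: dict mapping a region name to a (start, end) tuple.
    """
    positions = sorted(read_midpoints)
    counts = {}
    for name, (start, end) in regions.items():
        # Index of the first position >= start and of the first position >= end.
        first = bisect_left(positions, start)
        last = bisect_left(positions, end)
        counts[name] = last - first
    return counts


# Hypothetical example: two regions and a handful of read midpoints.
example_regions = {"island_A": (0, 11), "promoter_B": (50, 200)}
example_counts = count_reads_in_regions([5, 10, 15, 100], example_regions)
print(example_counts)
```

One such table per sample, combined into a regions-by-samples matrix, is the kind of input a count-based test expects. The limitation is the one described above: only the regions you thought to list are ever tested.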

After I had sat on the data for longer than anyone should sit on data (about a year), a new ChIP-seq analysis package called diffReps was released in July of 2013 that handles replicates natively. I was busy with my new job making TALEs and doing whole genome bisulfite sequencing, so for months it was one of those things I kept thinking I had to try. Finally, sometime towards the end of last year, I gave it a try and it worked magnificently. It found a nice set of 2656 DMRs. Looking at them in the genome browser, they were all nicely reproduced across our replicates, and DMRs were preferentially found near our set of differentially expressed genes. So we finally got that piece of data we really needed and now the manuscript is just about finished.

So after doing meDIP-seq, my experience is you really need replicates. Our system had a lot of inherent noise but I think it’s a method that needs replicates regardless. I guess all experiments need replicates but some more than others. Also, it is definitely not a full genome-wide method. Areas with low CpG density do not get pulled down regardless of methylation status. I guess in theory you would eventually get coverage of low CpG areas if you sequenced deep enough, but that would defeat the cost-saving advantage of the method over whole genome bisulfite sequencing. And of course the resolution is not very good. You’re only going to see differentially methylated regions and not differentially methylated cytosines. But it gave us the data we needed. You can do a lot of samples without blowing the whole lab budget. And of course a final thanks to the authors of diffReps for coming up with and sharing a method that can handle the data.
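To get a feel for what "low CpG density" means for a given region, it helps to put a number on it. Below is a small Python sketch that reports the CpG count, the CpGs per base, and the commonly used observed/expected CpG ratio for a sequence. The function name is hypothetical, and no cutoff is implied here: the density below which a region stops being pulled down depends on the experiment and is not something this post measured.

```python
def cpg_stats(seq):
    """Return (cpg_count, cpgs_per_base, observed_over_expected) for a DNA string.

    Observed/expected is computed as (CpG count * length) / (C count * G count),
    the usual ratio used when describing CpG islands.
    """
    seq = seq.upper()
    length = len(seq)
    cpg_count = seq.count("CG")
    c_count = seq.count("C")
    g_count = seq.count("G")
    per_base = cpg_count / length if length else 0.0
    if c_count and g_count:
        obs_exp = (cpg_count * length) / (c_count * g_count)
    else:
        obs_exp = 0.0
    return cpg_count, per_base, obs_exp


# Toy sequences only: one CpG-rich, one with no CpGs at all.
print(cpg_stats("CGCGCG"))
print(cpg_stats("ATATAT"))
```

Running this over fixed-size windows of a region of interest gives a rough idea of whether meDIP-seq could be expected to see it at all.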

ADAM22_ucsc

BsaI – Worst restriction enzyme ever

This is a public service announcement. BsaI is the worst restriction enzyme ever. Never use it or any protocol that suggests you use it. I know that includes just about every TALE construction protocol, but really, save yourself a bunch of pain and suffering and don’t use BsaI.

If you’re in a lab that is making TALE construction platforms, please just stop using this enzyme. Ask yourself why you insist on publishing protocols using this enzyme when there are better options. Your lab must know this enzyme doesn’t work. You have already caused endless hours of wasted time to graduate students and postdocs around the world. Have a quick read of one of the TALE Google groups and it’s an endless stream of failure caused by this horrible enzyme.

To anyone reviewing a methods paper that uses BsaI, please tell the authors that they must rid the protocol of this enzyme before publication. Seriously, no paper should pass review that uses this enzyme.

Some good comments on ChIP-seq library prep

I suspected the Illumina ChIP-seq kit wasn’t very good. Here is what someone else had to say about it, along with some other good comments on making ChIP-seq library preps.

Thanks Luke!!!

Ethan,

Just thought I would let you know my experiences with the new TruSeq kit after our discussions. I have spent the last week or so playing with it. Everything now works great but there are a few issues with the TruSeq ChIP kit and so here are my experiences:

 

1)      The SYBR Gold gel protocol is madness!!! This dye, when added in to the gel (as opposed to post-gel staining) completely and utterly screws up the way the gel runs. I should have known this, but have always used EtBr and so don’t know an awful lot about the SYBR stains. XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX I’m amazed they ever got it to work. It’s impossible to size select accurately in my hands.

2)      Your pre-gel PCR is pure genius. I do 5 cycles as you suggest and it works a treat. I can usually see a faint smear from about 300 to 700 bp. I have tried both Phusion and Kapa and I agree that Kapa has the edge.

3)      There is minimal danger of selecting out mono DNA after ligation using the Ampure XP beads in my hands. When you think about it, a 1:1 ratio of beads to sample comfortably leaves anything over 200 bp and adapted mono-DNA is closer to 300 bp at that point. The lowest MW DNA I can see on the gel is a shade under 300 bp, which corresponds nicely to mono-DNA. Nevertheless I may add additional PEG as you suggest at this stage just in case.

4)      I only do a single post-ligation Ampure XP clean-up – primarily for speed – and it works fine, but also to reduce the risk of losing LMW DNA. I have never seen adapter contamination using this approach, at least when combined with pre-gel PCR.

5)      I use Qiagen for all steps except post-ligation and post-enrichment (for cleanliness). Qiagen post-ligation left a ton of adapters in the final library and is clearly a no-no. This increases speed with little effect on yield.

6)      You cannot dilute the Illumina indexes. I only tested 1:10 and 1:50 dilutions and got no library at all in the 1:50 and barely anything in the 1:10. It’s feasible that a 1:2 to 1:5 dilution could work, but this is unnecessary really given the quality of the final library using neat indexes.

7)      The number of enrichment cycles varies but for safety I find around 10 post-gel cycles gives a very nice amount of library at around 10-15 ng/ul when starting out with less than 10 ng. I have gone as low as 8 cycles and it usually works fine, but the final concentration can get a bit low. I can usually tell how many I will need depending on if I can see anything on the gel or not.

 

Hope this helps in case anyone else asks your advice on this. Thanks for your tips.

 

Luke

Luke Norton, Ph.D.

Gibson Assembly Master Mix

It’s not the most exciting post but it may be useful to some. I’ve spent most of my time making plasmids and assembling TALEs since I arrived in Australia. Most of the plasmids I’ve made with Gibson Assembly, which I must say is totally awesome. For sticking a single gene/ORF into an expression vector with two restriction sites, old-school cloning is still the way to go since you don’t have to PCR the entire plasmid. But for anything more complicated, Gibson Assembly makes it a lot easier. I would suggest sequencing all the important parts of any plasmid made with Gibson Assembly. Some weird stuff and PCR-induced mutations can occur.

As far as the Master Mix, you can buy it from NEB or you can save some money and buy all the components separately. The way others were doing it didn’t really make sense to me, so I recalculated everything and am doing it as described here: Gibson Assembly Master Mix. It worked better than the NEB master mix, but I have no idea why.

As far as primer design, this is how it goes for me:

1) I assemble the vector in silico using everyVector, which is a totally awesome tool (lots of totally awesome stuff in a boring post).

2) I make about a 44-mer oligo that spans the fragment junction and try to start it and end it in a G or C. If there are a lot of A/Ts I make it longer; if there are a lot of G/Cs I make it shorter.

3) Take the reverse-complement of the oligo from step 2.

4) Trim about 12 bp off the 5′ ends of the oligos from steps 2 and 3 so that the oligos are about 32 bp long. This way I end up with 22 bp of complementary sequence for the PCR and 20 bp of overlap between fragments for the Gibson Assembly reaction.

Not very scientific but it’s quick and works.
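Steps 2 to 4 can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the fixed 44/12 defaults, and the toy sequences are hypothetical, and the sketch skips the manual tweaks (ending in G or C, lengthening A/T-rich oligos) that are done by eye.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def junction_primers(upstream, downstream, span=44, trim=12):
    """Design the two primers for one junction where `upstream` is joined to `downstream`.

    Returns (forward, reverse, overlap):
      forward amplifies the downstream fragment and carries a short upstream tail,
      reverse amplifies the upstream fragment and carries a short downstream tail,
      overlap is the number of bases shared by the two PCR products.
    """
    half = span // 2
    if len(upstream) < half or len(downstream) < half:
        raise ValueError("each fragment must be at least span/2 bases long")
    # Step 2: oligo spanning the junction, half from each side.
    junction = upstream[-half:] + downstream[:half]
    # Step 3: its reverse complement.
    junction_rc = reverse_complement(junction)
    # Step 4: trim the 5' end of both oligos.
    forward = junction[trim:]
    reverse = junction_rc[trim:]
    overlap = span - 2 * trim
    return forward, reverse, overlap


# Toy fragments, not real vector sequence.
up = "ACGT" * 10
down = "GGCCTTAA" * 5
fwd, rev, shared = junction_primers(up, down)
print(fwd, len(fwd))
print(rev, len(rev))
print(shared)
```

With the defaults, each primer has 22 bases that anneal to its own template plus a 10-base tail from the neighbouring fragment, which is where the 20 bp overlap comes from.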

TALEs, TALENs and ICA (iterative capped assembly)

So I’ve spent the last months working on/with TALEs (transcription activator-like effectors). (Epi)Genome is going to be the sport of the future (like kickboxing – 80′s movie reference). Anyway, there are a bunch of TALE assembly platforms out there. I decided to go with ICA because of the flexibility of incorporating any RVD. The protocol works, but it appears many have struggled to get it working. The paper doesn’t go into the technical details very much or highlight the critical steps, so I thought it would be helpful to post a detailed version of the protocol exactly the way I am doing it. One thing I will note here is that production of the monomers, while it is simply a PCR and a restriction digest, is not trivial. I had difficulty getting good PCR products and, more importantly, BsmBI is not a good enzyme. You will have to use more enzyme and cut longer than you think. You may even have to gel purify away the uncut/partially cut monomers. I’m also using a lot less monomer than what Adrian Briggs is using in his protocol. This is nice as it saves you the time and cost of making new batches of monomers. I would also suggest doing the cloning part of the protocol the old-fashioned way (linearize and phosphatase treat the vector) and not doing a Golden Gate reaction. I’ve read far too many posts of people struggling with Golden Gate cloning of TALEs. Once it is working, then you could try to go with Golden Gate.

On another note, I probably should have posted this protocol a few months ago, as it appears that there is a much easier technology coming online: CRISPR.

But for those still interested in making TALEs the protocol is here: TALE construction protocol – Iterative Capped Assembly

What AMPure is not good for

So I’ve been pretty excited about AMPure since I learned you could order the beads without the PEG for a fraction of the cost and I’ve been eliminating QIAquick and phenol-chloroform extractions/ETHANol precipitations from my molecular biology vocabulary.  But it appears there is a big limitation to AMPure.  With QIAquick and phenol-extractions, the proteins are denatured then purified away by chromatography or organic partitioning respectively.  Qiagen Buffer PB has 5M guanidine hydrochloride.  Whereas with AMPure the proteins are only purified away by not precipitating on the beads (AMPure buffer is 1.25 M NaCl and PEG).  So in some cases where trace enzymatic activity carryover can be an issue, AMPure may be problematic.

I found such a case. I was making the monomers for ICA TALE assembly and used AMPure. The assembly failed. What this entailed was PCR amplifying a 100 bp DNA fragment, cutting it with BsmBI, which cuts at the ends, and ligating the fragments together. If you look at the gel below you can see the monomers (100 bp DNA fragments) PCR amplified and then cut with BsmBI. The ones purified with QIAquick look nice and crisp but the ones purified with AMPure look a little degraded. What I think is happening is that some carryover polymerase from the PCR reaction is trying to blunt the ends. A couple of days’ work and a significant amount of reagents down the drain, but a good lesson learned. Thought I’d try to save some others from the same mistake.

Now the question is, if you make AMPure beads with guanidine hydrochloride, will it function like QIAquick?  I gave it a quick little go and the problem is getting everything into solution, but I think initial data looks promising.  How long until some company patents it?

Ribo-Zero rRNA removal for RNA-seq with FFPE samples

We (which of course means someone else did most of the work and that person is Andrija Matak) made RNA-seq libraries from some FFPE tumor samples, which had been sitting in the biobank at the Medical University of Graz in Austria for 6 years. We used the Ribo-Zero magnetic rRNA removal kit to deplete the samples of rRNA. The whole experiment worked surprisingly well. All the samples had less than 10% rRNA reads and most were down around 2%. A couple of the samples had very poor alignment (less than 20% of reads) and had to be discarded from the analysis. The percentage of reads aligned in general was significantly lower (around 45 to 60%) than what I’ve been seeing with high quality RNA (usually above 70%). It also seems like fewer of the aligned reads were being assigned to genes by HTseq-count. But with all that being said, the experiment went very well. The data matches the data from a bunch of genes we (again not meaning me) have checked by in situ hybridization. The variability within each tumor region was remarkably low.
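The per-sample numbers above are just ratios of read counts, so they are easy to tabulate. Here is a minimal Python sketch, assuming you already have the total, aligned and rRNA read counts for each sample. The function names and the example counts are invented for illustration, and the 20% alignment cutoff is simply the level at which samples were dropped here, not a general recommendation.

```python
def percent(part, total):
    """Return part as a percentage of total (0.0 when total is zero)."""
    return 100.0 * part / total if total else 0.0


def qc_summary(total_reads, aligned_reads, rrna_reads, min_aligned_pct=20.0):
    """Summarise one sample and flag whether it passes the alignment cutoff."""
    aligned_pct = percent(aligned_reads, total_reads)
    rrna_pct = percent(rrna_reads, total_reads)
    return {
        "aligned_pct": aligned_pct,
        "rrna_pct": rrna_pct,
        "keep": aligned_pct >= min_aligned_pct,
    }


# Invented read counts for two hypothetical samples.
samples = {
    "tumor_01": (1_000_000, 550_000, 20_000),
    "tumor_02": (1_000_000, 150_000, 80_000),
}
for name, (total, aligned, rrna) in samples.items():
    print(name, qc_summary(total, aligned, rrna))
```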

Take a look at the DESeq output. Pretty clean.  I was a little surprised.

BTW, that is the QC report that ezDESeq.sh makes for you.  All you have to do is make two folders with your fastq files and run the script.  It’s that easy.

Finally, here was my adaptation of the Ribo-Zero protocol and how I pipe it into the TruSeq RNA protocol.  If I were to do it again, I would divide all the Ribo-Zero volumes by four.  Here I started with 1 μg of RNA and only had to do 9 cycles of PCR amplification, so clearly there is some room for savings here.  One other thing, the official Ribo-Zero protocol says you should ethanol precipitate, which is way too old-school.  They also say you can use AMPure RNA beads.  I guess they don’t realize this but AMPure beads precipitate all nucleic acids, so you can just use regular AMPure XP beads (or better yet homemade) so there is no reason to buy another expensive reagent to precipitate your RNA.

ETHANomics NGS data analysis suite

I updated a bunch of my scripts and they are working on Linux and Mac now. Nothing innovative, as I am not a statistician or informatician, but there are a couple of nice easy-to-use scripts that link together the standard steps of RNA-seq and ChIP-seq primary data analysis. So if anyone is looking for a quick, easy way to run DESeq or MACS without having to push all the files through the pipeline one by one, take a look. All you have to do is throw your fastq files in and type GO. Well, not exactly, but almost that easy. There are some other useful scripts and files in there as well. Glad to help anyone who wants to get them running. You will find them located at the “scripts” tab above.

It’s all detailed here: http://ethanomics.files.wordpress.com/2012/09/ethanomicsngs.pdf

TruSeq ChIP kit from Illumina

It appears that Illumina has finally released a TruSeq compatible kit for ChIP. Looking at the protocol, I am not very impressed. They do a double AMPure purification like they do in the TruSeq RNA kit to get rid of the self-ligated adapter dimers and then a gel purification. From my experience there is a real risk of losing mononucleosomal DNA from native ChIP and the smaller fragments from cross-linked ChIP when doing an AMPure size selection. It makes me wonder if Bioo Scientific patented my 4 cycle PCR method to convert the Y-shaped adapters to double-stranded DNA before the gel purification step. I’m sure Illumina’s protocol works for larger DNA fragments, say greater than 200 bp, but a lot of ChIPs have DNA fragments much shorter than that. My advice for now is, unless your fragments are longer than 200 bp, stick with my protocol (best option) or use the Bioo Scientific kit.

Homemade AMPure XP beads

Certainly not my protocol and the guy that wrote it says it certainly isn’t his either. But it’s about time someone put this together and we could start saving a lot of money.

Anyway, the protocol was nicely written in plain English by these guys (thanks!!!!):

B. Faircloth & T. Glenn, November 19, 2011, Ecol. and Evol. Biology, Univ. of California – Los Angeles

From this publication: Cost-effective, high-throughput DNA sequencing libraries for multiplexed target capture

Which of course came from some previous publications; see the protocol. It is here: Homemade AMPure XP beads

 

And a big thanks from the community to the above contributors!!!!!
