
A quick and easy way to correlate ChIP-seq data with microarray expression data

August 6, 2011

OK, so you have a set of Affy probe IDs from a microarray experiment as a single-column list in a text file. You also have some peaks from ChIP-seq data (as a BED file). You want to see if there is overlap between these two data sets. What can you do? One way is to convert the Affy probe IDs to a BED file using the Bioconductor R package biomaRt and the following script (obviously, you must first have R and biomaRt installed on your computer).

> library(biomaRt)

> ensembl = useMart("ensembl")

> ensembl = useMart("ensembl", dataset = "hsapiens_gene_ensembl")

#### read the probe IDs from the first (only) column of the text file

> affyids <- read.delim("mytextfile.txt", header = FALSE, stringsAsFactors = FALSE)[, 1]

> genelocs <- getBM(attributes = c("chromosome_name", "start_position", "end_position", "strand"), filters = "affy_hg_u133_plus_2", values = affyids, mart = ensembl)

#### you can also add "affy_hg_u133_plus_2" and "hgnc_symbol" to the attributes to get the Affy probe set ID and gene symbol

> head(genelocs)

#### no quotes, row names or header line, so the output is a plain tab-separated table

> write.table(genelocs, file = "genelocs.bed", quote = FALSE, sep = "\t", row.names = FALSE, col.names = FALSE)
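One thing to watch: Ensembl reports chromosomes as "1", "X", "MT", coordinates as 1-based, and strand as 1 or -1, whereas BED files are 0-based and usually carry "+" or "-" in a sixth column. If your peak file uses UCSC-style names such as "chr1" (an assumption, so check your own file), something like the sketch below may help line the two up. The helper name `to_bed` and the toy values are made up for illustration, and patch or scaffold names are not handled.

```r
# Toy stand-in for the getBM() result; the values are illustrative only.
genelocs <- data.frame(
  chromosome_name = c("1", "X", "MT"),
  start_position  = c(1000, 5000, 300),
  end_position    = c(2000, 7500, 900),
  strand          = c(1, -1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: convert Ensembl-style columns to a six-column BED layout.
to_bed <- function(locs) {
  chrom <- paste0("chr", locs$chromosome_name)  # assumes UCSC-style names are wanted
  chrom[chrom == "chrMT"] <- "chrM"             # UCSC calls the mitochondrion chrM
  data.frame(
    chrom  = chrom,
    start  = locs$start_position - 1,           # BED starts are 0-based
    end    = locs$end_position,                 # BED ends are exclusive, so unchanged
    name   = ".",                               # placeholder name column
    score  = 0,                                 # placeholder score column
    strand = ifelse(locs$strand == 1, "+", "-"),
    stringsAsFactors = FALSE
  )
}

bed <- to_bed(genelocs)
print(bed)
```

The resulting `bed` data frame can be written out with the same write.table call as above.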

 

Now your Affy probe sets have been converted into a BED file containing the genomic coordinates of the corresponding genes. You can upload both files into Galaxy and use the ‘get flanks’ function under ‘operate on genomic intervals’ to convert the gene coordinates into promoter coordinates. Then use the ‘join’ function (also under the ‘operate on genomic intervals’ tab) to find the places where the two data sets overlap.
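If you would rather stay in R for this step, the flank-and-join idea can be roughly sketched in base R. This is not what Galaxy runs internally, just the same logic under some assumptions: the promoter is taken as the 1000 bp upstream of the gene start (an arbitrary choice), the function names `promoter_flanks` and `overlap_join` are made up, and the toy tables stand in for your real files. For large data sets a dedicated package such as Bioconductor's GenomicRanges would likely be a better fit.

```r
# Toy gene and peak tables in BED-like form; the values are illustrative only.
genes <- data.frame(
  chrom  = c("chr1", "chr1", "chr2"),
  start  = c(1000, 8000, 500),
  end    = c(2000, 9000, 1500),
  strand = c("+", "-", "+"),
  stringsAsFactors = FALSE
)
peaks <- data.frame(
  chrom = c("chr1", "chr1", "chr2"),
  start = c(600, 9100, 5000),
  end   = c(800, 9300, 5200),
  stringsAsFactors = FALSE
)

# Upstream flank: before the start on "+" genes, after the end on "-" genes.
promoter_flanks <- function(genes, upstream = 1000) {
  plus <- genes$strand == "+"
  data.frame(
    chrom  = genes$chrom,
    start  = ifelse(plus, pmax(genes$start - upstream, 0), genes$end),
    end    = ifelse(plus, genes$start, genes$end + upstream),
    strand = genes$strand,
    stringsAsFactors = FALSE
  )
}

# Pair every promoter with every peak on the same chromosome,
# then keep only the pairs whose half-open intervals overlap.
overlap_join <- function(a, b) {
  m <- merge(a, b, by = "chrom", suffixes = c(".promoter", ".peak"))
  m[m$start.promoter < m$end.peak & m$start.peak < m$end.promoter, ]
}

promoters <- promoter_flanks(genes)
hits <- overlap_join(promoters, peaks)
print(hits)
```

Each row of `hits` is one promoter paired with one peak that overlaps it. The all-pairs merge is simple but grows quickly with the size of the input, which is the main reason to prefer an interval-aware tool for real data.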

P.S.  R is a real pain.  No biologist should ever have to work in it.


2 Comments
  1. Interesting approach. There is also a new(ish) ChIP-Seq specific version of Galaxy now available, called Cistrome, which allows peaks in BED format to be mapped to nearby genes. These can then be integrated with a microarray data set.

  2. ethanomics

    I tried Cistrome and I couldn’t get it to work. There was some incompatibility with my data that I couldn’t figure out. When I found it I was excited since it looked so easy. And I always go for the quick easy way first. But in the end, no luck.
