Wednesday, April 10, 2013

Processing Microarray data by using R and Bioinductor

Load all the .CEL.gz file from a folder:

>library(affy)
>data <- ReadAffy()

## For Affymetrix data, there is no concept of RAW data.
See page 72 of DNA microarray data analysis using Bioconductor.

Now get the RMA data
> data.rma <- rma(data)

or you can use the following if the size of data is too large.

>data.rma <- justRMA(data)


change the RMA result to expression values
data.e <- exprs(data.rma)

save the data.e
write.csv(data.e,file="data.csv")

##now process the data and map probe ID to gene ID
##by any script language ....such as python


now load the processed data to R again:
d <- read.table("processed_data.csv",header=T,sep=",")

now d is a string matrix, make it to float

df <- data.frame(d,row.names=1)    ## use the first column as name

calculate the mean and append to df

df$mean <- rowMeans(df)

ranking

dfr <- df[order(-df$mean),]

save the highest 60 to a .csv file

write.csv(dfs[1:60,],file="d60.csv",sep=",")


No comments: