Thursday, April 11, 2013

Process NIH GEO GSE data with GEOquery

How to get an expression value table for a GSE* series from the GEO website.

For example: GSE33147

> library(GEOquery)
> g = getGEO("GSE33147", destdir=".")   # destdir="." keeps the downloaded file in the working directory

......

This may download a series matrix file: GSE33147_series_matrix.txt.gz
Then load the data again from that file:
> g = getGEO(filename="GSE33147_series_matrix.txt.gz")

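Check the data:

> class(g)

Get the ExpressionSet (when loaded from the series matrix file, g should already be an ExpressionSet, so this step is only a safeguard):

> e = as(g, "ExpressionSet")

Get the data table:
> f = exprs(e)

Save: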
> write.csv(f, file="***")
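
A minimal sketch of the save step, using a small made-up matrix in place of f and a temporary file in place of the real file name (f_demo, out_file and f_back are names invented for illustration). write.csv stores the row names as the first column, so read.csv with row.names = 1 should restore them:

```r
# Toy stand-in for f = exprs(e): rows are probes, columns are samples.
f_demo <- matrix(c(7.1, 8.2, 6.5, 9.0, 5.4, 7.7), nrow = 3,
                 dimnames = list(c("probe1", "probe2", "probe3"),
                                 c("GSM_A", "GSM_B")))

# Hypothetical output path; replace it with your own file name.
out_file <- tempfile(fileext = ".csv")
write.csv(f_demo, file = out_file)

# Read it back; row.names = 1 turns the first column into row names again.
f_back <- as.matrix(read.csv(out_file, row.names = 1, check.names = FALSE))
```

Reading the file back like this is a quick way to confirm that the probe IDs and sample names survived the export.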

Load the names of the genes you want (assuming you only need a subset of them).
The names are stored in the file "top60.csv":
> genes = read.csv("top60.csv", header=TRUE)

The genes may be read in as factors (the default before R 4.0), so we need to change them to character:
> cgenes = as.character(genes[,1])               # the first column
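
One possible way to use the list, sketched with toy data: keep only the rows of the expression table whose row names appear in it. This assumes the identifiers in top60.csv are of the same kind as the row names of f (in a GEO series matrix these are usually probe IDs rather than gene symbols). The names f_demo, genes_demo, cgenes_demo, wanted and sub are made up for illustration.

```r
# Toy stand-in for f = exprs(e).
f_demo <- matrix(1:8, nrow = 4,
                 dimnames = list(c("p1", "p2", "p3", "p4"), c("s1", "s2")))

# Toy stand-in for read.csv("top60.csv"); the column is a factor, as in older R.
genes_demo <- data.frame(id = factor(c("p3", "p1", "p9")))
cgenes_demo <- as.character(genes_demo[, 1])

# Keep the requested IDs that exist in the table, in the order requested.
wanted <- cgenes_demo[cgenes_demo %in% rownames(f_demo)]
sub <- f_demo[wanted, , drop = FALSE]
```

Filtering with %in% first avoids a "subscript out of bounds" error when some requested IDs (here "p9") are missing from the table, and drop = FALSE keeps the result a matrix even if only one row matches.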

