Accurate prediction of cancer outcome helps in choosing the best therapy for patients. This site provides access to over 25 gene expression data sets for 13 types of cancer. Try single genes or combinations, assess their predictive value, and see whether you can beat the authors of the data sets.


If you use the datasets to validate your methods, please cite: Roy J., Isik Z., Winter C. and Schroeder M. Network information improves cancer outcome prediction. Brief Bioinform. 2014 Jul;15(4):612-25. doi: 10.1093/bib/bbs083

Download all sets (1.1 GB)

Author Year Cancer Var. Patients Download Paper
Jones et al. 2005 renal cell S 69
Spira et al. 2007 lung D 129
Lee et al. 2010 bladder P 165
Steidl et al. 2010 lymphoma T 130
Bhojwani et al. 2008 leukemia T 82
Bhojwani et al. 2008 leukemia P 59
Bogunovic et al. 2009 melanoma P 38
Raponi et al. 2006 lung P 129
van de Vijver et al. 2002 breast P 295
Wang et al. 2005 breast P 276
Korkola et al. 2009 germ cell P 82
O'Donnell et al. 2005 oral cavity P 27
Friedman et al. 2009 leukemia P 68
Landemaine et al. 2008 breast P 23
Nanni et al. 2006 prostate D 30
Nanni et al. 2006 prostate T 20
Iqbal et al. 2010 lymphoma D 80
Fernandez et al. 2010 lymphoma S 22
Frank et al. 2006 leukemia T 41
Smith et al. 2010 colon P 55
Lenz et al. 2008 lymphoma T 414
Mok et al. 2009 ovarian P 53
Dressman et al. 2006 breast S 37
Murat et al. 2008 glioblastoma T 70
Zhu et al. 2010 lung P 133
Winter et al. 2012 pancreas P 30
Pedraza et al. 2009 breast D 58
Le Dieu et al. 2009 leukemia D 41
Badea et al. 2008 pancreas D 78
Mortensen et al. 2015 prostate D 50

Clinical variables
D Diagnosis
T Treatment Response/Outcome

These datasets were obtained on the 1st of April by running the following query in PubMed:

(cancer[tiab] OR neoplas*[tiab] OR tumor[tiab]) AND
humans[MESH] AND
gene expression[tiab] AND
2002:2011[pdat] AND
(marker[tiab] OR biomarker[tiab] OR signature[tiab]) AND
(predict*[tiab] OR diagnos*[tiab]) AND
(survival[tiab] OR outcome[tiab] OR progression [tiab] OR response[tiab] OR metastasis[tiab] OR behavior[tiab]) NOT (networks[tiab] OR pathway[tiab] OR review[pt] OR Tissue Array Analysis[MeSH Terms] OR Protein Array Analysis[MeSH Terms]).
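For readers who want to rerun the search programmatically, the query above can be sent to PubMed through the NCBI E-utilities `esearch` endpoint. The following is a minimal sketch, not the procedure used to build this collection: the helper name `build_esearch_url` and the `retmax` value are illustrative assumptions, and the code only constructs the request URL without contacting NCBI. Results returned today may differ from those of the original retrieval.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Query text copied from the listing above, joined with single spaces.
QUERY = (
    "(cancer[tiab] OR neoplas*[tiab] OR tumor[tiab]) AND "
    "humans[MESH] AND "
    "gene expression[tiab] AND "
    "2002:2011[pdat] AND "
    "(marker[tiab] OR biomarker[tiab] OR signature[tiab]) AND "
    "(predict*[tiab] OR diagnos*[tiab]) AND "
    "(survival[tiab] OR outcome[tiab] OR progression [tiab] OR response[tiab] "
    "OR metastasis[tiab] OR behavior[tiab]) NOT "
    "(networks[tiab] OR pathway[tiab] OR review[pt] OR "
    "Tissue Array Analysis[MeSH Terms] OR Protein Array Analysis[MeSH Terms])"
)

# Public NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term, retmax=100):
    """Return an esearch URL for a PubMed query (hypothetical helper)."""
    params = {
        "db": "pubmed",      # search the PubMed database
        "term": term,        # the query string, URL-encoded by urlencode
        "retmax": retmax,    # maximum number of PMIDs to return (assumed value)
        "retmode": "json",   # ask for a JSON response
    }
    return ESEARCH + "?" + urlencode(params)


url = build_esearch_url(QUERY)
print(url)
```

Fetching the URL (for example with `urllib.request.urlopen`) would return a JSON document whose `esearchresult` entry lists the matching PubMed IDs.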

The resulting publications were filtered to keep those from journals with an impact factor greater than 5 (Thomson Reuters (ISI) Web of Knowledge, version 2009) and whose data were obtained using either Affymetrix HGU133plus2 or HGU133A chips. Where raw microarray data were available, we used the RMA implementation provided in the affy R package, which performs background correction and summarization and reports the expression values on a log2 scale.
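The two selection criteria described above (impact factor and array platform) can be sketched as a simple filter. This is an illustration under stated assumptions, not the code used for this site: the `Publication` record, its field names, and the example entries are all hypothetical.

```python
from dataclasses import dataclass

# Criteria taken from the description above.
ACCEPTED_PLATFORMS = {"HGU133plus2", "HGU133A"}
MIN_IMPACT_FACTOR = 5.0


@dataclass
class Publication:
    """Hypothetical record for one publication returned by the query."""
    title: str
    impact_factor: float  # journal impact factor
    platform: str         # microarray platform used in the study


def passes_filter(pub):
    """Keep publications with impact factor strictly above 5 on an accepted chip."""
    return (
        pub.impact_factor > MIN_IMPACT_FACTOR
        and pub.platform in ACCEPTED_PLATFORMS
    )


# Made-up entries, for illustration only.
candidates = [
    Publication("hypothetical study A", 7.2, "HGU133plus2"),
    Publication("hypothetical study B", 4.1, "HGU133A"),
    Publication("hypothetical study C", 9.8, "other platform"),
    Publication("hypothetical study D", 5.0, "HGU133A"),
]

selected = [p.title for p in candidates if passes_filter(p)]
print(selected)
```

Only study A passes here: study B falls below the impact factor threshold, study C uses a different platform, and study D sits exactly at 5, which is not "greater than 5".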

Set Cancer Best marker Accuracy

Single-gene markers approved by the FDA

Name/Symbol Cancer Link to NCBI
Alpha-fetoprotein liver cancer, germ cell tumours, ovarian cancer
Bladder tumor antigen bladder cancer
CA 125 epithelial ovarian cancer
Carcinoembryonic antigen (CEA) colorectal cancer, lung cancer, breast cancer
Estrogen receptor (ER) breast cancer
Nuclear Mitotic Apparatus protein (NUMA1) bladder cancer
PSA prostate cancer

Best single-gene marker per data set

Set Cancer Name/Symbol Link to NCBI Accuracy

Signatures published by data set authors

Set Cancer Symbols Accuracy