Source code and data files to regenerate the table and figures from S. B. English, A. J. Butte. Evaluation and Integration of 49 Genome-wide Experiments and the Prediction of Previously Unknown Obesity-related Genes. Bioinformatics, 2007. PMID: 17921495
This TAR/GZIP file contains:
distribute.R: an R program that regenerates the figures and table from the publication. Has been tested in R version 2.3.1. Requires libraries: ROCR, vioplot, R2HTML.
pairwise_GS_test_den.txt, pairwise_GS_test_num.txt, pairwise_test_den.txt, pairwise_test_num.txt, obesity_alltests.txt: data files used by the distribute.R program. Each contains the sensitivity and precision of each individual study and each pair-wise intersection of studies, as computed by database queries. These are directly recomputable from the gene.table.Rdata file below, but were precomputed separately for convenience.
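As a rough illustration of what sensitivity and precision mean for a single study and for a pair-wise intersection, the sketch below computes both against a gold standard. It assumes the textbook definitions (true positives over gold-standard size, and true positives over the number of positive calls). The exact definitions used to build the precomputed files are not stated here, and the gene sets and the function name sens_prec are made up for the example.

```r
# Hypothetical gene sets, identified by Homologene family ID (hid).
gold    <- c(1, 2, 3, 4)   # gold-standard genes (made up)
study_a <- c(1, 2, 5)      # genes positive in study A (made up)
study_b <- c(2, 3, 5, 6)   # genes positive in study B (made up)

# Sensitivity and precision of a set of positive genes against a gold
# standard, using the textbook definitions (an assumption, see above).
# Precision is NaN when `positives` is empty.
sens_prec <- function(positives, gold) {
  tp <- length(intersect(positives, gold))
  c(sensitivity = tp / length(gold),
    precision   = tp / length(positives))
}

single <- sens_prec(study_a, gold)                      # one study
pair   <- sens_prec(intersect(study_a, study_b), gold)  # pair-wise intersection
```

In this toy case the intersection is more precise than it is sensitive, which is the usual trade-off when requiring agreement between studies.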
homologene.table.Rdata: an R version of the Homologene table of relations. Provides a single data frame homologene.table which has 70,699 rows and 4 columns. Each row contains a Homologene family ID (hid), NCBI Gene ID, NCBI Taxonomy ID, and gene symbol.
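One plausible use of this table is to look up the genes in another species that share a Homologene family with a given gene. The sketch below runs on a toy data frame rather than the real file. The column names (hid, gene_id, tax_id, symbol) and every ID other than the two NCBI Taxonomy IDs are assumptions, so check names(homologene.table) after loading the real data.

```r
# With the real file one would start with:
#   load("homologene.table.Rdata")   # defines homologene.table
#   names(homologene.table)          # check the actual column names
#
# Toy stand-in. Column names, gene IDs and symbols are made up; only the
# taxonomy IDs are real (9606 = human, 10090 = mouse).
toy_homologene <- data.frame(
  hid     = c(1, 1, 2, 2, 3),
  gene_id = c(1001, 2001, 1002, 2002, 1003),
  tax_id  = c(9606, 10090, 9606, 10090, 9606),
  symbol  = c("GENEA", "Genea", "GENEB", "Geneb", "GENEC"),
  stringsAsFactors = FALSE
)

# Return the genes of species `target_tax` that are in the same
# Homologene family as `gene_id`. (Hypothetical helper.)
orthologs <- function(tbl, gene_id, target_tax) {
  fam <- tbl$hid[tbl$gene_id == gene_id]
  tbl[tbl$hid %in% fam & tbl$tax_id == target_tax, c("gene_id", "symbol")]
}

mouse_hit <- orthologs(toy_homologene, 1001, 10090)  # one mouse gene
no_hit    <- orthologs(toy_homologene, 1003, 10090)  # family has no mouse member
```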
gene.table.Rdata: an R version of a table of each gene in each study. Provides a single data frame gene.table which has 645,400 rows and 4 columns. Each row contains a study identifier (mappable to the identifiers used in Supplemental Methods using the distribute.R program), a flag, an NCBI Gene ID, and a Homologene family ID (hid). The flag takes one of three values: hid.missing, meaning that the gene was not measured in the study; negative, meaning that the gene was measured but was not significant in the study; or positive, meaning that the gene was measured and was significant in the study.
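To show how a table of this shape can be summarized, the sketch below counts, for each Homologene family, the number of distinct studies in which it is flagged positive. It runs on a toy data frame: the column names (study, flag, gene_id, hid) and all values are assumptions, and this is not necessarily how distribute.R performs the count.

```r
# With the real file one would start with:
#   load("gene.table.Rdata")   # defines gene.table
#
# Toy stand-in with assumed column names and made-up values.
toy_gene_table <- data.frame(
  study   = c("s1", "s1", "s2", "s2", "s3", "s3"),
  flag    = c("positive", "negative", "positive",
              "hid.missing", "positive", "positive"),
  gene_id = c(1001, 1002, 1001, 1002, 1001, 1002),
  hid     = c(1, 2, 1, 2, 1, 2),
  stringsAsFactors = FALSE
)

# Number of distinct studies in which each hid is flagged positive.
# Returns a named integer vector; the names are the hid values.
positive_counts <- function(gt) {
  pos <- gt[gt$flag == "positive", ]
  vapply(split(pos$study, pos$hid),
         function(s) length(unique(s)), integer(1))
}

counts   <- positive_counts(toy_gene_table)
frequent <- names(counts)[counts >= 2]   # threshold of 2 chosen for the toy data
```

On the real data, a threshold of 5 would correspond to the "positive in 5 or more of the 49 experiments" criterion described for table_1.html.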
Unpack the file on a Linux system using the command tar xvfz english_butte_code.tar.gz. After running distribute.R, six output files are created:
figure_1a.pdf: matches figure 1a in the manuscript.
figure_1b.pdf: matches figure 1b in the manuscript.
figure_2.pdf: matches figure 2 in the manuscript.
figure_3.pdf: matches figure 3 in the manuscript.
extra figure.pdf: demonstrates the statistically significant difference in the number of positive experiments between gold-standard genes and non-gold-standard genes.
table_1.html: matches table 1 in the manuscript, but lists all 52 genes that were positive in 5 or more of the 49 experiments.
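A quick way to confirm that a run completed is to check that all six files listed above exist. The helper below is a minimal sketch. The function name missing_outputs is hypothetical, and it assumes distribute.R writes its output into the directory being checked.

```r
# Output files listed above, with names exactly as given in this README.
expected_outputs <- c("figure_1a.pdf", "figure_1b.pdf", "figure_2.pdf",
                      "figure_3.pdf", "extra figure.pdf", "table_1.html")

# Return the expected output files that are not present in `dir`.
# (Hypothetical helper, not part of distribute.R.)
missing_outputs <- function(dir = ".") {
  expected_outputs[!file.exists(file.path(dir, expected_outputs))]
}

# Typical use, from the directory where the archive was unpacked:
#   source("distribute.R")
#   missing_outputs()   # character(0) if all six files were written
```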