Source code and data files to regenerate table and figures

Source code and data files to regenerate the table and figures from S. B. English and A. J. Butte, "Evaluation and Integration of 49 Genome-wide Experiments and the Prediction of Previously Unknown Obesity-related Genes", Bioinformatics, 2007 (PMID: 17921495).

This TAR/GZIP file contains:

  • distribute.R: an R program that regenerates the figures and table from the publication. It has been tested in R version 2.3.1 and requires the ROCR, vioplot, and R2HTML packages.
  • pairwise_GS_test_den.txt, pairwise_GS_test_num.txt, pairwise_test_den.txt, pairwise_test_num.txt, obesity_alltests.txt: data files used by the distribute.R program. Each contains the sensitivity and precision of each individual study and each pair-wise intersection of studies, as computed by database queries. These are directly recomputable from the gene.table.Rdata file below, but were precomputed separately for convenience.
  • homologene.table.Rdata: an R version of the Homologene table of relations. Provides a single data frame, homologene.table, with 70,699 rows and 4 columns. Each row contains a Homologene family ID (hid), NCBI Gene ID, NCBI Taxonomy ID, and gene symbol.
  • gene.table.Rdata: an R version of a table of each gene in each study. Provides a single data frame, gene.table, with 645,400 rows and 4 columns. Each row contains a study identifier (mappable to the identifiers used in Supplemental Methods using the distribute.R program), a flag, an NCBI Gene ID, and a Homologene family ID (hid). The flag takes one of three values: hid.missing, meaning the gene was not measured in the study; negative, meaning the gene was measured and was not significant in the study; or positive, meaning the gene was measured and was significant in the study.
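
As a rough illustration of how the per-study sensitivity and precision could be recomputed from gene.table, here is a minimal sketch. It is not taken from distribute.R. The column names (study, flag, hid), the gold_hids vector, and the choice to leave hid.missing rows out of each study's counts are all assumptions made for the example; check names(gene.table) after loading and adjust as needed.

```r
# Sketch only: per-study sensitivity and precision against a gold-standard set.
# ASSUMPTIONS: column names "study", "flag", "hid" are hypothetical, and genes
# flagged hid.missing are simply excluded from that study's counts.
study_performance <- function(gene_table, gold_hids,
                              study_col = "study",
                              flag_col = "flag",
                              hid_col = "hid") {
  # Keep only rows where the gene was actually measured in the study
  measured <- gene_table[which(gene_table[[flag_col]] != "hid.missing"), ]
  per_study <- split(measured, measured[[study_col]])

  rows <- lapply(names(per_study), function(s) {
    d <- per_study[[s]]
    pos_hids <- unique(d[[hid_col]][d[[flag_col]] == "positive"])
    gold_measured <- intersect(unique(d[[hid_col]]), gold_hids)
    tp <- length(intersect(pos_hids, gold_hids))  # gold-standard families called positive
    data.frame(
      study = s,
      sensitivity = if (length(gold_measured) > 0) tp / length(gold_measured) else NA_real_,
      precision = if (length(pos_hids) > 0) tp / length(pos_hids) else NA_real_,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, rows)
}

# Toy data standing in for gene.table (values are made up for illustration)
toy <- data.frame(
  study = c("s1", "s1", "s1", "s2", "s2", "s2"),
  flag = c("positive", "negative", "positive", "positive", "hid.missing", "negative"),
  gene_id = c(101, 102, 103, 101, 102, 103),
  hid = c(10, 20, 30, 10, 20, 30),
  stringsAsFactors = FALSE
)
perf <- study_performance(toy, gold_hids = c(10, 20))
print(perf)

# With the real data, load() restores the data frame under its saved name
if (file.exists("gene.table.Rdata")) {
  load("gene.table.Rdata")
  print(dim(gene.table))
}
```

The precomputed pairwise_*.txt files remain the reference values; a sketch like this is only useful for spot-checking a study or for trying a different gold-standard list.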

Unpack the file on a Linux system using the command tar xvfz english_butte_code.tar.gz. After running distribute.R, six output files are created:

  • figure_1a.pdf: matches figure 1a in the manuscript.
  • figure_1b.pdf: matches figure 1b in the manuscript.
  • figure_2.pdf: matches figure 2 in the manuscript.
  • figure_3.pdf: matches figure 3 in the manuscript.
  • extra figure.pdf: demonstrates the statistically significant difference in the number of positive experiments between genes in the gold standard versus non-gold standard genes.
  • table_1.html: matches table 1 in the manuscript, but with all 52 genes positive in 5 or more of the 49 experiments.
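
A quick way to confirm that a run completed is to check for the required packages beforehand and for the six output files afterwards. The helper names below (missing_packages, missing_outputs) are hypothetical and not part of distribute.R, and the sketch assumes the script writes its outputs into the current working directory.

```r
# Output names copied from the list above ("extra figure.pdf" is kept verbatim,
# including the space). ASSUMPTION: outputs land in the working directory.
expected_outputs <- c("figure_1a.pdf", "figure_1b.pdf", "figure_2.pdf",
                      "figure_3.pdf", "extra figure.pdf", "table_1.html")
required_packages <- c("ROCR", "vioplot", "R2HTML")

# Return the packages from `pkgs` that are not installed
missing_packages <- function(pkgs = required_packages) {
  pkgs[!(pkgs %in% rownames(installed.packages()))]
}

# Return the expected output files that are not present in `dir`
missing_outputs <- function(dir = ".", files = expected_outputs) {
  files[!file.exists(file.path(dir, files))]
}

# Typical use from the unpacked directory (commented out so the sketch is
# safe to paste into any session):
# if (length(missing_packages()) == 0) source("distribute.R")
# missing_outputs()   # character(0) means all six files were written
```

Keeping the checks as small functions makes it easy to point them at another directory if the script is run somewhere other than the unpacked archive.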
 
public/obesityintegration.txt · Last modified: 2008/04/09 17:04 (external edit)
 
Except where otherwise noted, content on this wiki is licensed under the following license: CC Attribution-No Derivative Works 3.0 Unported