Replication: gcRMA (background, normalization & summarization)

Author: Mark Robinson and Henrik Bengtsson
Created: 2009-05-17
Last modified: 2010-02-05

Description

This test verifies that aroma.affymetrix can reproduce the gcRMA (Wu et al. 2004)  chip-effect estimates as estimated by the gcrma package.

Data set

rawData/
  Affymetrix-HeartBrain/
   HG-U133_Plus_2/
    u1332plus_ivt_cerebellum_A.CEL [13555904 bytes]
    u1332plus_ivt_cerebellum_B.CEL [13550687 bytes]
    u1332plus_ivt_cerebellum_C.CEL [13551860 bytes]
    u1332plus_ivt_heart_A.CEL      [13554731 bytes]
    u1332plus_ivt_heart_B.CEL      [13553255 bytes]
    u1332plus_ivt_heart_C.CEL      [13551203 bytes]

Source: Affymetrix Tissue samples, 2007.  See Affymetrix data sets for chip type HG-U133_Plus_2.

Script

gcRMA estimates by aroma.affymetrix

library("aroma.affymetrix");
cs <- AffymetrixCelSet$byName("Affymetrix-HeartBrain", chipType="HG-U133_Plus_2");

# gcRMA background correction
bc <- GcRmaBackgroundCorrection(cs);
csB <- process(bc);

# RMA quantile normalization
qn <- QuantileNormalization(csB, typesToUpdate="pm");
csN <- process(qn);

# RMA probe summarization
plm <- RmaPlm(csN, flavor="oligo");
fit(plm);

# Extract chip effects on the log2 scale
ces <- getChipEffectSet(plm);
theta <- extractMatrix(ces);
rownames(theta) <- getUnitNames(cdf);
theta <- log2(theta);

gcRMA estimate by gcrma

library("gcrma");  # gcrma()
raw <- ReadAffy(filenames=getPathnames(cs));
es <- gcrma(raw, verbose=TRUE);
theta0 <- exprs(es);

Results

# Reorder the aroma.affymetrix estimates
o <- match(rownames(theta0), rownames(theta));
theta <- theta[o,];

# (a) Assert correlations
cors <- sapply(1:ncol(theta), FUN=function(cc) cor(theta[,cc], theta0[,cc]));
print(cors);
print(range(cors));
stopifnot(all(cors > 0.99995));

# (b) Assert differences
e <- (theta - theta0);
stopifnot(mean(as.vector(e^2)) < 1e-3);
stopifnot(sd(as.vector(e^2)) < 1e-3);
stopifnot(quantile(abs(e), 0.99) < 0.05);
stopifnot(max(abs(e)) < 0.085);

# (c) Visual comparison
layout(matrix(1:9, ncol=3, byrow=TRUE));
xlab <- expression(log[2](theta[gcrma]));
ylab <- expression(log[2](theta[aroma.affymetrix]));
for (kk in seq(length=ncol(theta))) {
  main <- colnames(theta)[kk];
  plot(theta0[,kk], theta[,kk], pch=".", xlab=xlab, ylab=ylab, main=main);
  abline(0,1, col="blue");
}
xlab <- expression(log[2](theta[aroma.affymetrix]/theta[gcrma]));
plotDensity(e, xlab=xlab); 

Figure: (Top six panels): Scatter plots comparing the chip-effect estimates (on the log2 scale) from aroma.affymetrix with the ones from gcrma.  (Bottom panel): The density of the log2-ratios between aroma.affymetrix and gcrma chip-effect estimates for the six arrays.

Appendix

Session information

R version 2.10.1 Patched (2010-01-12 r50990)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] hgu133plus2probe_2.5.0 AnnotationDbi_1.8.1    oligoClasses_1.8.0
 [4] hgu133plus2cdf_2.5.0   preprocessCore_1.8.0   gcrma_2.18.1
 [7] Biobase_2.6.1          aroma.affymetrix_1.4.3 aroma.apd_0.1.7
[10] affxparser_1.18.0      R.huge_0.2.0           aroma.core_1.4.4
[13] aroma.light_1.15.1     matrixStats_0.1.9      R.rsp_0.3.6
[16] R.filesets_0.7.4       digest_0.4.2           R.cache_0.2.0
[22] oligo_1.10.0           affyPLM_1.22.0         affy_1.24.2
[25] R.methodsS3_1.1.0

loaded via a namespace (and not attached):
[1] affyio_1.14.0      Biostrings_2.14.11 DBI_0.2-4          IRanges_1.4.11
[5] RSQLite_0.7-1      splines_2.10.1     tools_2.10.1i386-pc-mingw32

Redundancy test script

The above script is part of redundancy test suite executed at every new release: 

path <- system.file("testScripts/replication/chipTypes/HG-U133_Plus_2", package="aroma.affymetrix");
filename <- "gcrma.R";
pathname <- file.path(path, filename);
source(pathname);

References

Z. Wu, R. Irizarry, R. Gentleman, F.M. Murillo & F. Spencer, A Model Based Background Adjustment for Oligonucleotide Expression Arrays, JASA, 2004.