How to: Create a CDF file from an R package/environment

Author: Mark Robinson (pruned by Henrik Bengtsson)
Created on: 2009-01-15
Last updated: 2011-01-11

First of all, thanks are due to Samuel Wuest for some of the legwork on these procedures.

For expression arrays, Bioconductor makes available XXXcdf and XXXprobe packages for many of the Affymetrix chips (where XXX is the chip name).  However, for the newer generation of chips (e.g. HuGene-1_0-st-v1), this is no longer done.  If you need to make a CDF file in order to use aroma.affymetrix, and there are R packages available (either from Bioconductor or ones that can be made with pdInfoBuilder), you may find one of the following two approaches useful.

To run these, you will need to have the relevant R package installed, as well as a CEL data file for the chip type for which you wish to make the CDF file.

Creating a CDF file from an R metadata package

Below is an example session for converting the "hgu133plus2cdf" R/Bioconductor package into a CDF file (hgu133plus2.cdf in this case).

library("aroma.affymetrix");
env2Cdf("hgu133plus2cdf", "u1332plus_ivt_breast_A.CEL", overwrite=TRUE);
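
Before calling env2Cdf(), it can help to confirm that both prerequisites mentioned earlier are in place.  The following is a minimal sketch using only base R; the helper name checkInputs() is hypothetical and not part of aroma.affymetrix.

```r
# Hypothetical helper (not part of aroma.affymetrix): report whether the
# metadata package is installed and whether the CEL file exists.
checkInputs <- function(pkgName, celFile) {
  c(package = requireNamespace(pkgName, quietly=TRUE),
    celFile = file.exists(celFile));
}

ok <- checkInputs("hgu133plus2cdf", "u1332plus_ivt_breast_A.CEL");
print(ok);  # both elements should be TRUE before running env2Cdf()
```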

In this example, the corresponding CDF file can also be downloaded from Affymetrix, so the two can be compared.  Here is some quick code to verify that the CDF file created by env2Cdf() captures the same information as the Affymetrix one:

library("affxparser");
x <- readCdf("hgu133plus2.cdf");     # created above
y <- readCdf("HG-U133_Plus_2.cdf");  # from Affymetrix (binary-converted)

# Keep only the units present in both files, in the same order
g <- intersect(names(x), names(y));
m <- match(g, names(x));
x <- x[m];
m <- match(g, names(y));
y <- y[m];

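# Compare the first group of two units: probe (x,y) positions, atom indices,
# and whether pbase equals tbase must all agree
checkUnit <- function(xx, yy) {
  a <- xx$groups[[1]];
  b <- yy$groups[[1]];
  all(a$x == b$x & a$y == b$y & a$atom == b$atom &
      (a$pbase == a$tbase) == (b$pbase == b$tbase));
}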

# Count the units that agree between the two CDF files
total <- 0;
for (ii in seq_along(g)) {
  total <- total + checkUnit(x[[ii]], y[[ii]]);
}

stopifnot(total == length(g));  # passes silently if the same info is represented
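
The loop above can also be written without an explicit counter, which makes it easy to list the units that differ.  The sketch below applies mapply() to two small made-up unit lists (the values are purely illustrative, not real CDF content); makeUnit(), xToy and yToy are hypothetical names introduced only for this example.

```r
# Same comparison as checkUnit() above, repeated here so the example
# runs on its own.
checkUnit <- function(xx, yy) {
  a <- xx$groups[[1]];
  b <- yy$groups[[1]];
  all(a$x == b$x & a$y == b$y & a$atom == b$atom &
      (a$pbase == a$tbase) == (b$pbase == b$tbase));
}

# Build a toy unit with one group; all values are made up for illustration.
makeUnit <- function(x, y) {
  n <- length(x);
  list(groups=list(list(x=x, y=y, atom=seq_len(n) - 1L,
                        pbase=rep("A", n), tbase=rep("T", n))));
}
xToy <- list(u1=makeUnit(1:3, 4:6), u2=makeUnit(7:9, 1:3));
yToy <- list(u1=makeUnit(1:3, 4:6), u2=makeUnit(7:9, 3:1));

# One logical per unit, named after the units
same <- mapply(checkUnit, xToy, yToy);
print(names(same)[!same]);  # names of the units that differ
```

With real data, the same call would take the aligned x and y lists in place of xToy and yToy.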

Creating a CDF file from an R package built with pdInfoBuilder

Below is an example session for converting the "pd.hugene.1.0.st.v1" package (i.e. for the HuGene-1_0-st-v1 chip type), created using the pdInfoBuilder package, into a CDF file.  To create an R package using pdInfoBuilder in the first place, you need to download the library files, probe.tab files, and NetAffx Annotation files from Affymetrix (Human Gene 1.0 ST).  Then use the following commands:

library("pdInfoBuilder");
pgfFile <- "HuGene-1_0-st-v1.r3.pgf";
clfFile <- "HuGene-1_0-st-v1.r3.clf";
probeFile <- "HuGene-1_0-st-v1.probe.tab";
transFile <- "HuGene-1_0-st-v1.na27.hg18.transcript.csv";
pkg <- new("AffyGenePDInfoPkgSeed",
          version="0.0.1",
          author="Mark Robinson", email="mrobinson@...",
          biocViews="AnnotationData",
          genomebuild="hg18",
          pgfFile=pgfFile, clfFile=clfFile,
          probeFile=probeFile, transFile=transFile);
makePdInfoPackage(pkg, destDir=".");

Then install the package into your R library by running the following from the command line (not from within R):

R CMD INSTALL pd.hugene.1.0.st.v1
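
If you prefer to stay inside R, the same installation can likely be done with install.packages() on the source directory.  This is a sketch, assuming the package directory created by makePdInfoPackage() is in the current working directory; the guard simply skips the installation if the directory is not there.

```r
pkgDir <- "pd.hugene.1.0.st.v1";  # directory written by makePdInfoPackage()
if (file.exists(pkgDir)) {
  # repos=NULL tells R to install from a local path rather than a repository
  install.packages(pkgDir, repos=NULL, type="source");
} else {
  message("Package directory not found: ", pkgDir);
}
```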

The following commands (given a CEL data file from that chip) will build a CDF file, which you can deposit in the correct directory to use with aroma.affymetrix:

library("aroma.affymetrix");
library("pd.hugene.1.0.st.v1");
pathname <- writeCdf(pd.hugene.1.0.st.v1, tags="pd.hugene.1.0.st.v1,HB20110111", overwrite=TRUE);
print(pathname);
## [1] annotationData/chipTypes/HuGene-1_0-st-v1/HuGene-1_0-st-v1,pd.hugene.1.0.st.v1,HB20110111.cdf
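
The printed pathname follows a regular pattern: the chip type gives the directory, and the chip type plus the comma-separated tags give the file name.  Below is a small base-R sketch of that pattern, inferred from the output above rather than taken from the aroma.affymetrix source; cdfPathname() is a hypothetical helper.

```r
# Hypothetical helper: rebuild the expected CDF pathname from a chip type
# and optional comma-separated tags.
cdfPathname <- function(chipType, tags=NULL) {
  fullname <- paste(c(chipType, tags), collapse=",");
  file.path("annotationData", "chipTypes", chipType,
            paste0(fullname, ".cdf"));
}

p <- cdfPathname("HuGene-1_0-st-v1", tags="pd.hugene.1.0.st.v1,HB20110111");
print(p);
```

Comparing such a string with the value returned by writeCdf() is a quick way to check that the file ended up where you expect.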

As above, you can verify this CDF file against the one that can be downloaded from the HuGene-1_0-st-v1 page.