%\VignetteEngine{knitr::knitr} %\VignetteIndexEntry{3. Methylation Arrays -- Lab} # Methylation Arrays -- Lab Epigenomics 2014
Author: Martin Morgan (mtmorgan@fhcrc.org)
Date: 24 August, 2014 ```{r, echo=FALSE} knitr::opts_chunk$set(cache=TRUE) ``` ```{r, echo=FALSE} suppressPackageStartupMessages({ require(minfi) require(minfiData) }) ``` This case study takes a quick tour through the [minfi](http://bioconductor.org/packages/release/bioc/html/minfi.html) Bioconductor package. The main goal is to become familiar with the use of Bioconductor objects and methods; see the [vignette](http://bioconductor.org/packages/release/bioc/vignettes/epivizr/inst/doc/minfi.pdf) accompanying the minfi package for more background on analysis of Illumina arrays for methylation analysis. Start by attaching the `minfi` and `minfiData` packages. Use `browseVignettes("minfi")` to access the vignette for additional background information. ```{r, eval=FALSE} require(minfi) require(minfiData) browseVignettes("minfi") ``` The first step in any work flow is to read in the data. A sample data set is provided at the following location. ```{r} baseDir <- system.file("extdata", package = "minfiData") baseDir dir(baseDir) dir(file.path(baseDir, "5723646052")) ``` Of course your own data would be at another location, and you might enter the path (with 'tab completion') instead of using `system.file()`. A typical organization is that each 'slide' (containing 12 arrays) is stored in a separate directory. The top-level directory contains a `.csv` file describing the samples; inside each slide directory are IDAT files representing the output of the Illumina scanner. Next read in the sample sheet, and then the raw probe-level data. Take a moment to use R to explore the sample sheet. Read the 'man' page for `read.450k.sheet` and `read.450k.exp` to see what options are available. ```{r} ## 'pData' targets <- read.450k.sheet(baseDir) head(targets) ## 'raw' probe-level data RGset <- read.450k.exp(base = baseDir, targets = targets) ``` As a basic quality assessment, visualize the distribution of beta values across each array, coloring the density functions by sample. Are there any concerns about the data? ```{r} ## Basic QA -- comparable densities across samples? densityPlot(RGset, sampGroups = RGset$Sample_Group, main = "Beta", xlab = "Beta") ``` A _technical artifact_ is that probe intensities differ depending on their sequence composition, so it is necessary to perform a 'background correction'. Also, the distribution of probe intensities differ from one another as a consequence of sample preparation steps, e.g., slightly different initial amounts of DNA from one sample compared to another. Basic steps in microarray analysis (many of these steps are shared by other high-throughput assays) are therefore _background correction_ and between-array _normalization_. Use the `preprocessIllumina()` command to perform these steps. Are there other normalization strategies available in this package? ```{r} ## background correction and normalization ## like Illumina Genome Studio (other approaches available) MSet.norm <- preprocessIllumina(RGset, bg.correct = TRUE, normalize = "controls", reference = 2) ``` Once data are background-corrected and normalized, it is possible to compare the vector of methylation values of each sample. Use the `mdsPlot()` function to visualize the multidimensional relationship between samples in reduced dimensions. Use arguments of `mdsPlot()` to name and highlight the different sample groups. ```{r} ## How similar are the samples to one another? mdsPlot(MSet.norm, numPositions = 1000, sampGroups = MSet.norm$Sample_Group, sampNames = MSet.norm$Sample_Name) ``` Take a portion of the data (the first 100,000 probes), retrieve the logit-transformed beta values, and then use `dmpFinder()` CpGs where methylation status is associated with sample group. From the help page, references, and your own knowledge, any ideas about `shrinkVar`? ```{r} ## Identify probes with methylation status differing between groups mset <- MSet.norm[1:100000,] ## logit(beta) M <- getM(mset, type = "beta", betaThreshold = 0.001) dmp <- dmpFinder(M, pheno=mset$Sample_Group, type="categorical") head(dmp) ``` Visualize differential methylation using `plotCpg()` ```{r} plotCpg(mset, cpg=rownames(dmp)[1], pheno=mset$Sample_Group) ``` Probes interrogate genomic locations. Use `mapToGenome()` to translate the probe identifiers to genomic coordinates. This transforms `mset` into an object of class `SummarizedExperiment`. A `SummarizedExperiment` is similar to an expression set, bui with `GRanges` to annotate rows, rather than a `data.frame`. Use `rowData()` to extract the `GRanges` from the `SummarizedExperiment`; explore it. Add the annotations about differentially expressed probes to the row data of the `SummarizedExperiment`. ```{r} ## Genomic locations mset <- mset[rownames(dmp),] mse <- mapToGenome(mset) # 'SummarizedExperiment' rowData(mse) mcols(rowData(mse)) <- cbind(mcols(rowData(mse)), dmp) ```