Glue Grant Array Quality Control (GlueQC)


 

Introduction

 

GlueQC is a script tailored for quality control of the Glue Grant Transcriptome Arrays. It provides two levels of quality control. At the probe level, it plots the density of each content type for every array, plus a combined density plot of all probes that compares across arrays. At the probe set level, it generates (1) a correlation matrix between arrays at both the exon and gene level, with scatter plots if there are fewer than 10 arrays; and (2) a summary table of six quality control metrics, with companion boxplots in which potential outliers are flagged for each statistic.

 

Setup

 

GlueQC requires three components: the GlueQC package, R with Bioconductor, and Affymetrix Power Tools (APT):

(1)   Install R and Bioconductor

a.     Download and set up R. Make sure the R binary has been added to the system path. To check, type 'Rscript' in the command line console and see whether it outputs a help page. If it does not, follow the system-specific procedure to add the R binary path to the system path.

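b.     Open R and set up Bioconductor using the following script:

# Note: on R 3.5 or later, biocLite has been retired; use BiocManager::install() instead.

source("http://bioconductor.org/biocLite.R")  # load the Bioconductor installer

biocLite()  # install the core Bioconductor packages

biocLite(c("affxparser", "geneplotter"))  # install the two packages tested below, in case the core install did not include them

library(affxparser)  # test loading affxparser

library(geneplotter)  # test loading geneplotter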

(2)   Install Affymetrix Power Tools. Make sure the APT binaries have been added to the system path. To check, type 'apt-probeset-summarize' in the command line console and see whether it outputs a help page. If it does not, follow the system-specific procedure to add the APT binary path to the system path.

(3)   Download the GlueQC package and unzip it to a folder. Open GlueQC.sh (GlueQC.bat on Windows) in an editor and make the following modifications:

a.     Depending on your system, pick either the 32-bit or the 64-bit setting and comment out the other one;

b.     Replace '/Users/Weihong/GlueQC/GlueQC.R' with the path to GlueQC.R in the folder where you unzipped GlueQC;

c.     Replace '/Users/Weihong/lib/hGlue2_0' with the path where the library files are stored;

d.     On Mac/Linux, make GlueQC.sh executable with 'chmod +x GlueQC.sh'.
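
Before editing the launcher script, it can help to confirm from within R that the prerequisites listed above are visible. The following is a minimal sketch and not part of GlueQC: the helper name check_glueqc_setup is hypothetical, and it assumes only the two executables and two R packages named in the setup steps.

```r
# Hypothetical helper (not part of GlueQC): reports which prerequisites
# named in the setup steps are visible from the current R session.
check_glueqc_setup <- function(binaries = c("Rscript", "apt-probeset-summarize"),
                               packages = c("affxparser", "geneplotter")) {
  # Sys.which() returns "" for any executable not found on the system path.
  bin_paths <- Sys.which(binaries)
  # requireNamespace() returns FALSE (without an error) if a package is missing.
  pkg_ok <- vapply(packages, requireNamespace, logical(1), quietly = TRUE)
  data.frame(
    item  = c(binaries, packages),
    type  = rep(c("binary", "package"), c(length(binaries), length(packages))),
    found = c(unname(nzchar(bin_paths)), unname(pkg_ok)),
    stringsAsFactors = FALSE
  )
}

status <- check_glueqc_setup()
print(status)
```

Any row with found = FALSE points to a step above that may need to be repeated (adding a binary to the system path, or installing a package).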

 

Execution

 

To run GlueQC, copy GlueQC.sh (or GlueQC.bat on Windows) to the folder containing your CEL files and double-click it. It generates a QC folder containing all output files.
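
Because scatter plots are only produced when there are fewer than 10 arrays, it may be useful to count the CEL files in a folder before launching. This is a minimal sketch, not part of GlueQC: count_cel_files is a hypothetical helper, and it assumes CEL files are identified by a ".CEL" extension in either case.

```r
# Hypothetical helper (not part of GlueQC): counts CEL files in a folder.
# Assumption: CEL files end in ".CEL" or ".cel".
count_cel_files <- function(folder = ".") {
  cel <- list.files(folder, pattern = "\\.cel$", ignore.case = TRUE)
  length(cel)
}

n <- count_cel_files(".")
cat("CEL files found:", n, "\n")
# Scatter plots accompany the correlation matrices only when there are
# fewer than 10 arrays.
cat("Scatter plots expected:", n > 0 && n < 10, "\n")
```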

 

Sample Output

 

1. Density plot of different contents (CELFILENAME.density_by_probesettype.png). “AntigenomicBG” and “GenomicBG” represent background probes. “NormIntron” and “NormExon” represent negative and positive controls, respectively. “Exon” represents the gene-exon content. In general, we should see signal distributions ordered as AntigenomicBG/GenomicBG < NormIntron < Exon < NormExon.

 

2. Density plot comparing all arrays (density_allarray.png). In general, we should expect similar distributions for arrays from similar tissue/sample conditions.

3. Correlation matrix (PSR_correlation.txt for exon and TC_correlation.txt for gene) and scatter plots if there are fewer than 10 arrays.

- Exon level:

- Gene level:

 

4.  Summary table of QC metrics and companion boxplots (qc_summary.txt and qc.boxplot.png). The statistics included are:

 

pm_mean: the average intensity of all probes.

bgrd_mean: the average intensity of all background probes.

pos_vs_neg_auc: the area under the curve (AUC) of negative controls vs. positive controls, ranging from 0 to 100%, which represent no separation and complete separation, respectively, between negative and positive controls.

all_probeset_percent_called: the percentage of probe sets called as present using the Detection Above Background (DABG) method.

PSR_cor_avg: the average exon-level correlation between arrays. CAUTION: this number is likely to differ from the correlation shown on the scatter plot, because it is an average across all arrays.

TC_cor_avg: the average gene-level correlation between arrays. CAUTION: this number is likely to differ from the correlation shown on the scatter plot, because it is an average across all arrays.

Outlier flags: if a statistic is more than 1.5 times the interquartile range (IQR) away from the median, the array is flagged as an outlier for that statistic. CAUTION: this assumes that the arrays analyzed together are of similar condition (e.g., from the same tissue).
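
The average correlation statistics and the outlier rule above can be sketched as follows. This is an illustration on simulated data, not GlueQC's actual code: the function names avg_cor and flag_outliers are hypothetical, the use of Pearson correlation and the exclusion of each array's correlation with itself are assumptions, and the outlier rule is implemented literally as worded here (distance from the median compared against 1.5 times the IQR).

```r
# Illustrative sketch only; the data are simulated and the function names are
# hypothetical. It is not GlueQC's actual code.

# Average correlation of each array with every other array.
# Assumptions: Pearson correlation, self-correlation (the diagonal) excluded.
avg_cor <- function(expr) {
  cm <- cor(expr)            # arrays are columns, probe sets are rows
  diag(cm) <- NA             # drop each array's correlation with itself
  rowMeans(cm, na.rm = TRUE)
}

# Flag values more than 1.5 * IQR away from the median, following the rule
# as worded in this document.
flag_outliers <- function(x) {
  abs(x - median(x)) > 1.5 * IQR(x)
}

set.seed(1)
signal <- rnorm(500, mean = 8, sd = 2)                  # shared probe-set signal
expr <- sapply(1:6, function(i) signal + rnorm(500, sd = 0.3))
expr[, 6] <- signal + rnorm(500, sd = 3)                # one deliberately noisy array
colnames(expr) <- paste0("array", 1:6)

cor_avg <- avg_cor(expr)
print(round(cor_avg, 3))
print(flag_outliers(cor_avg))
```

Because each value is an average over all other arrays, a single noisy array lowers every array's average slightly, which is one reason the number differs from the correlation shown on any one scatter plot.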