Glue Grant Array Quality Control (GlueQC)
is a script tailored specifically for quality control of Glue Grant
Transcriptome Arrays. It provides quality control at two levels. At the probe
level, it plots, for each array, the intensity densities of the different probe
content types, plus a combined density plot of all probes across arrays. At the
probe set level, it generates (1) correlation matrices between arrays at both the
exon and the gene level, with scatter plots if there are fewer than 10 arrays;
and (2) a summary table of six quality control metrics, with a companion boxplot
for each statistic in which potential outliers are flagged.
GlueQC requires three components: the GlueQC package, R with Bioconductor, and Affymetrix Power Tools (APT). Install them as follows:
(1) Install R and Bioconductor
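a. Download and set up R. Make sure the R binary directory has been added to the system PATH: type 'Rscript' in a command-line console and check that it prints a help page. If it does not, add the R binary directory to the system PATH in the way appropriate for your operating system.
b. Open R and set up Bioconductor with the following script:
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")  # Bioconductor installer (R >= 3.5; older R used biocLite())
BiocManager::install(c("affxparser", "geneplotter"))  # install the two packages tested below
library(affxparser)   # test loading affxparser
library(geneplotter)  # test loading geneplotter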
(2) Install Affymetrix Power Tools and make sure the APT binary directory has been added to the system PATH: type 'apt-probeset-summarize' in a command-line console and check that it prints a help page. If it does not, add the APT binary directory to the system PATH in the way appropriate for your operating system.
(3) Download the GlueQC package and unzip it to a folder. Open GlueQC.sh (GlueQC.bat on Windows) in a text editor and make the following changes:
a. Depending on your system, enable either the 32-bit or the 64-bit setting and make sure the other one is commented out;
b. Replace '/Users/Weihong/GlueQC/GlueQC.R' with the path to GlueQC.R in the folder where you unzipped GlueQC;
c. Replace '/Users/Weihong/lib/hGlue2_0' with the path where the library files are stored;
d. On Mac/Linux, make GlueQC.sh executable with 'chmod +x GlueQC.sh'.
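The PATH checks in steps (1)a and (2) and the edits in steps (3)b to (3)d can also be scripted from R. The sketch below is one way to do it, assuming the launcher contains the two example paths exactly as written above; check_on_path() and configure_glueqc() are hypothetical helpers written for this example, not part of GlueQC. Sys.which() performs the same PATH lookup as typing the command in a console, without printing the help page. Step (3)a is not automated because it depends on your system.

```r
# Sketch only: check_on_path() and configure_glueqc() are hypothetical helpers,
# not part of GlueQC. They script the PATH checks and steps (3)b-(3)d above.

# TRUE if 'cmd' is found on the system PATH (Sys.which() returns "" otherwise).
check_on_path <- function(cmd) nzchar(Sys.which(cmd))

# Replace the example paths in the launcher and make it executable.
configure_glueqc <- function(script, glueqc_r, lib_dir) {
  txt <- readLines(script, warn = FALSE)
  # fixed = TRUE matches the example paths literally rather than as regexes
  txt <- gsub("/Users/Weihong/GlueQC/GlueQC.R", glueqc_r, txt, fixed = TRUE)
  txt <- gsub("/Users/Weihong/lib/hGlue2_0", lib_dir, txt, fixed = TRUE)
  writeLines(txt, script)
  # Make the script executable, like 'chmod +x' on Mac/Linux; skipped on Windows
  if (.Platform$OS.type == "unix") Sys.chmod(script, mode = "755")
  invisible(txt)
}

# Usage (replace the placeholder paths with your own):
# stopifnot(check_on_path("Rscript"), check_on_path("apt-probeset-summarize"))
# configure_glueqc("GlueQC.sh", "/path/to/GlueQC/GlueQC.R", "/path/to/hGlue2_0")
```

Matching the paths literally avoids surprises from regex metacharacters such as '.' in 'GlueQC.R'.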
To run GlueQC, copy GlueQC.sh (GlueQC.bat on Windows) to the folder containing your CEL files and double-click it. It creates a QC folder containing the following output files:
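If you prefer to start from an R session instead of double-clicking, the following sketch copies the configured launcher into a CEL folder and returns the CEL file names found there, so an empty folder is caught early. prepare_glueqc_run() is a hypothetical helper; the optional run step assumes a Unix-like system and a launcher already made executable as in step (3)d.

```r
# Sketch only: prepare_glueqc_run() is a hypothetical helper, not part of GlueQC.
# It copies the configured launcher into the CEL folder and returns the CEL
# file names found there; run = TRUE also starts the launcher (Unix-like only).
prepare_glueqc_run <- function(cel_dir, launcher, run = FALSE) {
  cels <- list.files(cel_dir, pattern = "\\.cel$", ignore.case = TRUE)
  if (length(cels) == 0) stop("no CEL files found in ", cel_dir)
  # copy.mode = TRUE (the default) keeps the executable bit
  if (!file.copy(launcher, cel_dir, overwrite = TRUE))
    stop("could not copy ", launcher, " to ", cel_dir)
  if (run) {
    old <- setwd(cel_dir)            # run from the folder holding the CEL files
    on.exit(setwd(old))              # restore the caller's working directory
    system2(file.path(".", basename(launcher)))
  }
  cels
}

# Usage (placeholder paths):
# prepare_glueqc_run("/path/to/cel_files", "/path/to/GlueQC/GlueQC.sh", run = TRUE)
```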
1. Density plots by probe content type (CELFILENAME.density_by_probesettype.png). “AntigenomicBG” and “GenomicBG” represent background probes; “NormIntron” and “NormExon” represent negative and positive controls, respectively; and “Exon” represents the gene-exon content. In general, the signal distributions should be ordered AntigenomicBG/GenomicBG < NormIntron < Exon < NormExon.
2. Density plot comparing all arrays (density_allarray.png). In general, arrays from similar tissues or sample conditions should show similar distributions.
3. Correlation matrices (PSR_correlation.txt for the exon level and TC_correlation.txt for the gene level), plus scatter plots if there are fewer than 10 arrays.
[Example scatter plots: exon level; gene level]
4. Summary table of QC metrics and companion boxplots (qc_summary.txt and qc.boxplot.png). The statistics included are:
pm_mean: the average intensity of all probes.
bgrd_mean: the average intensity of all background probes.
pos_vs_neg_auc: the area under the ROC curve (AUC) for separating positive controls from negative controls. Values range from 0 to 100%; 50% indicates no separation between negative and positive controls, and 100% indicates complete separation.
all_probeset_percent_called: the percentage of probe sets called present by the Detection Above Background (DABG) method.
PSR_cor_avg: the average exon-level correlation between arrays. CAUTION: because this is an average across all arrays, it is likely to differ from the correlation shown on any single scatter plot.
TC_cor_avg: the average gene-level correlation between arrays. CAUTION: because this is an average across all arrays, it is likely to differ from the correlation shown on any single scatter plot.
Outlier flags: a value is flagged as a potential outlier if it lies more than 1.5 interquartile ranges (IQR) from the median of that statistic. CAUTION: this assumes that the arrays analyzed together come from similar conditions (e.g., the same tissue).
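To make the definitions above concrete, the R sketch below computes three of the summary statistics and the content-type ordering check from item 1 on in-memory data. It is not GlueQC's own code. Assumptions: intensities are already on the log2 scale; probe-set signals are a matrix with probe sets in rows and arrays in columns; PSR_cor_avg and TC_cor_avg are taken to be each array's mean Pearson correlation with the other arrays; the ordering check compares median intensities per content type; and the AUC is computed with the Mann-Whitney rank-sum identity. All function names are hypothetical.

```r
# Sketch only: hypothetical helpers illustrating the QC definitions above.
# These are not GlueQC functions.

# Item 1: median log2 probe intensity per content type should increase as
# background < NormIntron < Exon < NormExon (all five types assumed present).
content_order_ok <- function(intensity, type) {
  med <- tapply(intensity, type, median)
  bg <- max(med[c("AntigenomicBG", "GenomicBG")])  # the higher background type
  all(diff(c(bg, med[c("NormIntron", "Exon", "NormExon")])) > 0)
}

# PSR_cor_avg / TC_cor_avg (assumed definition): each array's mean Pearson
# correlation with every other array. signal: probe sets x arrays, log2 scale.
avg_cor <- function(signal) {
  cm <- cor(signal)                              # array-by-array matrix
  (colSums(cm) - diag(cm)) / (ncol(cm) - 1)      # exclude self-correlation
}

# pos_vs_neg_auc: ROC AUC (in percent) of positive vs. negative control
# signals, via the Mann-Whitney identity AUC = U / (n_pos * n_neg).
pos_vs_neg_auc <- function(pos, neg) {
  r <- rank(c(pos, neg))                         # ties get average ranks
  n_pos <- length(pos)
  u <- sum(r[seq_len(n_pos)]) - n_pos * (n_pos + 1) / 2
  100 * u / (n_pos * length(neg))                # 50 = no separation, 100 = complete
}

# Outlier flag as described above: more than 1.5 IQR from the median.
flag_outliers <- function(x) abs(x - median(x)) > 1.5 * IQR(x)

# Example on simulated data (3 arrays, a3 noisier than a1 and a2).
set.seed(1)
truth <- rnorm(500, mean = 8)
signal <- cbind(a1 = truth + rnorm(500, sd = 0.2),
                a2 = truth + rnorm(500, sd = 0.2),
                a3 = truth + rnorm(500, sd = 1))
print(round(avg_cor(signal), 3))
print(pos_vs_neg_auc(pos = rnorm(50, mean = 10), neg = rnorm(50, mean = 6)))
print(flag_outliers(c(10, 11, 9, 10, 30)))  # FALSE FALSE FALSE FALSE TRUE
```

With fewer than 10 arrays, base R's pairs(signal) gives a rough equivalent of the scatter plots described in item 3.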