1 scaeData

scaeData is a data package that complements the Bioconductor package SingleCellAlleleExperiment. It provides three datasets for testing the functions in SingleCellAlleleExperiment:

  • 5k PBMCs of a healthy donor, 3’ v3 chemistry
  • 10k PBMCs of a healthy donor, 3’ v3 chemistry
  • 20k PBMCs of a healthy donor, 3’ v3 chemistry

The raw FASTQ files for all three datasets come from publicly accessible datasets provided by 10x Genomics.

After the raw data were downloaded, the scIGD Snakemake workflow was used to perform HLA allele typing and to generate allele-specific quantification from the scRNA-seq data using donor-specific references.

2 Quick Start

2.1 Installation

From Bioconductor:

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")

BiocManager::install("scaeData")

Alternatively, a development version is available on GitHub and can be installed via:

if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

devtools::install_github("AGImkeller/scaeData", build_vignettes = TRUE)

3 Usage

The datasets within scaeData are accessible using the scaeDataGet() function:

library("scaeData")
pbmc_5k <- scaeDataGet("pbmc_5k")
pbmc_10k <- scaeDataGet("pbmc_10k")

For example, we can view pbmc_20k:

pbmc_20k <- scaeDataGet("pbmc_20k")
## Retrieving barcode identifiers for **pbmc 20k** dataset...DONE
## Retrieving feature identifiers for **pbmc 20k** dataset...DONE
## Retrieving quantification matrix for **pbmc 20k** dataset...DONE
pbmc_20k
## $dir
## [1] "/home/biocbuild/.cache/R/ExperimentHub/"
## 
## $barcodes
## [1] "29e0bf13b06ae7_9525"
## 
## $features
## [1] "29e0bf272bf11_9526"
## 
## $matrix
## [1] "29e0bf17528aa9_9527"
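# Build the full paths to the three cached files
cells.dir <- file.path(pbmc_20k$dir, pbmc_20k$barcodes)
features.dir <- file.path(pbmc_20k$dir, pbmc_20k$features)
mat.dir <- file.path(pbmc_20k$dir, pbmc_20k$matrix)

# Read the cell barcodes, the feature identifiers and the sparse count matrix
cells <- utils::read.csv(cells.dir, sep = "", header = FALSE)
features <- utils::read.delim(features.dir, header = FALSE)
mat <- Matrix::readMM(mat.dir)

# Label the matrix: rows are cell barcodes, columns are features
rownames(mat) <- cells$V1
colnames(mat) <- features$V1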
head(mat)
## 6 x 62760 sparse Matrix of class "dgTMatrix"
##   [[ suppressing 34 column names 'ENSG00000279928.2', 'ENSG00000228037.1', 'ENSG00000142611.17' ... ]]
##                                                                               
## AAACCCAAGAAACACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AAACCCAAGAAACTCA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AAACCCAAGAAACTGT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AAACCCAAGAAATTGC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AAACCCAAGAACAAGG . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
## AAACCCAAGAACAGGA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
##                              
## AAACCCAAGAAACACT . . . ......
## AAACCCAAGAAACTCA . . . ......
## AAACCCAAGAAACTGT . . . ......
## AAACCCAAGAAATTGC . . . ......
## AAACCCAAGAACAAGG . . . ......
## AAACCCAAGAACAGGA . . . ......
## 
##  .....suppressing 62726 columns in show(); maybe adjust options(max.print=, width=)
##  ..............................
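
The output above shows that the matrix read from disk has cell barcodes in rows and features in columns, whereas the SingleCellAlleleExperiment object printed later in this vignette has features in rows. As a minimal sketch of that labelling step, the toy example below uses made-up stand-ins (`toy_cells`, `toy_features` and `toy_mat` are hypothetical and not part of scaeData) to check the orientation before assigning names and to transpose the result:

```r
library(Matrix)

# Hypothetical toy stand-ins for the barcode file, feature file and matrix.
toy_cells    <- data.frame(V1 = c("AAAC", "AAAG", "AAAT"))
toy_features <- data.frame(V1 = c("gene1", "gene2", "gene3", "gene4"))
toy_mat <- sparseMatrix(i = c(1, 2, 3), j = c(1, 3, 4), x = c(5, 1, 2),
                        dims = c(3, 4))

# Verify the orientation before labelling: rows are cells, columns are features.
stopifnot(nrow(toy_mat) == nrow(toy_cells),
          ncol(toy_mat) == nrow(toy_features))

rownames(toy_mat) <- toy_cells$V1
colnames(toy_mat) <- toy_features$V1

# Transpose to obtain a features x cells layout.
toy_t <- t(toy_mat)
dim(toy_t)
```

Running the dimension check first makes a swapped barcode and feature file fail early instead of producing mislabelled counts.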

A SingleCellAlleleExperiment object, scae for short, can be generated with the read_allele_counts() function from the SingleCellAlleleExperiment package.

Each dataset comes with a lookup table that is used to create the relevant additional data layers during object generation. It can be accessed from the package’s extdata:

lookup <- read.csv(system.file("extdata", "pbmc_20k_lookup_table.csv", package="scaeData"))
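
Only the file name of the pbmc_20k lookup table is given here. To see which lookup tables the installed package actually ships, without guessing the other file names, the extdata directory can be listed. This is a small sketch using base R only; `extdata_dir` and `lookup_files` are illustrative variable names:

```r
# Locate the extdata directory of the installed scaeData package.
# system.file() returns "" if the package or the directory is not found.
extdata_dir <- system.file("extdata", package = "scaeData")

# List the CSV files found there; no file names are assumed.
lookup_files <- list.files(extdata_dir, pattern = "\\.csv$")
lookup_files
```

An empty result means that scaeData is not installed or that no CSV files were found in its extdata directory.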

library("SingleCellAlleleExperiment")
## Loading required package: SingleCellExperiment
## Loading required package: SummarizedExperiment
## Loading required package: MatrixGenerics
## Loading required package: matrixStats
## 
## Attaching package: 'MatrixGenerics'
## The following objects are masked from 'package:matrixStats':
## 
##     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
##     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
##     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
##     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
##     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
##     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
##     colWeightedMeans, colWeightedMedians, colWeightedSds,
##     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
##     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
##     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
##     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
##     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
##     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
##     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
##     rowWeightedSds, rowWeightedVars
## Loading required package: GenomicRanges
## Loading required package: stats4
## Loading required package: BiocGenerics
## 
## Attaching package: 'BiocGenerics'
## The following objects are masked from 'package:stats':
## 
##     IQR, mad, sd, var, xtabs
## The following objects are masked from 'package:base':
## 
##     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
##     as.data.frame, basename, cbind, colnames, dirname, do.call,
##     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
##     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
##     pmin.int, rank, rbind, rownames, sapply, setdiff, table, tapply,
##     union, unique, unsplit, which.max, which.min
## Loading required package: S4Vectors
## 
## Attaching package: 'S4Vectors'
## The following object is masked from 'package:utils':
## 
##     findMatches
## The following objects are masked from 'package:base':
## 
##     I, expand.grid, unname
## Loading required package: IRanges
## Loading required package: GenomeInfoDb
## Loading required package: Biobase
## Welcome to Bioconductor
## 
##     Vignettes contain introductory material; view with
##     'browseVignettes()'. To cite Bioconductor, see
##     'citation("Biobase")', and for packages 'citation("pkgname")'.
## 
## Attaching package: 'Biobase'
## The following object is masked from 'package:MatrixGenerics':
## 
##     rowMedians
## The following objects are masked from 'package:matrixStats':
## 
##     anyMissing, rowMedians
scae_20k <- read_allele_counts(pbmc_20k$dir,
                               sample_names = "example_data",
                               filter_mode = "no",
                               lookup_file = lookup,
                               barcode_file = pbmc_20k$barcodes,
                               gene_file = pbmc_20k$features,
                               matrix_file = pbmc_20k$matrix,
                               verbose = TRUE)
## Filtering performed on default value at 0 UMI counts.
## Data Read_in completed
##   Generating SCAE object: Extending rowData with new classifiers
##   Generating SCAE object: Filtering at 0 UMI counts.
##   Generating SCAE object: Aggregating alleles corresponding to the same gene
##   Generating SCAE object: Aggregating genes corresponding to the same functional groups
## SingleCellAlleleExperiment object completed
scae_20k
## class: SingleCellAlleleExperiment 
## dim: 62769 2261243 
## metadata(0):
## assays(1): counts
## rownames(62769): ENSG00000279928.2 ENSG00000228037.1 ... HLA_class_I
##   HLA_class_II
## rowData names(3): Ensembl_ID NI_I Quant_type
## colnames(2261243): AAACCCAAGAAACACT AAACCCAAGAAACTCA ...
##   TTTGTTGTCTTTGGAG TTTGTTGTCTTTGTCG
## colData names(2): Sample Barcode
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
## ---------------
## Including a total of 23 immune related features:
## Allele-level information (14): A*02:01:01:01 A*24:02:01:01 ...
##   DQB1*02:02 DQB1*06:03:01
## Immune genes (7): HLA-A HLA-B ... HLA-DQA1 HLA-DQB1
## Functional level information (2): HLA_class_I HLA_class_II

Please refer to the vignette and documentation of SingleCellAlleleExperiment for details on working further with this kind of data container.

Session info

sessionInfo()
## R version 4.4.0 RC (2024-04-16 r86468)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 22.04.4 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.20-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats4    stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] SingleCellAlleleExperiment_1.1.0 SingleCellExperiment_1.27.0     
##  [3] SummarizedExperiment_1.35.0      Biobase_2.65.0                  
##  [5] GenomicRanges_1.57.0             GenomeInfoDb_1.41.0             
##  [7] IRanges_2.39.0                   S4Vectors_0.43.0                
##  [9] BiocGenerics_0.51.0              MatrixGenerics_1.17.0           
## [11] matrixStats_1.3.0                scaeData_1.1.1                  
## [13] BiocStyle_2.33.0                
## 
## loaded via a namespace (and not attached):
##  [1] KEGGREST_1.45.0         xfun_0.43               bslib_0.7.0            
##  [4] lattice_0.22-6          vctrs_0.6.5             tools_4.4.0            
##  [7] generics_0.1.3          parallel_4.4.0          curl_5.2.1             
## [10] tibble_3.2.1            fansi_1.0.6             AnnotationDbi_1.67.0   
## [13] RSQLite_2.3.6           blob_1.2.4              pkgconfig_2.0.3        
## [16] Matrix_1.7-0            dbplyr_2.5.0            lifecycle_1.0.4        
## [19] GenomeInfoDbData_1.2.12 compiler_4.4.0          Biostrings_2.73.0      
## [22] codetools_0.2-20        htmltools_0.5.8.1       sass_0.4.9             
## [25] yaml_2.3.8              pillar_1.9.0            crayon_1.5.2           
## [28] jquerylib_0.1.4         BiocParallel_1.39.0     DelayedArray_0.31.0    
## [31] cachem_1.0.8            abind_1.4-5             mime_0.12              
## [34] ExperimentHub_2.13.0    AnnotationHub_3.13.0    tidyselect_1.2.1       
## [37] digest_0.6.35           dplyr_1.1.4             purrr_1.0.2            
## [40] bookdown_0.39           BiocVersion_3.20.0      grid_4.4.0             
## [43] fastmap_1.1.1           SparseArray_1.5.0       cli_3.6.2              
## [46] magrittr_2.0.3          S4Arrays_1.5.0          utf8_1.2.4             
## [49] withr_3.0.0             filelock_1.0.3          UCSC.utils_1.1.0       
## [52] rappdirs_0.3.3          bit64_4.0.5             rmarkdown_2.26         
## [55] XVector_0.45.0          httr_1.4.7              bit_4.0.5              
## [58] png_0.1-8               memoise_2.0.1           evaluate_0.23          
## [61] knitr_1.46              BiocFileCache_2.13.0    rlang_1.1.3            
## [64] glue_1.7.0              DBI_1.2.2               BiocManager_1.30.22    
## [67] jsonlite_1.8.8          R6_2.5.1                zlibbioc_1.51.0