Over the past two decades, many methods have emerged to infer the proportions of individual cell
types from bulk transcriptomics data, including newer methods that use single-cell RNA-sequencing
data to infer cell proportions in bulk RNA-sequenced samples. The development of these methods
faces several challenges: first, the need to build reference datasets with standardised,
state-of-the-art computational tools; second, the standardisation of cell type annotation and
marker selection; and finally, the need to improve the generalisability of both the algorithm and
the signature atlas to new bulk sample conditions. A first step towards tackling some of these
challenges is to devise a single-cell reference panel that provides a standardised resource, with a
consistent annotation method, to serve as ground truth for the deconvolution algorithm. The final
goal is to obtain deconvoluted data at the cell type level from bulk gene expression.
In this preliminary test, we included two liver-based datasets: GSE149614 (71,915 cells, three
non-viral tumour samples, seven HBV- or HCV-related tumour samples), and a subset of GSE243981
(24,242 cells, six healthy samples), to create a balanced resource without a focus on a specific function or disease.
To maximise the standardisation of our workflow, we performed marker-based annotation of the
integrated panel using the software ScType and a curated list of signatures from GSE149614,
GSE243981 and the CellMarker 2.0 database. Deconvolution was executed using the β-VAE
implementation provided by the Bulk2Space software, generating single-cell-like expression data.
Several types of bulk RNA-seq data were used to assess the usability of the resource: two normal liver
samples (one from GTEx, one from an internal dataset), one liver tumour sample (internal dataset) and
one primary human hepatocyte sample from liver resection.
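
To illustrate the general idea behind marker-based annotation, the following is a minimal sketch that scores each cell by the mean z-scored expression of a cell type's marker genes and assigns the best-scoring label. It is a simplified stand-in and not ScType's actual scoring scheme; the function names, the toy expression matrix and the marker lists are assumptions made for illustration and are not taken from the curated signatures used in this work.

```python
from statistics import mean, pstdev


def zscore_columns(expr):
    """Z-score each gene (column) across cells (rows)."""
    scaled_columns = []
    for column in zip(*expr):
        mu, sd = mean(column), pstdev(column)
        sd = sd or 1.0  # constant genes would otherwise divide by zero
        scaled_columns.append([(value - mu) / sd for value in column])
    return [list(row) for row in zip(*scaled_columns)]


def annotate(expr, genes, markers):
    """Assign each cell the cell type whose markers have the highest mean z-score."""
    z = zscore_columns(expr)
    index = {gene: i for i, gene in enumerate(genes)}
    labels = []
    for row in z:
        scores = {}
        for cell_type, marker_genes in markers.items():
            cols = [index[g] for g in marker_genes if g in index]
            if cols:
                scores[cell_type] = mean([row[c] for c in cols])
        labels.append(max(scores, key=scores.get) if scores else "Unknown")
    return labels


# Toy data (hypothetical): rows are cells, columns follow `genes`.
genes = ["ALB", "APOA1", "CD3E", "PTPRC"]
markers = {"Hepatocytes": ["ALB", "APOA1"], "T cells": ["CD3E", "PTPRC"]}
expr = [[9, 8, 0, 0], [8, 9, 0, 1], [0, 1, 7, 8], [1, 0, 8, 7]]
print(annotate(expr, genes, markers))
```

In practice, annotation tools usually score clusters rather than single cells and weight markers by specificity; this sketch omits both.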
Integration of the two datasets was performed using Seurat/Harmony and resulted in a panel of
96,159 cells and 16 samples. Annotation consisted of a two-step approach: initial annotation by main
cell type, followed by subtype identification within each cell type. To evaluate the results of the
integration procedure, we compared the cell types and cell subtypes identified by our annotation
approach to the original annotation provided with each dataset, obtaining 93% and 79% concordance on
cell types for GSE149614 and GSE243981 respectively, and 46% and 86% concordance on cell subtypes
respectively. Moreover, the deconvolution algorithm transferred the correct cell type
label (Hepatocytes) to the deconvoluted RNA-seq data.
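
As an illustration of how a per-dataset concordance figure of this kind could be computed, here is a minimal sketch. It assumes concordance means the fraction of cells whose new label exactly matches the original label after label names have been harmonised; the abstract does not state the exact definition used, and the function name and toy records are hypothetical.

```python
from collections import Counter


def per_dataset_concordance(cells):
    """Return {dataset_id: fraction of cells whose new label equals the original}.

    `cells` is an iterable of (dataset_id, new_label, original_label) tuples.
    """
    totals, hits = Counter(), Counter()
    for dataset_id, new_label, original_label in cells:
        totals[dataset_id] += 1
        hits[dataset_id] += new_label == original_label  # True counts as 1
    return {dataset_id: hits[dataset_id] / totals[dataset_id] for dataset_id in totals}


# Toy records (hypothetical dataset ids and labels, not results from this work).
cells = [
    ("A", "Hepatocyte", "Hepatocyte"),
    ("A", "T cell", "T cell"),
    ("A", "B cell", "T cell"),
    ("A", "Kupffer", "Kupffer"),
    ("B", "Hepatocyte", "Hepatocyte"),
    ("B", "LSEC", "Endothelial"),
]
print(per_dataset_concordance(cells))
```

Exact matching is sensitive to naming differences between annotations, so a mapping between the two label vocabularies would normally be applied first.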
With this first attempt, we demonstrate the feasibility of leveraging atlas-level characteristics of a
single-cell reference to perform bulk RNA-seq deconvolution while retaining the relevant cell type
information. Furthermore, we established a standardised workflow for dataset integration and
annotation. Future work will broaden the variability represented in the resource by including more
datasets. In addition, implementing an Adversarial Autoencoder network in place of the current
β-VAE could improve deconvolution quality and increase the resolution of the label transfer (from cell
types to cell subtypes).
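
For reference, the standard β-VAE objective can be written as below. This is the textbook formulation, given here as context under the assumption that the Bulk2Space implementation follows it; the exact loss used by that software may differ. The symbols (encoder q_φ, decoder p_θ, prior p(z), weight β) are not defined in the abstract.

```latex
\mathcal{L}(\theta, \phi; x) =
  \mathbb{E}_{q_\phi(z \mid x)}\!\left[ \log p_\theta(x \mid z) \right]
  - \beta \, D_{\mathrm{KL}}\!\left( q_\phi(z \mid x) \,\|\, p(z) \right)
```

An adversarial autoencoder keeps the reconstruction term but replaces the KL penalty with a discriminator that pushes the aggregated posterior over z towards the prior.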
Product ID: 140405
Handle IRIS: 11562/1130546
Last Modified: July 7, 2024
Bibliographic citation:
Gallinaro, Martina; Alfano, Vincenzo; Kerbaj, Coline; Maccarone, Giulia; Malerba, Giovanni; Plissonnier, Marie-Laure; Zeisel, Mirjam; Levrero, Massimo; Cocca, Massimiliano, "Empowering bulk RNA-seq deconvolution algorithms by integrating multiple transcriptomics datasets", in JOBIM 2024, Proceedings of "JOBIM", Toulouse, June 2024, pp. 1-2