Publications
2019
The emerging diversity of single-cell RNA-seq datasets allows for the full transcriptional characterization of cell types across a wide variety of biological and clinical conditions. However, it is challenging to analyze them together, particularly when datasets are assayed with different technologies, because biological and technical differences are interspersed. We present Harmony (https://github.com/immunogenomics/harmony), an algorithm that projects cells into a shared embedding in which cells group by cell type rather than dataset-specific conditions. Harmony simultaneously accounts for multiple experimental and biological factors. In six analyses, we demonstrate that Harmony outperforms previously published algorithms while requiring fewer computational resources. Harmony enables the integration of 10^6 cells on a personal computer. We apply Harmony to peripheral blood mononuclear cells from datasets with large experimental differences, five studies of pancreatic islet cells, mouse embryogenesis datasets and the integration of scRNA-seq with spatial transcriptomics data.
To define the cell populations that drive joint inflammation in rheumatoid arthritis (RA), we applied single-cell RNA sequencing (scRNA-seq), mass cytometry, bulk RNA sequencing (RNA-seq) and flow cytometry to T cells, B cells, monocytes, and fibroblasts from 51 samples of synovial tissue from patients with RA or osteoarthritis (OA). Utilizing an integrated strategy based on canonical correlation analysis of 5,265 scRNA-seq profiles, we identified 18 unique cell populations. Combining mass cytometry and transcriptomics revealed cell states expanded in RA synovia: THY1(CD90)+HLA-DRAhi sublining fibroblasts, IL1B+ pro-inflammatory monocytes, ITGAX+TBX21+ autoimmune-associated B cells and PDCD1+ peripheral helper T (TPH) cells and follicular helper T (TFH) cells. We defined distinct subsets of CD8+ T cells characterized by GZMK+, GZMB+, and GNLY+ phenotypes. We mapped inflammatory mediators to their source cell populations; for example, we attributed IL6 expression to THY1+HLA-DRAhi fibroblasts and IL1B production to pro-inflammatory monocytes. These populations are potentially key mediators of RA pathogenesis.
2018
To define potentially causal variants for autoimmune disease, we fine-mapped [1,2] 76 rheumatoid arthritis (11,475 cases, 15,870 controls) [3] and type 1 diabetes loci (9,334 cases, 11,111 controls) [4]. After sequencing 799 1-kilobase regulatory (H3K4me3) regions within these loci in 568 individuals, we observed accurate imputation for 89% of common variants. We defined credible sets of ≤5 causal variants at 5 rheumatoid arthritis and 10 type 1 diabetes loci. We identified potentially causal missense variants at DNASE1L3, PTPN22, SH2B3, and TYK2, and noncoding variants at MEG3, CD28-CTLA4, and IL2RA. We also identified potential candidate causal variants at SIRPG and TNFAIP3. Using functional assays, we confirmed allele-specific protein binding and differential enhancer activity for three variants: the CD28-CTLA4 rs117701653 SNP, MEG3 rs34552516 indel, and TNFAIP3 rs35926684 indel.
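As an illustration of what a credible set is, the sketch below ranks variants by posterior probability and keeps the smallest group whose cumulative probability reaches a chosen coverage. It is not the procedure used in the study: the 95% coverage level, the function name `credible_set`, the variant labels, and the posterior values are all assumptions made for the example.

```python
def credible_set(posteriors, coverage=0.95):
    """Return the smallest list of variants whose normalized posterior
    probabilities sum to at least `coverage` (assumed 95% by default)."""
    total = sum(posteriors.values())
    # Rank variants from most to least probable.
    ranked = sorted(posteriors.items(), key=lambda item: item[1], reverse=True)
    chosen, cumulative = [], 0.0
    for variant, probability in ranked:
        chosen.append(variant)
        cumulative += probability / total
        if cumulative >= coverage:
            break
    return chosen


# Hypothetical posteriors for four variants at one locus (not real data).
example = {"var_a": 0.70, "var_b": 0.20, "var_c": 0.06, "var_d": 0.04}
selected = credible_set(example)
print(selected)
```

A locus whose set contains only a handful of variants, as in the "≤5" criterion above, is one where the association signal has been narrowed to a few candidates.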
High-dimensional single-cell analyses have improved the ability to resolve complex mixtures of cells from human disease samples; however, identifying disease-associated cell types or cell states in patient samples remains challenging because of technical and interindividual variation. Here, we present mixed-effects modeling of associations of single cells (MASC), a reverse single-cell association strategy for testing whether case-control status influences the membership of single cells in any of multiple cellular subsets while accounting for technical confounders and biological variation. Applying MASC to mass cytometry analyses of CD4+ T cells from the blood of rheumatoid arthritis (RA) patients and controls revealed a significantly expanded population of CD4+ T cells, identified as CD27- HLA-DR+ effector memory cells, in RA patients (odds ratio, 1.7; P = 1.1 × 10^-3). The frequency of CD27- HLA-DR+ cells was similarly elevated in blood samples from a second RA patient cohort, and CD27- HLA-DR+ cell frequency decreased in RA patients who responded to immunosuppressive therapy. Mass cytometry and flow cytometry analyses indicated that CD27- HLA-DR+ cells were associated with RA (meta-analysis P = 2.3 × 10^-4). Compared to peripheral blood, synovial fluid and synovial tissue samples from RA patients contained about fivefold higher frequencies of CD27- HLA-DR+ cells, which comprised 10% of synovial CD4+ T cells. CD27- HLA-DR+ cells expressed a distinctive effector memory transcriptomic program with T helper 1 (TH1)- and cytotoxicity-associated features and produced abundant interferon-γ (IFN-γ) and granzyme A protein upon stimulation. We propose that MASC is a broadly applicable method to identify disease-associated cell populations in high-dimensional single-cell data.
BACKGROUND: Cytokines are critical to human disease and are attractive therapeutic targets given their widespread influence on gene regulation and transcription. Defining the downstream regulatory mechanisms influenced by cytokines is central to defining drug and disease mechanisms. One promising strategy is to use interactions between expression quantitative trait loci (eQTLs) and cytokine levels to define target genes and mechanisms.
RESULTS: In a clinical trial for anti-IL-6 in patients with systemic lupus erythematosus, we measure interferon (IFN) status, anti-IL-6 drug exposure, and whole blood genome-wide gene expression at three time points. We show that repeated transcriptomic measurements increase the number of cis eQTLs identified compared to using a single time point. We observe a statistically significant enrichment of in vivo eQTL interactions with IFN status and anti-IL-6 drug exposure and find many novel interactions that have not been previously described. Finally, we find transcription factor binding motifs interrupted by eQTL interaction SNPs, which point to key regulatory mediators of these environmental stimuli and therefore potential therapeutic targets for autoimmune diseases. In particular, genes with IFN interactions are enriched for ISRE binding site motifs, while those with anti-IL-6 interactions are enriched for IRF4 motifs.
CONCLUSIONS: This study highlights the potential to exploit clinical trial data to discover in vivo eQTL interactions with therapeutically relevant environmental variables.
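The idea of an eQTL interaction can be sketched as a regression with a genotype-by-environment product term: expression ~ 1 + genotype + environment + genotype × environment, where a nonzero coefficient on the product term means the genotype effect changes with the environmental variable. The example below is a simplified illustration on simulated data, not the model from the study. It uses ordinary least squares and ignores the repeated measurements per individual that the trial design provides; the function names and all simulated values are assumptions.

```python
import random


def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_interaction_model(genotype, environment, expression):
    """Least-squares coefficients for [intercept, genotype, environment,
    genotype x environment], obtained from the normal equations."""
    design = [[1.0, g, e, g * e] for g, e in zip(genotype, environment)]
    k = 4
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(design, expression))
           for i in range(k)]
    return solve_linear_system(xtx, xty)


# Simulated data (hypothetical): genotype dosage 0/1/2 and a continuous
# environmental variable standing in for IFN status or drug exposure.
rng = random.Random(0)
genotype = [rng.choice([0, 1, 2]) for _ in range(200)]
environment = [rng.random() for _ in range(200)]
expression = [1.0 + 0.5 * g + 0.2 * e + 0.8 * g * e + rng.gauss(0, 0.05)
              for g, e in zip(genotype, environment)]

coefficients = fit_interaction_model(genotype, environment, expression)
print("interaction coefficient:", round(coefficients[3], 2))
```

A real analysis of repeated samples from the same individuals would need to account for within-individual correlation, for example with a mixed-effects model.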
2017
CD4+ T cells have long been known to play an important role in the pathogenesis of rheumatoid arthritis (RA), but the specific cell populations and states that drive the disease have been challenging to identify with low-dimensional single-cell data and bulk assays. The advent of high-dimensional single-cell technologies, such as single-cell RNA-seq or mass cytometry, has offered promise for defining key populations, but brings new methodological and statistical challenges. Recent single-cell profiling studies have revealed a broad diversity of cell types among CD4+ T cells, identifying novel populations that are expanded or altered in RA. Here, we review recent findings on CD4+ T cell heterogeneity and RA that have come from single-cell profiling studies and discuss the best practices for conducting these studies.
2015
Identifying genomic annotations that differentiate causal from trait-associated variants is essential to fine mapping disease loci. Although many studies have identified non-coding functional annotations that overlap disease-associated variants, these annotations often colocalize, complicating the ability to use these annotations for fine mapping causal variation. We developed a statistical approach (Genomic Annotation Shifter [GoShifter]) to assess whether enriched annotations are able to prioritize causal variation. GoShifter defines the null distribution of an annotation overlapping an allele by locally shifting annotations; this approach is less sensitive to biases arising from local genomic structure than commonly used enrichment methods that depend on SNP matching. Local shifting also allows GoShifter to identify independent causal effects from colocalizing annotations. Using GoShifter, we confirmed that variants in expression quantitative trait loci drive gene-expression changes through DNase-I hypersensitive sites (DHSs) near transcription start sites and independently through 3' UTR regulation. We also showed that (1) 15%-36% of trait-associated loci map to DHSs independently of other annotations; (2) loci associated with breast cancer and rheumatoid arthritis harbor potentially causal variants near the summits of histone marks rather than full peak bodies; (3) variants associated with height are highly enriched in embryonic stem cell DHSs; and (4) we can effectively prioritize causal variation at specific loci.
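The local-shifting idea can be sketched as a permutation test: within each locus, annotation intervals are moved by a random offset (wrapping around the locus boundaries), and the number of loci where a SNP still overlaps an annotation is compared with the observed count. The code below is a simplified illustration and not the GoShifter implementation. The locus windows, SNP positions, and intervals are hypothetical, the function names are invented for the example, and the +1 correction in the p-value is an assumed convention.

```python
import random


def overlaps(snp_positions, intervals):
    """True if any SNP falls inside any half-open interval [start, end)."""
    return any(start <= pos < end
               for pos in snp_positions
               for start, end in intervals)


def shift_intervals(intervals, offset, window_start, window_end):
    """Circularly shift intervals by `offset` within the locus window.
    Assumes each interval lies inside the window."""
    length = window_end - window_start
    shifted = []
    for start, end in intervals:
        new_start = window_start + (start - window_start + offset) % length
        new_end = new_start + (end - start)
        if new_end <= window_end:
            shifted.append((new_start, new_end))
        else:
            # The interval runs past the window edge: wrap the remainder.
            shifted.append((new_start, window_end))
            shifted.append((window_start,
                            window_start + (new_end - window_end)))
    return shifted


def local_shift_p_value(loci, n_permutations=1000, seed=0):
    """Return the observed number of overlapping loci and a permutation
    p-value from locally shifted annotations."""
    rng = random.Random(seed)
    observed = sum(overlaps(l["snps"], l["annotations"]) for l in loci)
    at_least_observed = 0
    for _ in range(n_permutations):
        count = 0
        for l in loci:
            start, end = l["window"]
            offset = rng.randrange(end - start)
            shifted = shift_intervals(l["annotations"], offset, start, end)
            count += overlaps(l["snps"], shifted)
        if count >= observed:
            at_least_observed += 1
    return observed, (at_least_observed + 1) / (n_permutations + 1)


# Three hypothetical loci (coordinates are made up for illustration).
loci = [
    {"window": (0, 1000), "snps": [100, 450], "annotations": [(440, 460)]},
    {"window": (0, 2000), "snps": [1500],
     "annotations": [(1490, 1510), (200, 220)]},
    {"window": (0, 500), "snps": [10], "annotations": [(300, 320)]},
]
observed, p_value = local_shift_p_value(loci)
print(observed, p_value)
```

Because the shift happens inside each locus, the null distribution keeps the local density of annotations, which is the property the abstract contrasts with SNP-matching approaches.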
2014
Despite progress in defining human leukocyte antigen (HLA) alleles for anti-citrullinated-protein-autoantibody-positive (ACPA+) rheumatoid arthritis (RA), identifying HLA alleles for ACPA-negative (ACPA-) RA has been challenging because of clinical heterogeneity within clinical cohorts. We imputed 8,961 classical HLA alleles, amino acids, and SNPs from Immunochip data in a discovery set of 2,406 ACPA- RA case and 13,930 control individuals. We developed a statistical approach to identify and adjust for clinical heterogeneity within ACPA- RA and observed independent associations for serine and leucine at position 11 in HLA-DRβ1 (p = 1.4 × 10^-13, odds ratio [OR] = 1.30) and for aspartate at position 9 in HLA-B (p = 2.7 × 10^-12, OR = 1.39) within the peptide binding grooves. These amino acid positions induced associations at HLA-DRB1*03 (encoding serine at 11) and HLA-B*08 (encoding aspartate at 9). We validated these findings in an independent set of 427 ACPA- case subjects, carefully phenotyped with a highly sensitive ACPA assay, and 1,691 control subjects (HLA-DRβ1 Ser11+Leu11: p = 5.8 × 10^-4, OR = 1.28; HLA-B Asp9: p = 2.6 × 10^-3, OR = 1.34). Although both amino acid sites drove risk of ACPA+ and ACPA- disease, the effects of individual residues at HLA-DRβ1 position 11 were distinct (p < 2.9 × 10^-107). We also identified an association with ACPA+ RA at HLA-A position 77 (p = 2.7 × 10^-8, OR = 0.85) in 7,279 ACPA+ RA case and 15,870 control subjects. These results contribute to mounting evidence that ACPA+ and ACPA- RA are genetically distinct and potentially have separate autoantigens contributing to pathogenesis. We expect that our approach might have broad applications in analyzing clinical conditions with heterogeneity at both major histocompatibility complex (MHC) and non-MHC regions.