---
title: "atlasapprox (R interface)"
output:
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 2
vignette: >
  %\VignetteIndexEntry{atlasapprox (R interface)}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Cell atlases such as Tabula Muris and Tabula Sapiens are multi-organ single-cell omics data sets describing entire organisms. A cell atlas approximation is a lossy and lightweight compression of a cell atlas that can be streamed via the internet.

This project enables biologists, doctors, and data scientists to quickly find answers to questions such as:

- *What is the expression of a specific gene in human lung?*
- *What are the marker genes of a specific cell type in mouse pancreas?*
- *What fraction of cells (of a specific type) express a gene of interest?*

***

**NOTE:** These questions can also be asked in R, Python, JavaScript, or in a language-agnostic manner using the REST API (see [https://atlasapprox.readthedocs.io](https://atlasapprox.readthedocs.io)).

***

## Installation

To install the R interface of `atlasapprox` from CRAN, use:

```{r echo = TRUE, eval = FALSE}
install.packages("atlasapprox")
```

## Usage

To use the package, you must first load it:

```{r echo = FALSE}
knitr::opts_chunk$set(fig.width=6, fig.height=6)
```

```{r setup}
library("atlasapprox")
```

Now you have all `atlasapprox` functions available.

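One way to check what became available is to list the names the package exports. The sketch below relies only on base R (`requireNamespace()` and `getNamespaceExports()`) and guards against the package not being installed. The `exported` variable is introduced here for illustration, and the names printed depend on the installed version.

```{r}
# Guard so that the chunk also runs where atlasapprox is not installed
if (requireNamespace("atlasapprox", quietly = TRUE)) {
  # Names exported by the package namespace, sorted alphabetically
  exported <- sort(getNamespaceExports("atlasapprox"))
  print(exported)
} else {
  exported <- character(0)
  message("atlasapprox is not installed; see the Installation section.")
}
```
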
## Available organisms or species

The easiest way to explore atlas approximations is to query a list of available organisms:

```{r}
organisms <- GetOrganisms()
print(organisms)
```

## Organs in a single organism

Once you know what species you are interested in, you can explore the list of organs from that species for which an atlas approximation is available:

```{r}
human_organs <- GetOrgans(organism = 'h_sapiens')
print(human_organs)
```

## Cell types within an organ

The next level of zoom is to query the list of cell types that make up an organ of choice, e.g.:

```{r}
cell_types <- GetCelltypes(organism = 'h_sapiens', organ = 'Lung')
print(cell_types)
```

***

**NOTE:** Although cell atlases aim to cover *all* cell types from a tissue, rare types might be missing because of limited sampling or inaccurate annotation. If you think a cell type is missing from a tissue, please contact fabio DOT zanini AT unsw DOT edu DOT au.

***

## Gene expression

If you have some genes you are interested in, you can query their expression across cell types in the organ of choice:

```{r}
expression <- GetAverage(organism = 'h_sapiens', organ = 'Lung', features = c('PTPRC', 'COL1A1'))
print(expression)
```

In addition to the average *level* of expression, you can request the *fraction* of cells within each type that express the gene:

```{r}
fraction_expressing <- GetFractionDetected(organism = 'h_sapiens', organ = 'Lung', features = c('PTPRC', 'COL1A1'))
print(fraction_expressing)
```

To get a list of all available features (e.g. genes) for an organism, you can use:

```{r}
genes <- GetFeatures(organism = 'h_sapiens')

# To show just the first 20 genes
print(head(genes, 20))
```

## Markers

Each cell type expresses specific genes, called markers, that contribute to its unique biological function.
To request a list of markers for your cell type of choice:

```{r}
markers <- GetMarkers(organism = 'h_sapiens', organ = 'Lung', cell_type = 'fibroblast', number = 5)
print(markers)
```

***

**NOTE:** There are multiple methods to compute marker genes. The current version of the API uses one specific method, but future versions aim to give the user a choice of which method to use.

***

## Finding cells that highly express a gene

To find out which cell types express your gene of interest the most, across all organs:

```{r}
highest_expressors <- GetHighestMeasurement(organism = 'h_sapiens', feature = 'PTPRC', number = 5)
print(highest_expressors)
```

## Finding similar features

To find other features (genes) whose expression patterns are similar to those of a feature of interest, request a list of similar features for your gene of choice:

```{r}
similar_genes <- GetSimilarFeatures(organism = 'h_sapiens', organ = 'lung', feature = 'PTPRC', number = 5, method = 'correlation')
print(similar_genes)
```

***

**NOTE:** There are multiple methods to compute feature similarity. The available methods are:

- correlation (default): Pearson correlation of the fraction of cells expressing each gene
- cosine: Cosine similarity of the fraction of expressing cells
- euclidean: Euclidean distance of average expression levels
- manhattan: Manhattan distance of average expression levels
- log-euclidean: Euclidean distance after log-transformation (useful for sparse features)

***

## Data sources

`atlasapprox` relies upon available cell atlases kindly released for public use:

- [A. queenslandica - Sebé-Pedrós et al. 2018](https://www.nature.com/articles/s41559-018-0575-6)
- A. thaliana: [Shahan et al. 2022](https://doi.org/10.1016/j.devcel.2022.01.008) [root], [Xu et al. 2024](https://doi.org/10.1101/2024.03.04.583414) [shoot]
- [C. elegans - Cao et al. 2017](https://www.science.org/doi/10.1126/science.aam8940)
- [C.
gigas - Piovani et al. 2023](https://doi.org/10.1126/sciadv.adg6034)
- [C. hemisphaerica - Chari et al. 2021](https://www.science.org/doi/10.1126/sciadv.abh1683)
- [D. melanogaster - Li et al. 2022](https://doi.org/10.1126/science.abk2432)
- [D. rerio - Wagner et al. 2018](https://www.science.org/doi/10.1126/science.aar4362)
- [F. vesca - Bai et al. 2022](https://doi.org/10.1093/hr/uhab055)
- [H. miamia - Hulett et al. 2023](https://www.nature.com/articles/s41467-023-38016-4)
- H. sapiens: [ATAC - Zhang et al. 2021](https://doi.org/10.1016/j.cell.2021.10.024), [RNA - Tabula Sapiens](https://www.science.org/doi/10.1126/science.abl4896)
- [H. vulgaris - Siebert et al. 2019](https://doi.org/10.1126/science.aav9314)
- [I. pulchra - Duruz et al. 2020](https://doi.org/10.1093/molbev/msaa333)
- [L. minuta - Abramson et al. 2022](https://doi.org/10.1093/plphys/kiab564)
- [M. leidyi - Sebé-Pedrós et al. 2018](https://www.nature.com/articles/s41559-018-0575-6)
- [M. murinus - Tabula Microcebus](https://www.biorxiv.org/content/10.1101/2021.12.12.469460v2)
- [M. musculus - Tabula Muris Senis 2020](https://www.nature.com/articles/s41586-020-2496-1)
- [N. vectensis - Steger et al. 2022](https://doi.org/10.1016/j.celrep.2022.111370)
- [O. sativa - Zhang et al. 2021](https://doi.org/10.1038/s41467-021-22352-4)
- [P. crozieri - Piovani et al. 2023](https://doi.org/10.1126/sciadv.adg6034)
- [P. dumerilii - Achim et al. 2017](https://doi.org/10.1093/molbev/msx336)
- [S. lacustris - Musser et al. 2021](https://www.science.org/doi/10.1126/science.abj2949)
- [S. mansoni - Li et al. 2021](https://www.nature.com/articles/s41467-020-20794-w)
- [S. mediterranea - Plass et al. 2018](https://doi.org/10.1126/science.aaq1723)
- [S. pistillata - Levi et al. 2021](https://doi.org/10.1016/j.cell.2021.04.005)
- [S. purpuratus - Paganos et al. 2021](https://doi.org/10.7554/eLife.70416)
- [T. adhaerens - Sebé-Pedrós et al. 2018](https://www.nature.com/articles/s41559-018-0575-6)
- [T.
aestivum - Zhang et al. 2023](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02908-x)
- [X. laevis - Liao et al. 2022](https://www.nature.com/articles/s41467-022-31949-2)
- Z. mays: [Marand et al. 2021](https://doi.org/10.1016/j.cell.2021.04.014) [seedling], [Xu et al. 2024](https://doi.org/10.1101/2024.03.04.583414) [eartip]

We are grateful to all authors above for their help and commitment to open science.

To retrieve the data sources from within the package, call:

```{r eval = FALSE}
data_sources <- GetDataSources()
print(data_sources)
```

***

**NOTE:** Although the original cell type annotations of these data sets are mostly preserved, a quality check is performed before computing approximations. During this step, some cell types might be filtered out, renamed, or split into multiple subannotations. If you find a problem in the data that indicates misannotation, please reach out to fabio DOT zanini AT unsw DOT edu DOT au and we will endeavour to fix it.

***
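
As a supplementary illustration of the similarity methods listed under *Finding similar features*, the sketch below computes each metric in base R for two hypothetical genes across four cell types. It is not the API's implementation: all numbers are made up, the variable names are introduced here, and the use of `log1p()` for the log-transformation is an assumption.

```{r}
# Toy data (invented for illustration): two genes measured across four cell types
frac_a <- c(0.90, 0.10, 0.05, 0.60)  # fraction of cells expressing gene A
frac_b <- c(0.80, 0.20, 0.10, 0.50)  # fraction of cells expressing gene B
avg_a  <- c(12.0, 0.5, 0.1, 6.0)     # average expression of gene A
avg_b  <- c(10.0, 1.0, 0.2, 5.0)     # average expression of gene B

similarity <- list(
  # Pearson correlation of the fractions of expressing cells
  correlation   = cor(frac_a, frac_b, method = "pearson"),
  # Cosine similarity of the fractions of expressing cells
  cosine        = sum(frac_a * frac_b) / sqrt(sum(frac_a^2) * sum(frac_b^2)),
  # Euclidean and Manhattan distances of the average expression levels
  euclidean     = sqrt(sum((avg_a - avg_b)^2)),
  manhattan     = sum(abs(avg_a - avg_b)),
  # Assumption: log(1 + x) as the transformation; the API may use a different one
  log_euclidean = sqrt(sum((log1p(avg_a) - log1p(avg_b))^2))
)
print(similarity)
```

Note that correlation and cosine are similarities (larger means more alike), whereas the three distance-based entries are smaller for more similar genes.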