Introduction to clustermole

blindly digging for cell types in scRNA-seq clusters

Overview

A typical computational pipeline for single-cell RNA sequencing (scRNA-seq) data includes a cell clustering step. Assigning cell type labels to those clusters is often time-consuming: it involves manual inspection of the cluster marker genes, complemented by a detailed literature search. This is especially challenging when unexpected or poorly described populations are present. The clustermole R package provides methods to query thousands of human and mouse cell identity markers sourced from a variety of databases.

The clustermole package provides three primary features:

  • cell type prediction based on marker genes (clustermole_overlaps)
  • cell type prediction based on a full expression matrix (clustermole_enrichment)
  • a database of cell type markers (clustermole_markers)

Usage

Install clustermole if it is not yet available on your system.
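A minimal sketch of a conditional install, assuming the package is obtained from CRAN with the default repository settings:

```r
# Install clustermole only if it is not already available.
# Assumption: the package is installed from CRAN.
if (!requireNamespace("clustermole", quietly = TRUE)) {
  install.packages("clustermole")
}
```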

Load clustermole.
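Loading works the same way as for any other installed package:

```r
# Attach the package so its functions can be called without the clustermole:: prefix.
library(clustermole)
```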

clustermole_enrichment(): cell type enrichment in the full expression matrix

If you have a table of expression values, such as average expression per cluster, you can run cell type enrichment on the full gene expression matrix. Values should be log-transformed (CPM, TPM, or FPKM), with genes as rows and clusters or samples as columns.
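A minimal sketch of the expected input layout, using a toy matrix with simulated values. The gene symbols, cluster names, and numbers are illustrative assumptions, not real data. The enrichment call is left commented out because a handful of genes is far too few for a meaningful result. In that call, my_expr_mat is a hypothetical placeholder for a real, genome-wide matrix, and "hs" is assumed to select human gene symbols.

```r
# Simulated linear-scale CPM values, for layout illustration only.
# Rows are genes, columns are clusters.
set.seed(1)
cpm_mat <- matrix(
  rexp(12, rate = 0.01),
  nrow = 4,
  dimnames = list(
    c("CD3E", "CD19", "LYZ", "NKG7"),
    paste0("cluster_", 1:3)
  )
)

# Log-transform before enrichment; the pseudocount of 1 avoids log(0).
log_mat <- log2(cpm_mat + 1)

# With a real, genome-wide, log-transformed matrix in place of the toy one:
# my_enrichment <- clustermole::clustermole_enrichment(
#   expr_mat = my_expr_mat,
#   species = "hs"
# )
```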

clustermole_markers(): retrieve cell type markers

You can use clustermole as a simple database and get a data frame of all cell type markers.
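For example, the following sketch retrieves the markers with human gene symbols. The species argument with the value "hs" is used here on the assumption that "mm" would return the mouse versions instead.

```r
library(clustermole)

# Retrieve all cell type markers as a data frame (human gene symbols).
markers <- clustermole_markers(species = "hs")

# Preview the first few rows.
head(markers)
```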

Each row pairs a gene with a cell type. The gene column holds the gene symbol (human or mouse symbols can be requested), and the celltype_full column holds the full cell type string, including the species and the original database.

If you need the markers as a list rather than a data frame for other applications, split the gene column by celltype_full to get one vector of gene symbols per cell type.
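One way to do the conversion is with base R split(). This is a sketch rather than the only option, and the name markers_list is chosen here for illustration.

```r
library(clustermole)

markers <- clustermole_markers(species = "hs")

# Named list: one character vector of gene symbols per cell type.
markers_list <- split(x = markers$gene, f = markers$celltype_full)

# Inspect the first two gene sets.
str(head(markers_list, 2))
```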

Database details

We will load dplyr to help with the summary statistics.

You can use clustermole_markers() to retrieve a data frame of all cell type markers in the collection.
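The setup for the summaries in this section can be sketched as follows. The explicit species = "hs" argument is an assumption made here for clarity; it only selects which gene symbols are returned.

```r
library(clustermole)
library(dplyr)

# Data frame of all cell type markers in the collection.
markers <- clustermole_markers(species = "hs")

# Compact overview of the available columns.
glimpse(markers)
```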

Check the number of available cell types.
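A sketch of the count, treating each distinct celltype_full value as one cell type:

```r
library(clustermole)
library(dplyr)

markers <- clustermole_markers(species = "hs")

# Number of distinct cell types in the collection.
n_celltypes <- n_distinct(markers$celltype_full)
n_celltypes
```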

Check the number of available cell types per species (the species is not specified for every cell type).
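A sketch of the per-species summary, assuming the markers data frame has a column named species:

```r
library(clustermole)
library(dplyr)

markers <- clustermole_markers(species = "hs")

# Keep one row per cell type, then count cell types per species.
# Assumption: the species annotation is stored in a column named "species".
celltypes_per_species <- markers %>%
  distinct(celltype_full, species) %>%
  count(species)
celltypes_per_species
```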

Check the number of available cell types per organ (the organ is not specified for every cell type).
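A sketch of the per-organ summary, assuming the markers data frame has a column named organ:

```r
library(clustermole)
library(dplyr)

markers <- clustermole_markers(species = "hs")

# Keep one row per cell type, then count cell types per organ,
# listing the most frequent organs first.
# Assumption: the organ annotation is stored in a column named "organ".
celltypes_per_organ <- markers %>%
  distinct(celltype_full, organ) %>%
  count(organ, sort = TRUE)
celltypes_per_organ
```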

Check package version.
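The installed version can be queried with the base R utility packageVersion():

```r
# Report the installed clustermole version.
clustermole_version <- packageVersion("clustermole")
clustermole_version
```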