Title: | Core Utilities for Single-Cell RNA-Seq |
---|---|
Description: | Core utilities for single-cell RNA-seq data analysis. Contained within are utility functions for working with differential expression (DE) matrices and count matrices, a collection of functions for manipulating and plotting data via 'ggplot2', and functions to work with cell graphs and cell embeddings. Graph-based methods include embedding kNN cell graphs into a UMAP <doi:10.21105/joss.00861>, collapsing vertices of each cluster in the graph, and propagating graph labels. |
Authors: | Viktor Petukhov [aut], Rasmus Rydbirk [aut], Peter Kharchenko [aut], Evan Biederstedt [aut, cre] |
Maintainer: | Evan Biederstedt <[email protected]> |
License: | GPL-3 |
Version: | 1.0.5 |
Built: | 2024-11-22 05:38:22 UTC |
Source: | https://github.com/kharchenkolab/sccore |
List of adjacent vertex weights from igraph object
adjacent_vertex_weights(edge_verts, edge_weights)
edge_verts |
edge vertices of igraph graph object |
edge_weights |
edge weights of igraph graph object |
list of adjacent vertices
## Not run:
edges <- igraph::as_edgelist(conosGraph)
edge.weights <- igraph::edge.attributes(conosGraph)$weight
adjacent_vertex_weights(edges, edge.weights)
## End(Not run)
List of adjacent vertices from igraph object
adjacentVertices(edge_verts)
edge_verts |
edge vertices of igraph graph object |
list of adjacent vertices
## Not run:
edges <- igraph::as_edgelist(conosGraph)
adjacentVertices(edges)
## End(Not run)
Append specificity metrics to DE
appendSpecificityMetricsToDE( de.df, clusters, cluster.id, p2.counts, low.expression.threshold = 0, append.auc = FALSE )
de.df |
data.frame of differential expression values |
clusters |
factor of clusters |
cluster.id |
names of 'clusters' factor. If a cluster.id doesn't exist in cluster names, an error is thrown. |
p2.counts |
counts from Pagoda2, refer to <https://github.com/kharchenkolab/pagoda2> |
low.expression.threshold |
numeric Threshold to remove expression values (default=0). Values under this threshold are discarded. |
append.auc |
boolean If TRUE, append AUC values (default=FALSE) |
data.frame of differential expression values with metrics attached
Convert a character vector into a factor with names "values" and "levels"
as_factor(vals)
vals |
vector of values to evaluate |
factor with names "values" and "levels"
Conos cell annotations
cellAnnotations
An object of class character of length 3000.
Check whether a package is installed and suggest how to install from CRAN, Bioconductor, or other external source
checkPackageInstalled( pkgs, details = "to run this function", install.help = NULL, bioc = FALSE, cran = FALSE )
pkgs |
character Package name(s) |
details |
character Helper text (default = "to run this function") |
install.help |
character Additional information on how to install package (default = NULL) |
bioc |
logical Package(s) is/are available from Bioconductor (default = FALSE) |
cran |
logical Package(s) is/are available from CRAN (default = FALSE) |
## Not run:
checkPackageInstalled("sccore", cran = TRUE)
## End(Not run)
Collapse count matrices by cell type, given min/max number of cells
collapseCellsByType(cm, groups, min.cell.count = 10, max.cell.count = Inf)
cm |
count matrix |
groups |
factor specifying cell types |
min.cell.count |
numeric Minimum number of cells to include (default=10) |
max.cell.count |
numeric Maximum number of cells to include (default=Inf). If Inf, there is no maximum. |
Subsetted factor of collapsed cells by type, with NA cells omitted
Collapse graph using PAGA 1.2 algorithm, Wolf et al 2019, Genome Biology (2019) <https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1663-x>
collapseGraphPaga(graph, groups, linearize = TRUE, winsorize = FALSE)
graph |
igraph graph object Graph to be collapsed |
groups |
factor on vertices describing cluster assignment (can specify integer vertex ids, or character vertex names which will be matched) |
linearize |
boolean Should normally be TRUE (default=TRUE) |
winsorize |
winsorize final connectivity statistics value (default=FALSE) Note: Original PAGA has it as always TRUE, but in this case there is no way to distinguish level of connectivity for highly connected groups. |
collapsed graph
Collapse Graph By Sum
collapseGraphSum(graph, groups, normalize = TRUE)
graph |
igraph graph object Graph to be collapsed |
groups |
factor on vertices describing cluster assignment (can specify integer vertex ids, or character vertex names which will be matched) |
normalize |
boolean Whether to recalculate edge weight as observed/expected (default=TRUE) |
collapsed graph
collapsed = collapseGraphPaga(conosGraph, igraph::V(conosGraph), linearize=TRUE, winsorize=FALSE)
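The idea of collapsing a graph by summing edge weights between clusters can be sketched in base R on a toy adjacency matrix. This is only an illustration under stated assumptions, not the package's implementation: the matrix `A`, the two-cluster factor, and the indicator-matrix product are all hypothetical, and the observed/expected normalisation controlled by 'normalize' is not reproduced.

```r
# Toy symmetric weighted adjacency matrix for 4 vertices (hypothetical data)
A <- matrix(c(0, 2, 1, 0,
              2, 0, 0, 1,
              1, 0, 0, 3,
              0, 1, 3, 0), nrow = 4, byrow = TRUE)

# Cluster assignment of each vertex
groups <- factor(c("x", "x", "y", "y"))

# Indicator matrix: M[i, k] is 1 if vertex i belongs to cluster k
M <- outer(as.character(groups), levels(groups), "==") * 1
colnames(M) <- levels(groups)

# Sum of edge weights between every pair of clusters.
# Diagonal entries count each within-cluster edge twice,
# because A stores both directions of an undirected edge.
collapsed <- t(M) %*% A %*% M
collapsed
```

The off-diagonal entry gives the total weight of edges running between the two clusters, which is the raw quantity a sum-based collapse works from.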
Calculates factor-stratified sums for each column
colSumByFactor(sY, rowSel)
sY |
sparse matrix (dgCMatrix) |
rowSel |
integer factor. Note that the 0-th column will return sums for any NA values; 0 or negative values will be omitted |
Matrix
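A rough base R analogue of factor-stratified column sums is shown below. It is a sketch only: it uses a small dense matrix and base `rowsum()` rather than the package's sparse-matrix routine, and the "NA" row label is an assumption made here to mimic the separate slot that collects rows with missing group labels.

```r
# Toy dense matrix: 4 cells (rows) by 3 genes (columns), hypothetical data
m <- matrix(1:12, nrow = 4,
            dimnames = list(paste0("cell", 1:4), paste0("gene", 1:3)))

# Group label per row; the third cell has no label
f <- factor(c("a", "b", NA, "a"))

# Give unlabelled rows their own explicit group so they are summed separately
group <- ifelse(is.na(f), "NA", as.character(f))

# One row of column sums per group
sums <- rowsum(m, group = group)
sums
```

Rows are looked up by name rather than by position, since the ordering of groups in the result depends on how the labels sort.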
Conos clusters list
conosClusterList
An object of class list of length 2.
Conos graph
conosGraph
An object of class igraph of length 100.
Dot plot adapted from Seurat:::DotPlot, see ?Seurat:::DotPlot for details
dotPlot( markers, count.matrix, cell.groups, marker.colour = "black", cluster.colour = "black", xlab = "Marker", ylab = "Cluster", n.cores = 1, text.angle = 45, gene.order = NULL, cols = c("blue", "red"), col.min = -2.5, col.max = 2.5, dot.min = 0, dot.scale = 6, scale.by = "radius", scale.center = FALSE, scale.min = NA, scale.max = NA, verbose = FALSE, ... )
markers |
Vector of gene markers to plot |
count.matrix |
Merged count matrix, cells in rows and genes in columns |
cell.groups |
Named factor containing cell groups (clusters) and cell names as names |
marker.colour |
Character or numeric vector (default="black") |
cluster.colour |
Character or numeric vector (default="black") |
xlab |
string X-axis title (default="Marker") |
ylab |
string Y-axis title (default="Cluster") |
n.cores |
integer Number of cores (default=1) |
text.angle |
numeric Angle of text displayed (default=45) |
gene.order |
Either factor of genes passed to dplyr::mutate(levels=gene.order), or a boolean. (default=NULL) If TRUE, gene.order is set to the unique markers. If FALSE, gene.order is set to NULL. If NULL, the argument is ignored. |
cols |
Colors to plot (default=c("blue", "red")). The name of a palette from 'RColorBrewer::brewer.pal.info', a pair of colors defining a gradient, or 3+ colors defining multiple gradients (if 'split.by' is set). |
col.min |
numeric Minimum scaled average expression threshold (default=-2.5). Everything smaller will be set to this. |
col.max |
numeric Maximum scaled average expression threshold (default=2.5). Everything larger will be set to this. |
dot.min |
numeric The fraction of cells at which to draw the smallest dot (default=0). All cell groups with less than this expressing the given gene will have no dot drawn. |
dot.scale |
numeric Scale the size of the points, similar to cex (default=6) |
scale.by |
string Scale the size of the points by 'size' or by 'radius' (default="radius") |
scale.center |
boolean Center scaling, see 'scale()' argument 'center' (default=FALSE) |
scale.min |
numeric Set lower limit for scaling, use NA for default (default=NA) |
scale.max |
numeric Set upper limit for scaling, use NA for default (default=NA) |
verbose |
boolean Verbose output (default=FALSE) |
... |
Additional inputs passed to sccore::plapply(), see man for description. |
ggplot2 object
library(dplyr)
## Create merged count matrix
## In this example, cms is a list of count matrices from, e.g., Cellranger count,
## where cells are in columns and genes in rows
## cm <- sccore:::mergeCountMatrices(cms, transposed = FALSE) %>% Matrix::t()
## If coming from Conos, this can be extracted like so
## cm <- conos.obj$getJointCountMatrix(raw = FALSE) # Either normalized or raw values can be used
## Here, we create a random sparse matrix
cm <- Matrix::rsparsematrix(30,3,0.5) %>% abs(.) %>% `dimnames<-`(list(1:30,c("gene1","gene2","gene3")))
## Create marker vector
markers <- c("gene1","gene2","gene3")
## Additionally, color vectors can be included.
## These should have the same length as the input (markers, cell groups)
## Otherwise, they are recycled
col.markers <- c("black","black","red") # or c(1,1,2)
col.clusters <- c("black","red","black") # or c(1,2,1)
## Create annotation vector
annotation <- c(rep("cluster1",10),rep("cluster2",10),rep("cluster3",10)) %>% factor() %>% setNames(1:30)
## Plot. Here, the expression colours range from gray (low expression) to purple (high expression)
sccore:::dotPlot(markers = markers, count.matrix = cm, cell.groups = annotation, marker.colour = col.markers, cluster.colour = col.clusters, cols=c("gray","purple"))
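The quantities a dot plot typically encodes (fraction of expressing cells for dot size, clipped scaled average expression for colour) can be sketched in base R as follows. This is an illustration under assumptions, not the function's internals: the toy matrix, the group labels, and the clipping bounds of -1 and 1 (chosen so that clipping is visible on three groups) are all hypothetical.

```r
# Toy count matrix: 6 cells (rows) by 2 genes (columns), hypothetical data
cm <- matrix(c(0, 2, 4, 4, 0, 0,
               1, 1, 0, 3, 5, 5), nrow = 6,
             dimnames = list(paste0("cell", 1:6), c("gene1", "gene2")))
groups <- factor(rep(c("a", "b", "c"), each = 2))
n.cells <- as.vector(table(groups))

# Average expression per group (would drive the dot colour)
avg.expr <- rowsum(cm, groups) / n.cells

# Fraction of cells in each group with non-zero expression (would drive the dot size)
frac.expr <- rowsum((cm > 0) * 1, groups) / n.cells

# Scale each gene across groups, then clip to [col.min, col.max]
col.min <- -1
col.max <- 1
scaled <- scale(avg.expr)
clipped <- pmin(pmax(scaled, col.min), col.max)
clipped
```

Values beyond the bounds are set to the bounds, which is the behaviour described for 'col.min' and 'col.max' above.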
Set colors for embedding plot. Used primarily in embeddingPlot().
embeddingColorsPlot( plot.df, colors, groups = NULL, geom_point_w = ggplot2::geom_point, gradient.range.quantile = 1, color.range = "symmetric", legend.title = NULL, palette = NULL, plot.na = TRUE )
plot.df |
data.frame for plotting. In embeddingPlot(), this is a tibble from tibble::rownames_to_column(). |
colors |
vector of numbers, which must be shown with point colors, names contain cell names (default=NULL). This argument is ignored if groups are provided. |
groups |
vector of cluster labels, names contain cell names (default=NULL) |
geom_point_w |
function to work with geom_point layer from ggplot2 (default=ggplot2::geom_point) |
gradient.range.quantile |
Winsorization quantile for the numeric colors and gene gradient (default=1) |
color.range |
controls range, in which colors are estimated (default="symmetric"). Pass "all" to estimate range based on all values of "colors", pass "data" to estimate it only based on colors, presented in the embedding. Alternatively you can pass vector of length 2 with (min, max) values. |
legend.title |
legend title (default=NULL) |
palette |
vector, list, or function (default=NULL). A function should accept a number of colors and return a list of colors (e.g. see 'colorRampPalette') |
plot.na |
boolean/numeric Whether to plot points for which groups / colors are missing (default=TRUE). If plot.na is a numeric value below 0, the NA symbols are plotted below the cells. Otherwise, if the value is >=0, they are plotted above the cells. |
ggplot2 object
Plotting function for cluster labels, names contain cell names. Used primarily in embeddingPlot().
embeddingGroupPlot( plot.df, groups, geom_point_w, min.cluster.size, mark.groups, font.size, legend.title, shuffle.colors, palette, plot.na, ... )
plot.df |
data.frame for plotting. In embeddingPlot(), this is a tibble from tibble::rownames_to_column(). |
groups |
vector of cluster labels, names contain cell names (default=NULL) |
geom_point_w |
function to work with geom_point layer from ggplot2 (default=ggplot2::geom_point) |
min.cluster.size |
labels for all groups with fewer cells than this parameter are treated as missing (default=0). This argument is ignored if groups aren't provided |
mark.groups |
plot cluster labels above points (default=TRUE) |
font.size |
font size for cluster labels (default=c(3, 7)). It can either be single number for constant font size or pair (min, max) for font size depending on cluster size |
legend.title |
legend title (default=NULL) |
shuffle.colors |
shuffle colors (default=FALSE) |
palette |
vector, list, or function (default=NULL). A function should accept a number of colors and return a list of colors (e.g. see 'colorRampPalette') |
plot.na |
boolean/numeric Whether to plot points for which groups / colors are missing. If plot.na is a numeric value below 0, the NA symbols are plotted below the cells. Otherwise, if the value is >=0, they are plotted above the cells. |
... |
Additional arguments passed to ggplot2::geom_label_repel() |
ggplot2 object
Plot embedding with provided labels / colors using ggplot2
embeddingPlot( embedding, groups = NULL, colors = NULL, subgroups = NULL, plot.na = is.null(subgroups), min.cluster.size = 0, mark.groups = TRUE, show.legend = FALSE, alpha = 0.4, size = 0.8, title = NULL, plot.theme = NULL, palette = NULL, color.range = "symmetric", font.size = c(3, 7), show.ticks = FALSE, show.labels = FALSE, legend.position = NULL, legend.title = NULL, gradient.range.quantile = 1, raster = FALSE, raster.dpi = 300, shuffle.colors = FALSE, keep.limits = !is.null(subgroups), ... )
embedding |
two-column matrix with x and y coordinates of the embedding, rownames contain cell names and are used to match coordinates with groups or colors |
groups |
vector of cluster labels, names contain cell names (default=NULL) |
colors |
vector of numbers, which must be shown with point colors, names contain cell names (default=NULL). This argument is ignored if groups are provided. |
subgroups |
subset of 'groups', selecting the cells for plot (default=NULL). Ignored if 'groups' is NULL |
plot.na |
boolean/numeric Whether to plot points for which groups / colors are missing (default=is.null(subgroups), i.e. TRUE when 'subgroups' is NULL). If plot.na is a numeric value below 0, the NA symbols are plotted below the cells. Otherwise, if the value is >=0, they are plotted above the cells. |
min.cluster.size |
labels for all groups with fewer cells than this parameter are treated as missing (default=0). This argument is ignored if groups aren't provided |
mark.groups |
plot cluster labels above points (default=TRUE) |
show.legend |
show legend (default=FALSE) |
alpha |
opacity level [0, 1] (default=0.4) |
size |
point size (default=0.8) |
title |
plot title (default=NULL) |
plot.theme |
theme for the plot (default=NULL) |
palette |
vector, list, or function (default=NULL). A function should accept a number of colors and return a list of colors (e.g. see 'colorRampPalette') |
color.range |
controls range, in which colors are estimated (default="symmetric"). Pass "all" to estimate range based on all values of "colors", pass "data" to estimate it only based on colors, presented in the embedding. Alternatively you can pass vector of length 2 with (min, max) values. |
font.size |
font size for cluster labels (default=c(3, 7)). It can either be single number for constant font size or pair (min, max) for font size depending on cluster size |
show.ticks |
show ticks and tick labels (default=FALSE) |
show.labels |
show labels (default=FALSE) |
legend.position |
vector with (x, y) positions of the legend (default=NULL) |
legend.title |
legend title (default=NULL) |
gradient.range.quantile |
Winsorization quantile for the numeric colors and gene gradient (default=1) |
raster |
boolean Whether the layer with the points should be rasterized (default=FALSE). Setting this argument to TRUE is useful when exporting a plot with a large number of points |
raster.dpi |
dpi of the rasterized plot. (default=300). Ignored if raster == FALSE. |
shuffle.colors |
shuffle colors (default=FALSE) |
keep.limits |
Keep axis limits from original plot (default=!is.null(subgroups)). Useful when plotting subgroups, only meaningful if plot.na=FALSE |
... |
Arguments passed on to
|
ggplot2 object
library(sccore)
embeddingPlot(umapEmbedding, show.ticks=TRUE, show.labels=TRUE, title="UMAP embedding")
Embed a graph into a UMAP, Uniform Manifold Approximation and Projection for Dimension Reduction, <https://github.com/lmcinnes/umap>, <doi:10.21105/joss.00861>
embedGraphUmap( graph, min.prob = 0.001, min.visited.verts = 1000, n.cores = 1, max.hitting.nn.num = 0, max.commute.nn.num = 0, min.prob.lower = 1e-07, n.neighbors = 40, n.epochs = 1000, spread = 15, min.dist = 0.001, return.all = FALSE, n.sgd.cores = n.cores, verbose = TRUE, ... )
graph |
input igraph object |
min.prob |
numeric Minimum probability for proximity when calculating hitting time per neighbors (default=1e-3) |
min.visited.verts |
numeric Minimum number of vertices visited when calculating hitting time per neighbors (default=1000) |
n.cores |
numeric Number of cores to use (default=1) |
max.hitting.nn.num |
numeric Maximum adjacencies for calculating hitting time per neighbor, hitting_time_per_neighbors() (default=0) |
max.commute.nn.num |
numeric Maximum adjacencies for calculating commute time per neighbor, commute_time_per_node() (default=0) |
min.prob.lower |
numeric Probability threshold to continue iteration in depth first search hitting time, dfs_hitting_time() (default=1e-7) |
n.neighbors |
numeric Number of neighbors (default=40) |
n.epochs |
numeric Number of epochs to use during the optimization of the embedded coordinates (default=1000). See 'n_epochs' in uwot::umap() |
spread |
numeric The effective scale of embedded points (default=15). See 'spread' in uwot::umap() |
min.dist |
numeric The effective minimum distance between embedded points (default=0.001). See 'min_dist' in uwot::umap() |
return.all |
boolean If TRUE, return list(adj.info=adj.info, commute.times=commute.times, umap=umap). Otherwise, just return the UMAP (default=FALSE) |
n.sgd.cores |
numeric Number of cores to use during stochastic gradient descent. If set to > 1, then results will not be reproducible, even if 'set.seed' is called with a fixed seed before running (default=n.cores). See 'n_sgd_threads' in uwot::umap() |
verbose |
boolean Verbose output (default=TRUE) |
... |
Additional arguments passed to embedKnnGraph() |
resulting UMAP embedding
Embed a k-nearest neighbor (kNN) graph within a UMAP. Used within embedGraphUmap(). Please see McInnes et al <doi:10.21105/joss.00861> for the UMAP description and implementation.
embedKnnGraph( commute.times, n.neighbors, names = NULL, n.cores = 1, n.epochs = 1000, spread = 15, min.dist = 0.001, n.sgd.cores = n.cores, target.dims = 2, verbose = TRUE, ... )
commute.times |
graph commute times from get_nearest_neighbors(). commute_time(u, v) is defined as the expected time, starting at u, to reach v and then return to u. |
n.neighbors |
numeric Number of neighbors |
names |
vector of names for UMAP rownames (default=NULL) |
n.cores |
numeric Number of cores to use (except during stochastic gradient descent) (default=1). See 'n_threads' in uwot::umap() |
n.epochs |
numeric Number of epochs to use during the optimization of the embedded coordinates (default=1000). See 'n_epochs' in uwot::umap() |
spread |
numeric The effective scale of embedded points (default=15). See 'spread' in uwot::umap() |
min.dist |
numeric The effective minimum distance between embedded points (default=0.001). See 'min_dist' in uwot::umap() |
n.sgd.cores |
numeric Number of cores to use during stochastic gradient descent. If set to > 1, then results will not be reproducible, even if 'set.seed' is called with a fixed seed before running (default=n.cores) See 'n_sgd_threads' in uwot::umap() |
target.dims |
numeric Dimensions for 'n_components' in uwot::umap(n_components=target.dims) (default=2) |
verbose |
boolean Verbose output (default=TRUE) |
... |
arguments passed to uwot::umap() |
resulting kNN graph embedding within a UMAP
Extend matrix to include new columns in matrix
extendMatrix(mtx, col.names)
mtx |
Matrix |
col.names |
Columns that should be included in matrix |
Matrix with new columns but rows retained
library(dplyr)
gene.union <- lapply(conosClusterList, colnames) %>% Reduce(union, .)
extendMatrix(conosClusterList[[1]], col.names=gene.union)
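What extending a matrix to a common set of columns amounts to can be sketched in base R on a small dense matrix. This is a hypothetical helper (`extend_matrix_sketch` is not part of the package) and it assumes new columns are filled with zeros and that the result is ordered by the requested column names.

```r
# Hypothetical helper: add zero-filled columns so that all requested names exist
extend_matrix_sketch <- function(mtx, col.names) {
  missing.cols <- setdiff(col.names, colnames(mtx))
  # Zero block with the same rows as the input and one column per missing name
  zeros <- matrix(numeric(nrow(mtx) * length(missing.cols)),
                  nrow = nrow(mtx), ncol = length(missing.cols),
                  dimnames = list(rownames(mtx), missing.cols))
  # Bind the block and reorder columns to the requested order
  cbind(mtx, zeros)[, col.names, drop = FALSE]
}

m <- matrix(1:4, nrow = 2, dimnames = list(c("cell1", "cell2"), c("g1", "g2")))
extended <- extend_matrix_sketch(m, c("g1", "g2", "g3"))
extended
```

Rows are left untouched; only the column set changes, which is what makes matrices from different samples stackable afterwards.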
Utility function to translate a factor into colors
fac2col( x, s = 1, v = 1, shuffle = FALSE, min.group.size = 1, return.details = FALSE, unclassified.cell.color = "gray50", level.colors = NULL )
x |
input factor |
s |
numeric The "saturation" to be used to complete the HSV color descriptions (default=1) See ?rainbow in Palettes, grDevices |
v |
numeric The "value" to be used to complete the HSV color descriptions (default=1) See ?rainbow in Palettes, grDevices |
shuffle |
boolean If TRUE, shuffles columns with shuffle(columns) (default=FALSE) |
min.group.size |
integer Exclude groups of size less than the min.group.size (default=1) |
return.details |
boolean If TRUE, returns a list list(colors=y, palette=col). Otherwise, just returns the factor (default=FALSE) |
unclassified.cell.color |
Color for unclassified cells (default='gray50') |
level.colors |
(default=NULL) |
vector or list of colors
genes = factor(c("BRAF", "NPC1", "PAX3", "BRCA2", "FMR1"))
fac2col(genes)
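The general idea of mapping factor levels to colours through `grDevices::rainbow()` (the function the 's' and 'v' arguments refer to) can be sketched as below. The helper name `factor_to_colors` is hypothetical, and the sketch leaves out shuffling, minimum group size, and the unclassified-cell colour.

```r
# Hypothetical helper: one rainbow colour per factor level, looked up per element
factor_to_colors <- function(x, s = 1, v = 1) {
  palette <- grDevices::rainbow(nlevels(x), s = s, v = v)
  names(palette) <- levels(x)
  colors <- palette[as.character(x)]
  names(colors) <- names(x)  # keep element names (e.g. cell names) if present
  colors
}

genes <- factor(c("BRAF", "NPC1", "BRAF"))
cols <- factor_to_colors(genes)
cols
```

Elements that share a level receive the same colour, and distinct levels receive distinct hues.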
Encodes logic of how to handle named-vector and functional palettes. Used primarily within embeddingGroupPlot()
fac2palette(groups, palette, unclassified.cell.color = "gray50")
groups |
vector of cluster labels, names contain cell names |
palette |
vector, list, or function (default=NULL). A function should accept a number of colors and return a list of colors (e.g. see 'colorRampPalette') |
unclassified.cell.color |
Color for unclassified cells (default='gray50') |
vector or palette
Get nearest neighbors method on graph
get_nearest_neighbors( adjacency_list, transition_probabilities, n_verts = 0L, n_cores = 1L, min_prob = 0.001, min_visited_verts = 1000L, min_prob_lower = 1e-05, max_hitting_nn_num = 0L, max_commute_nn_num = 0L, verbose = TRUE )
adjacency_list |
igraph adjacency list |
transition_probabilities |
vector of transition probabilities |
n_verts |
numeric Number of vertices (default=0) |
n_cores |
numeric Number of cores to use (default=1) |
min_prob |
numeric Minimum probability for proximity when calculating hitting time per neighbors (default=1e-3) |
min_visited_verts |
numeric Minimum number of vertices visited when calculating hitting time per neighbors (default=1000) |
min_prob_lower |
numeric Probability threshold to continue iteration in depth first search hitting time, dfs_hitting_time() (default=1e-5) |
max_hitting_nn_num |
numeric Maximum adjacencies for calculating hitting time per neighbor, hitting_time_per_neighbors() (default=0) |
max_commute_nn_num |
numeric Maximum adjacencies for calculating commute time per neighbor, commute_time_per_node() (default=0) |
verbose |
boolean Whether to have verbose output (default=TRUE) |
list of commute times based on adjacencies
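To make the notion of commute time concrete, the sketch below computes exact hitting and commute times for a random walk on a three-vertex path graph by solving a small linear system. It is an illustration only: the helper `hitting_time` and the toy graph are hypothetical, and the package itself uses a thresholded search over neighbours (see 'min_prob' and 'min_visited_verts') rather than an exact solve.

```r
# Expected number of steps to first reach 'target' from every vertex,
# for a random walk with transition matrix P
hitting_time <- function(P, target) {
  n <- nrow(P)
  keep <- setdiff(seq_len(n), target)
  h <- numeric(n)  # hitting time from the target to itself is 0
  # Solve (I - Q) h = 1, where Q is P restricted to the non-target vertices
  h[keep] <- solve(diag(length(keep)) - P[keep, keep, drop = FALSE],
                   rep(1, length(keep)))
  h
}

# Path graph 1 - 2 - 3 with unit edge weights
A <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)
P <- A / rowSums(A)  # row-normalised transition probabilities

# Commute time between vertices 1 and 3: go from 1 to 3, then return to 1
commute_13 <- hitting_time(P, 3)[1] + hitting_time(P, 1)[3]
commute_13
```

On this graph each direction takes 4 expected steps, so the commute time is 8.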
Collapse vertices belonging to each cluster in a graph
getClusterGraph( graph, groups, method = "sum", plot = FALSE, node.scale = 50, edge.scale = 50, edge.alpha = 0.3, seed = 1, ... )
graph |
igraph graph object Graph to be collapsed |
groups |
factor on vertices describing cluster assignment (can specify integer vertex ids, or character vertex names which will be matched) |
method |
string Method to be used, either "sum" or "paga" (default="sum") |
plot |
boolean Whether to show collapsed graph plot (default=FALSE) |
node.scale |
numeric Scaling to control value of 'vertex.size' in plot.igraph() (default=50) |
edge.scale |
numeric Scaling to control value of 'edge.width' in plot.igraph() (default=50) |
edge.alpha |
numeric Scaling to control value of 'alpha.f' in adjustcolor() within plot.igraph() (default=0.3) |
seed |
numeric Set seed via set.seed() for plotting (default=1) |
... |
arguments passed to collapseGraphSum() |
collapsed graph
cluster.graph = getClusterGraph(conosGraph, igraph::V(conosGraph))
Convert igraph graph into an adjacency list
graphToAdjList(graph)
graph |
input igraph object |
adjacency list, defined by list(idx=adj.list, probabilities=probs, names=edge.list.fact$levels)
library(dplyr)
edge.list.fact <- igraph::as_edgelist(conosGraph) %>% as_factor()
edge.list <- matrix(edge.list.fact$values, ncol=2)
n.nodes <- length(igraph::V(conosGraph))
splitVectorByNodes(edge.list[,1], edge.list[,2], n.nodes)
Graph filter with the heat kernel:
heatFilter(x, l.max, order = 1, offset = 0, beta = 30)
x |
numeric Values to be filtered. Normally, these are graph Laplacian eigenvalues. |
l.max |
numeric Maximum eigenvalue on the graph ( |
order |
numeric Parameter |
offset |
numeric Mean kernel value ( |
beta |
numeric Parameter |
smoothed values for 'x'
Other graph smoothing: computeChebyshevCoeffs(), smoothChebyshev(), smoothSignalOnGraph()
Jensen–Shannon distance metric (i.e. the square root of the Jensen–Shannon divergence) between the columns of a dense matrix m
jsDist(m)
m |
Input matrix |
Vectorized version of the lower triangle as an R distance object, stats::dist()
ex = matrix(1:9, nrow = 3, ncol = 3)
jsDist(ex)
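For reference, the Jensen-Shannon distance between the columns of a matrix can be written out in base R as follows. This is a sketch under assumptions that may differ from the package's compiled routine: each column is normalised to sum to 1 before comparison, the natural logarithm is used, and the helper names `js_distance` and `js_dist_columns` are hypothetical.

```r
# Jensen-Shannon distance between two non-negative vectors
js_distance <- function(p, q) {
  p <- p / sum(p)
  q <- q / sum(q)
  m <- (p + q) / 2
  # Kullback-Leibler divergence, treating 0 * log(0) as 0
  kl <- function(a, b) sum(ifelse(a > 0, a * log(a / b), 0))
  divergence <- (kl(p, m) + kl(q, m)) / 2
  sqrt(max(divergence, 0))  # guard against tiny negative rounding error
}

# Pairwise distances between columns, returned as a stats::dist object
js_dist_columns <- function(m) {
  n <- ncol(m)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      d[i, j] <- js_distance(m[, i], m[, j])
    }
  }
  stats::as.dist(d)
}

ex <- matrix(1:9, nrow = 3, ncol = 3)
d <- js_dist_columns(ex)
d
```

With the natural logarithm, two distributions with disjoint support are at the maximum distance sqrt(log(2)), and identical distributions are at distance 0.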
Merge list of count matrices into a common matrix, entering 0s for the missing entries
mergeCountMatrices(cms, transposed = FALSE, ...)
cms |
List of count matrices |
transposed |
boolean Indicate whether 'cms' is transposed, e.g. cells in rows and genes in columns (default=FALSE) |
... |
Parameters for 'plapply' function |
A merged extended matrix, with 0s for missing entries
mergeCountMatrices(conosClusterList, n.cores=1) ## 12 x 67388 sparse Matrix of class "dgCMatrix"
Translate multilevel segmentation into a dendrogram, with the lowest level of the dendrogram listing the cells
multi2dend(cl, counts, deep = FALSE, dist = "cor")
cl |
igraph communities object, returned from igraph community detection functions |
counts |
dgCMatrix of counts |
deep |
boolean If TRUE, take (cl$memberships[1,]). Otherwise, uses as.integer(membership(cl)) (default=FALSE) |
dist |
Distance metric used (default='cor'). Either 'cor' for the correlation distance in log10 space, or 'JS' for the Jensen–Shannon distance metric (i.e. the square root of the Jensen–Shannon divergence) |
resulting dendrogram
Parallel, optionally verbose lapply. See ?parallel::mclapply for more info.
plapply( ..., progress = FALSE, n.cores = parallel::detectCores(), mc.preschedule = FALSE, mc.allow.recursive = TRUE, fail.on.error = FALSE )
... |
Additional arguments passed to mclapply(), lapply(), or pbmcapply::pbmclapply() |
progress |
Show progress bar via pbmcapply::pbmclapply() (default=FALSE). |
n.cores |
Number of cores to use (default=parallel::detectCores()). When n.cores=1, regular lapply() is used. Note: doesn't work when progress=TRUE |
mc.preschedule |
if set to |
mc.allow.recursive |
boolean Unless true, calling mclapply in a child process will use the child and not fork again (default=TRUE) |
fail.on.error |
boolean Whether to fail and report an error (using stop()) if any of the individual tasks has failed (default=FALSE) |
list, as returned by lapply
square = function(x){ x**2 }
plapply(1:10, square, n.cores=1, progress=TRUE)
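The dispatch described for 'n.cores' (regular lapply() when n.cores=1, forked workers otherwise) can be sketched as below. The name `plapply_sketch` is hypothetical; the sketch omits the progress bar and error handling, and parallel::mclapply() with more than one core is not available on Windows.

```r
# Hypothetical simplification of the serial/parallel dispatch; it omits the
# progress bar and the fail.on.error handling described above.
plapply_sketch <- function(X, FUN, ..., n.cores = 1) {
  if (n.cores == 1) {
    return(lapply(X, FUN, ...))  # serial path
  }
  parallel::mclapply(X, FUN, ..., mc.cores = n.cores)  # forked workers
}

square <- function(x) x^2
res <- plapply_sketch(1:10, square, n.cores = 1)
```

Either path returns a list shaped like the result of lapply(), so callers do not need to know which one ran.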
Label propagation
propagate_labels( edge_verts, edge_weights, vert_labels, max_n_iters = 10L, verbose = TRUE, diffusion_fading = 10, diffusion_fading_const = 0.5, tol = 0.005, fixed_initial_labels = FALSE )
edge_verts |
edge vertices of igraph graph object |
edge_weights |
edge weights of igraph graph object |
vert_labels |
vector of factor or character labels, named by cell names |
max_n_iters |
integer Maximal number of iterations (default=10) |
verbose |
boolean Verbose mode (default=TRUE) |
diffusion_fading |
numeric Constant used for diffusion on the graph, exp(-diffusion_fading * (edge_length + diffusion_fading_const)) (default=10.0) |
diffusion_fading_const |
numeric Another constant used for diffusion on the graph, exp(-diffusion_fading * (edge_length + diffusion_fading_const)) (default=0.5) |
tol |
numeric Absolute tolerance as a stopping criterion (default=5e-3) |
fixed_initial_labels |
boolean Prohibit changes of initial labels during diffusion (default=FALSE) |
matrix from input graph, with labels propagated
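To make the role of the two diffusion constants concrete, the expression quoted in the parameter descriptions can be evaluated directly. This is only an illustration of that formula; the function name `diffusion_weight_sketch` is hypothetical, and how edge lengths are derived from 'edge_weights' is not stated here, so the input values are made up.

```r
# Hypothetical helper evaluating the documented expression
# exp(-diffusion_fading * (edge_length + diffusion_fading_const)).
diffusion_weight_sketch <- function(edge_length,
                                    diffusion_fading = 10,
                                    diffusion_fading_const = 0.5) {
  exp(-diffusion_fading * (edge_length + diffusion_fading_const))
}

# Illustrative edge lengths only
w <- diffusion_weight_sketch(c(0, 0.1, 0.5))
```

Longer edges receive exponentially smaller values, a larger 'diffusion_fading' sharpens that decay, and 'diffusion_fading_const' caps the value for zero-length edges at exp(-diffusion_fading * diffusion_fading_const).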
Estimate labeling distribution for each vertex, based on provided labels.
propagateLabels(graph, labels, method = "diffusion", ...)
graph |
igraph graph object |
labels |
vector of factor or character labels, named by cell names, used in propagateLabelsSolver() and propagateLabelsDiffusion() |
method |
string Type of propagation, either 'diffusion' or 'solver' (default='diffusion'). 'solver' gives better results but scales poorly, so it is inappropriate for datasets with more than 20k cells. |
... |
additional arguments passed to either propagateLabelsSolver() or propagateLabelsDiffusion() |
matrix with distribution of label probabilities for each vertex by rows.
propagateLabels(conosGraph, labels=cellAnnotations)
Estimate labeling distribution for each vertex, based on provided labels, using a random walk on the graph
propagateLabelsDiffusion( graph, labels, max.iters = 100, diffusion.fading = 10, diffusion.fading.const = 0.1, tol = 0.025, fixed.initial.labels = TRUE, verbose = TRUE )
graph |
igraph graph object Input graph |
labels |
vector of factor or character labels, named by cell names |
max.iters |
integer Maximal number of iterations (default=100) |
diffusion.fading |
numeric Constant used for diffusion on the graph, exp(-diffusion.fading * (edge_length + diffusion.fading.const)) (default=10.0) |
diffusion.fading.const |
numeric Another constant used for diffusion on the graph, exp(-diffusion.fading * (edge_length + diffusion.fading.const)) (default=0.1) |
tol |
numeric Absolute tolerance as a stopping criterion (default=0.025) |
fixed.initial.labels |
boolean Prohibit changes of initial labels during diffusion (default=TRUE) |
verbose |
boolean Verbose mode (default=TRUE) |
matrix from input graph, with labels propagated
propagateLabelsDiffusion(conosGraph, labels=cellAnnotations)
Propagate labels using Zhu, Ghahramani, Lafferty (2003) algorithm, "Semi-Supervised Learning Using Gaussian Fields and Harmonic Functions" <http://mlg.eng.cam.ac.uk/zoubin/papers/zgl.pdf>
propagateLabelsSolver(graph, labels, solver = "mumps")
graph |
igraph graph object Input graph |
labels |
vector of factor or character labels, named by cell names |
solver |
Method of solver to use (default="mumps"), either "Matrix" or "mumps" (i.e. "rmumps::Rmumps") |
result from Matrix::solve() or rmumps::Rmumps
propagateLabelsSolver(conosGraph, labels=cellAnnotations)
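The harmonic solution from the cited Zhu, Ghahramani and Lafferty paper can be worked through on a toy graph in base R. This is an illustration of the published formula, F_u = (D_uu - W_uu)^{-1} W_ul Y_l, on a made-up 4-vertex chain with a dense solve; it does not show how the package builds or solves its system.

```r
# Toy chain graph 1 - 2 - 3 - 4 with unit edge weights (illustrative data)
W <- matrix(0, 4, 4)
W[cbind(1:3, 2:4)] <- 1
W <- W + t(W)

labeled <- c(1, 4)    # vertex 1 has label "A", vertex 4 has label "B"
unlabeled <- c(2, 3)
Y_l <- diag(2)        # one-hot labels for the labeled vertices
colnames(Y_l) <- c("A", "B")

# Graph Laplacian L = D - W; solve L_uu F_u = W_ul Y_l for unlabeled vertices
L <- diag(rowSums(W)) - W
F_u <- solve(L[unlabeled, unlabeled], W[unlabeled, labeled] %*% Y_l)
```

Each row of `F_u` sums to 1: vertex 2, which is adjacent to the "A" vertex, gets probabilities (2/3, 1/3), and vertex 3 gets (1/3, 2/3).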
Save DE results as JSON tables for viewing in browser
saveDeAsJson( de.raw, sample.groups = NULL, saveprefix = NULL, dir.name = "JSON", gene.metadata = NULL, verbose = TRUE )
de.raw |
List of DE results from e.g. cacoa, conos |
sample.groups |
Sample groups as named list, each element containing a vector of samples. Can be retrieved from e.g. package cacoa (default=NULL) |
saveprefix |
Prefix for created files (default=NULL) |
dir.name |
Name for directory with results. If it doesn't exist, it will be created. To disable, set as NULL (default="JSON") |
gene.metadata |
(default=NULL) # Needs explanation |
verbose |
Show progress (default=TRUE) |
JSON files, table of content, and viewer files for viewing DE results in browser
## Not run:
saveDeAsJson(de.raw, sample.groups)
## End(Not run)
## The results can be viewed in a web browser by opening toc.html
Set range for values in object. Changes values outside of range to min or max. Adapted from Seurat::MinMax
setMinMax(obj, min, max)
obj |
Object to manipulate |
min |
Minimum value |
max |
Maximum value |
An object with the same dimensions as input but with altered range in values
example_matrix = matrix(rep(c(1:5), 3), 5)
setMinMax(example_matrix, 2, 4)
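The clamping behaviour described above (values outside the range are changed to min or max) can be reproduced in base R with pmin() and pmax(). The helper name `set_min_max_sketch` is hypothetical and this is not necessarily how the package implements it.

```r
# Hypothetical base R equivalent: raise values below 'min', lower values
# above 'max'. pmax()/pmin() keep the dimensions of the first argument.
set_min_max_sketch <- function(obj, min, max) {
  pmin(pmax(obj, min), max)
}

example_matrix <- matrix(rep(1:5, 3), 5)
clamped <- set_min_max_sketch(example_matrix, 2, 4)
```

Every column of `clamped` reads 2, 2, 3, 4, 4, and the result keeps the 5 x 3 shape of the input.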
Smooth gene expression values to better represent the graph structure; used primarily within conos::correctGenes. Uses diffusion of expression on the graph with the equation dv = exp(-a * (v + b))
smooth_count_matrix( edge_verts, edge_weights, count_matrix, is_label_fixed, max_n_iters = 10L, diffusion_fading = 1, diffusion_fading_const = 0.1, tol = 0.001, verbose = TRUE, normalize = FALSE )
edge_verts |
edge vertices of igraph graph object |
edge_weights |
edge weights of igraph graph object |
count_matrix |
gene count matrix |
is_label_fixed |
boolean Whether label is fixed |
max_n_iters |
integer Maximal number of iterations (default=10) |
diffusion_fading |
numeric Constant used for diffusion on the graph, exp(-diffusion_fading * (edge_length + diffusion_fading_const)) (default=1.0) |
diffusion_fading_const |
numeric Another constant used for diffusion on the graph, exp(-diffusion_fading * (edge_length + diffusion_fading_const)) (default=0.1) |
tol |
numeric Absolute tolerance as a stopping criterion (default=1e-3) |
verbose |
boolean Verbose mode (default=TRUE) |
normalize |
boolean Whether to normalize values (default=FALSE) |
matrix from input graph, with labels propagated
Smooth Signal on Graph
smoothSignalOnGraph( signal, filter, graph = NULL, lap = NULL, l.max = NULL, m = 50, ... )
signal |
signal to be smoothed |
filter |
function that accepts signal 'x' and the maximal Laplacian eigenvalue 'l.max'. See |
graph |
igraph object with the graph (default=NULL) |
lap |
graph laplacian (default=NULL). If NULL, 'lap' estimated from graph. |
l.max |
maximal eigenvalue of 'lap' (default=NULL). If NULL, estimated from 'lap'. |
m |
numeric Maximum order of Chebyshev coefficients to compute (default=50) |
... |
Arguments passed on to
|
Other graph smoothing: computeChebyshevCoeffs(), heatFilter(), smoothChebyshev()
Set names equal to values, a stats::setNames wrapper function
sn(x)
x |
an object for which names attribute will be meaningful |
An object with names assigned equal to values
vec = c(1, 2, 3, 4)
sn(vec)
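Since the function is described as a stats::setNames wrapper that sets names equal to values, its effect can be sketched in one line. The name `sn_sketch` is hypothetical, used here so the example runs without the package.

```r
# Hypothetical stand-in: names are set to the values themselves
# (coerced to character by setNames).
sn_sketch <- function(x) stats::setNames(x, x)

named_vec <- sn_sketch(c(1, 2, 3, 4))

# A common reason to do this: lapply() then returns a named list
upper <- lapply(sn_sketch(c("a", "b")), toupper)
```

Here `names(named_vec)` is `c("1", "2", "3", "4")`, and `upper` is a list with elements `a = "A"` and `b = "B"`.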
splitVectorByNodes
splitVectorByNodes(vec, nodes, n.nodes)
vec |
input vector to be divided |
nodes |
nodes used to divide the vector 'vec' via split() |
n.nodes |
numeric The number of nodes for splitting |
list from vec with names given by the nodes
adjList = graphToAdjList(conosGraph)
print(names(adjList))
## [1] "idx" "probabilities" "names"
length(adjList$names)
## [1] 12000
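The arguments above say that 'vec' is divided by 'nodes' via split(), with 'n.nodes' giving the number of nodes. A minimal sketch of that kind of split follows; the helper name `split_by_nodes_sketch` is hypothetical, and treating 'nodes' as integer indices in 1..n.nodes is an assumption made for illustration.

```r
# Hypothetical sketch: split 'vec' by node index, keeping an (empty) entry
# for every node in 1..n.nodes. Integer node indices are an assumption.
split_by_nodes_sketch <- function(vec, nodes, n.nodes) {
  split(vec, factor(nodes, levels = seq_len(n.nodes)))
}

parts <- split_by_nodes_sketch(c(10, 20, 30, 40), c(1, 1, 3, 3), 3)
```

The result is a list named by node, so `parts[["1"]]` holds 10 and 20, `parts[["2"]]` is empty, and `parts[["3"]]` holds 30 and 40.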
Set plot.theme, legend, ticks for embedding plot. Used primarily in embeddingPlot().
styleEmbeddingPlot( gg, plot.theme = NULL, title = NULL, legend.position = NULL, show.legend = TRUE, show.ticks = TRUE, show.labels = TRUE, relabel.axis = TRUE )
gg |
ggplot2 object to plot |
plot.theme |
theme for the plot (default=NULL) |
title |
plot title (default=NULL) |
legend.position |
vector with (x, y) positions of the legend (default=NULL) |
show.legend |
show legend (default=TRUE) |
show.ticks |
show ticks and tick labels (default=TRUE) |
show.labels |
show labels (default=TRUE) |
relabel.axis |
boolean If TRUE, relabel axes with ggplot2::labs(x='Component 1', y='Component 2') (default=TRUE) |
ggplot2 object
UMAP embedding
umapEmbedding
An object of class matrix (inherits from array) with 12000 rows and 2 columns.
Utility function to translate values into colors.
val2col(x, gradientPalette = NULL, zlim = NULL, gradient.range.quantile = 0.95)
x |
input values |
gradientPalette |
gradient palette (default=NULL). If NULL, use colorRampPalette(c('gray90','red'), space = "Lab")(1024) if the values are non-negative; otherwise colorRampPalette(c("blue", "grey90", "red"), space = "Lab")(1024) is used |
zlim |
a two-value vector specifying limits of the values that should correspond to the extremes of the color gradient |
gradient.range.quantile |
extreme quantiles of values that should be trimmed prior to color mapping (default=0.95) |
colors <- val2col( rnorm(10) )
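The mapping from values to a gradient palette can be sketched as a clamp-and-rescale step followed by a palette lookup. The helper name `val2col_sketch` is hypothetical; it does not implement the 'gradient.range.quantile' trimming, and it assumes zlim[1] < zlim[2]. The palette is the non-negative default quoted in the 'gradientPalette' description.

```r
# Hypothetical sketch: clamp values to 'zlim', rescale to [0, 1], and pick
# the matching palette entry. Quantile trimming is intentionally left out.
val2col_sketch <- function(x, palette, zlim = range(x, na.rm = TRUE)) {
  x <- pmin(pmax(x, zlim[1]), zlim[2])
  scaled <- (x - zlim[1]) / (zlim[2] - zlim[1])
  palette[round(scaled * (length(palette) - 1)) + 1]
}

pal <- grDevices::colorRampPalette(c("gray90", "red"), space = "Lab")(1024)
vals <- c(0, 2.5, 5, 10)
cols <- val2col_sketch(vals, pal)
```

The smallest value maps to the first palette color and the largest to the last, with intermediate values placed proportionally along the gradient.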
Helper function to return a ggplot color gradient for a numeric vector. Usage pattern: ggplot(aes(color=x, ...), ...) + val2ggcol(x)
val2ggcol( values, gradient.range.quantile = 1, color.range = "symmetric", palette = NULL, midpoint = NULL, oob = scales::squish, return.fill = FALSE, ... )
values |
values by which the color gradient is determined |
gradient.range.quantile |
numeric Trimming quantile (default=1). Either a single number or two numbers - for lower and upper quantile. |
color.range |
either a vector of two values explicitly specifying the values corresponding to the start/end of the gradient, or string "symmetric" or "all" (default="symmetric"). "symmetric": range will fit data, but will be symmetrized around zero; "all": gradient will match the span of the range of the data (after gradient.range.quantile) |
palette |
an optional palette (default=NULL). The default becomes blue-gray90-red; if the values do not straddle 0, then truncated gradients (blue-gray90 or gray90-red) will be used |
midpoint |
optional midpoint (default=NULL). Set for the center of the resulting range by default |
oob |
function to determine what to do with the values outside of the range (default=scales::squish). Refer to the 'oob' parameter in ggplot2 |
return.fill |
boolean Whether to return fill gradients instead of color (default=FALSE) |
... |
additional arguments are passed to ggplot2::scale_color_gradient* functions, i.e. scale_color_gradient(), scale_color_gradient2(), scale_color_gradientn() |
ggplot2::scale_colour_gradient object