---
title: "Converting between single-cell formats with lstar"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Converting between single-cell formats with lstar}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

lstar is a lightweight **interchange** layer for single-cell and spatial omics. A dataset is a set of
**axes** (labelled sets you index by --- `cells`, `genes`, `pca`) and **fields** (typed data over a tuple
of axes --- counts, embeddings, graphs, labels), serialized to a portable **Zarr** store that R, Python,
and C++ all read and write. Format conversion is then just `write_Y(read_X(obj))` with the L\* store as the
universal intermediate, and what a target cannot hold is recorded in `ds$dropped` rather than silently
lost.

## The model in R, end to end

The example in this section runs with only the package's base dependencies (`Matrix`); neither Seurat nor
SingleCellExperiment is needed.

```{r}
library(lstar)

cells <- paste0("c", 1:6)
genes <- paste0("g", 1:4)
# 6 x 4 numeric matrix with dimnames, coerced to a sparse matrix (cells x genes)
m <- as(matrix(as.numeric(1:24), 6, 4, dimnames = list(cells, genes)), "CsparseMatrix")

ds <- list(
  kind = "sample",
  axes = list(
    cells = list(labels = cells, origin = "observed", role = "observation"),
    genes = list(labels = genes, origin = "observed", role = "feature")),
  fields = list(
    counts = list(role = "measure", span = c("cells", "genes"), state = "raw", values = m),
    cluster = list(role = "label", span = "cells", values = factor(c("a", "a", "b", "b", "a", "b")))))
class(ds) <- "lstar_dataset"

p <- tempfile(fileext = ".lstar.zarr")
lstar_write(ds, p)            # -> a portable Zarr store (also readable from Python and C++)
ds2 <- lstar_read(p)
ds2
```
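
Because every field names the axes it spans, the structure can be sanity-checked before it is written. The
chunk below is a minimal sketch of such a check in base R; `check_spans()` and the `toy` list are
hypothetical names introduced for illustration and are not part of the lstar API. It assumes only the list
layout used above (`axes[[name]]$labels`, `fields[[name]]$span`, `fields[[name]]$values`).

```{r}
# Hypothetical helper (not an lstar function): verify that each field's span
# names declared axes and that the field's shape matches the axis lengths.
check_spans <- function(x) {
  for (nm in names(x$fields)) {
    f <- x$fields[[nm]]
    unknown <- setdiff(f$span, names(x$axes))
    if (length(unknown) > 0)
      stop("field '", nm, "' spans undeclared axes: ", paste(unknown, collapse = ", "))
    # Expected extent along each spanned axis = number of labels on that axis.
    expected <- vapply(f$span, function(a) length(x$axes[[a]]$labels), integer(1))
    actual <- if (is.null(dim(f$values))) length(f$values) else dim(f$values)
    if (!identical(unname(expected), as.integer(actual)))
      stop("field '", nm, "' has shape ", paste(actual, collapse = " x "),
           ", expected ", paste(expected, collapse = " x "))
  }
  invisible(TRUE)
}

# A toy dataset with the same layout as `ds`, kept separate so nothing above is overwritten.
toy <- list(
  axes = list(
    cells = list(labels = c("c1", "c2", "c3")),
    genes = list(labels = c("g1", "g2"))),
  fields = list(
    counts  = list(span = c("cells", "genes"), values = matrix(0, 3, 2)),
    cluster = list(span = "cells", values = factor(c("a", "b", "a")))))

check_spans(toy)  # returns invisibly; stops with a message if a span or shape is wrong
```

The same function can be pointed at `ds`, since `dim()` also works on sparse matrices.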

A categorical `label` over `cells` induces a **factor axis** whose labels are its categories, so
independent per-group results align on one axis.
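
The alignment idea can be illustrated in base R without any lstar calls. This is only a sketch of the
concept, not how lstar builds the factor axis internally; `cluster_toy`, `depth_toy`, and `group_axis` are
made-up names and values for illustration.

```{r}
cluster_toy <- factor(c("a", "a", "b", "b", "a", "b"))  # same categories as `cluster` above
depth_toy   <- c(10, 12, 30, 28, 11, 32)                # made-up per-cell values

# The categories play the role of the induced axis labels.
group_axis <- levels(cluster_toy)

# Two summaries computed independently; the second deliberately comes back in reverse order.
mean_depth <- tapply(depth_toy, cluster_toy, mean)
n_cells    <- rev(table(cluster_toy))

# Indexing both by the shared axis labels lines them up regardless of their original order.
aligned <- data.frame(
  mean_depth = as.numeric(mean_depth[group_axis]),
  n_cells    = as.integer(n_cells[group_axis]),
  row.names  = group_axis)
aligned
```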

## Converting to and from Seurat / SingleCellExperiment

The Seurat and SingleCellExperiment profiles map the shared-vocabulary core (counts, normalized/scaled
expression, PCA scores **and** gene loadings, UMAP/t-SNE, clusterings, cell/gene metadata) between formats.
The chunks in this section are not evaluated, which keeps the vignette dependency-free.

```{r, eval = FALSE}
so  <- write_seurat(ds)            # L* dataset -> Seurat object
ds3 <- read_seurat(so)             # Seurat object -> L* dataset
sce <- write_sce(read_seurat(so))  # Seurat -> SingleCellExperiment, in one line
```

Cross-language conversions go through the on-disk store. Write it on one side and read it on the other;
no shared memory or format re-implementation is needed:

```{r, eval = FALSE}
# Python:  lstar.write(read_anndata(ad.read_h5ad("pbmc.h5ad")), "pbmc.lstar.zarr")
ds_from_h5ad <- lstar_read("pbmc.lstar.zarr")
saveRDS(write_seurat(ds_from_h5ad), "pbmc.rds")
```

## The `lstar convert` command line

The Python package ships a one-command CLI that detects formats by path, bridges R and Python through the
store automatically, and reports what crossed (and what was `dropped`):

```sh
lstar convert pbmc.h5ad pbmc.rds --report        # AnnData -> Seurat, with a fidelity report
lstar convert pbmc.rds  pbmc.h5ad --check        # Seurat -> AnnData, then open the result in its native library and smoke-test it
```
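
If the conversion is part of an R workflow, the CLI can be driven with base R's `system2()`. The chunk below
is a hedged sketch: `lstar_convert_args()` is a hypothetical helper, not something either package provides,
and it uses only the flags shown above. It assumes the `lstar` executable is on the `PATH` and does nothing
if the executable or the input file is missing.

```{r}
# Hypothetical helper: assemble the argument vector for `lstar convert`.
lstar_convert_args <- function(input, output, report = FALSE, check = FALSE) {
  # `if (FALSE) ...` yields NULL, which c() drops, so unset flags simply vanish.
  c("convert", input, output, if (report) "--report", if (check) "--check")
}

cli_args <- lstar_convert_args("pbmc.h5ad", "pbmc.rds", report = TRUE)

# Run only when the CLI is installed and the input exists.
if (nzchar(Sys.which("lstar")) && file.exists("pbmc.h5ad")) {
  status <- system2("lstar", cli_args)  # returns the exit status
  if (status != 0) stop("lstar convert failed with exit status ", status)
}
```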

`--backend auto|native|direct` adds a **package-free fallback**: `.h5ad` converts with only `h5py` (no
anndata), and a Seurat `.rds` reads *and* writes with base R + this package (no SeuratObject); an SCE
`.rds` can only be *read* package-free. See the package's other vignettes and the package website for the
full conversion matrix.