# Spindle

## Installation
Create and activate a virtual environment, then install the runtime dependencies and (optionally) the docs/tooling extras:

```bash
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt

# To build the documentation locally:
pip install sphinx sphinx-rtd-theme myst-parser

# Optional (for automatic notebook → HTML conversion in CI):
pip install nbconvert jupyter
```

To build the Sphinx site locally:

```bash
python -m sphinx -b html docs_src docs/_build/html -a
```
## Background

Spindle is a library for indexing and searching symmetric positive definite (SPD) sub-matrices derived from spatial-omics datasets. Please refer to the documentation page for details.
The core idea is to build a block-structured DAG index over SPD matrices (or their correlation equivalents) and then perform budget-pruned search over this graph for fast matching of query sub-matrices.
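The actual index structures are library-internal; as a self-contained illustration of the budget-pruned traversal idea only, here is a toy DAG of SPD representatives searched with a Frobenius distance (the node layout, `budget_pruned_search` helper, and distance are illustrative assumptions, not Spindle's real structures):

```python
import numpy as np

def budget_pruned_search(root, query, budget, dist):
    """Depth-first traversal that prunes any subtree whose
    representative matrix is farther from the query than the budget."""
    matches, stack, seen = [], [root], set()
    while stack:
        node = stack.pop()
        if id(node) in seen:          # DAG: a node may be reachable via two parents
            continue
        seen.add(id(node))
        if dist(node["rep"], query) > budget:
            continue                  # prune the whole subtree
        if not node["children"]:      # leaf: record the matching SPD id
            matches.append(node["id"])
        stack.extend(node["children"])
    return matches

# toy index: 2x2 SPD representatives, Frobenius distance
leaf_a = {"id": "A", "rep": np.eye(2), "children": []}
leaf_b = {"id": "B", "rep": 3 * np.eye(2), "children": []}
root = {"id": "root", "rep": np.eye(2), "children": [leaf_a, leaf_b]}
frob = lambda x, y: np.linalg.norm(x - y)
print(budget_pruned_search(root, np.eye(2), budget=1.0, dist=frob))  # ['A']
```

Because every pruned internal node removes its entire subtree from consideration, the search cost scales with the number of near-budget branches rather than the total number of stored matrices.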
## Modules

- `src/spindle_dev/preprocessing.py` – interfaces only; defines how raw spatial data is converted into a `SpatialDataset` with points and SPD matrices.
- `src/spindle_dev/index.py` – builds a DAG index over block clusters of SPD matrices according to the spec in `.github/copilot-instructions.md`.
- `src/spindle_dev/search.py` – traverses the index given a query SPD sub-matrix and a distance budget, returning matching SPD IDs and paths.
- `src/spindle_dev/metrics.py` – implements the log-Euclidean distance and SPD ↔ correlation conversions.
- `src/spindle_dev/utils.py` – serialization helpers (`save_index` / `load_index`), deterministic config, and logging utilities.
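The real implementations of the metric operations live in `src/spindle_dev/metrics.py`; a minimal NumPy sketch of the two operations it names (function names here are illustrative, not the module's actual API):

```python
import numpy as np

def logm_spd(a):
    """Matrix logarithm of an SPD matrix via its eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.T

def log_euclidean_distance(a, b):
    """d(A, B) = ||log(A) - log(B)||_F for SPD matrices A and B."""
    return np.linalg.norm(logm_spd(a) - logm_spd(b), ord="fro")

def spd_to_corr(cov):
    """Rescale a covariance to a correlation matrix: D^{-1/2} C D^{-1/2}."""
    d = np.sqrt(np.diag(cov))
    return cov / np.outer(d, d)

cov = np.array([[4.0, 1.0], [1.0, 9.0]])
print(log_euclidean_distance(cov, np.eye(2)))
print(spd_to_corr(cov))  # unit diagonal, off-diagonals 1/6
```

The log-Euclidean distance compares SPD matrices in log-space, which respects the manifold geometry of SPD matrices far better than a plain Frobenius difference.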
## Public API

The intended “front door” for users and higher-level code is:

- `build_index(spatial_data, config) -> IndexHandle` (from `index.py`)
- `query_index(index_handle, query_spd, budget, config) -> SearchResults` (from `search.py`)

All other functions and data structures are considered internal implementation details.
## Index And Search

Prerequisites: run from the project root, or add `src` to `PYTHONPATH` so `spindle_dev` is importable. Install the runtime dependencies (e.g. `scanpy`, `numpy`, `pandas`, `scikit-learn`, `umap-learn`).

Quick CLI (recommended for single H5AD files): use the helper script to build an index from an AnnData `.h5ad` file and run a small sanity search. Example:

```bash
# from the project root
python ISMB_notebook/spindle_xenium_single.py path/to/sample.h5ad \
    --top-genes 800 --all-genes --max-queries 100
```
Programmatic: create an index from an `AnnData` object:

```python
import scanpy as sc
from ISMB_notebook.spindle_xenium_single import create_index

# load your AnnData
adata = sc.read_h5ad('path/to/sample.h5ad')
index_path = 'path/to/sample_index'

# create_index builds the tiles, computes covariances, clusters, chooses epsilons,
# saves a serialized index at `<index_path>/spindle.pkl`, and runs a small sanity test.
create_index(adata, index_path, resolution=0.5, min_final_size=15,
             top_genes=800, all_genes=True, max_queries=100)
```
Programmatic: load a saved index and run a search:

```python
import numpy as np
from spindle_dev import index as sd_index, search as sd_search

# load the saved DatasetIndex bundle
bundle = sd_index.load_index('path/to/sample_index/spindle.pkl')

# inspect the available cluster ids (each maps to an IndexHandle)
print('cluster ids:', list(bundle.dag_dict.keys()))

# pick an IndexHandle (e.g. the first key) and prepare an SPD query matrix
cluster_id = list(bundle.dag_dict.keys())[0]
index_handle = bundle.dag_dict[cluster_id]

# replace the following with a real SPD query (e.g. a tile covariance or correlation)
query_spd = np.eye(bundle.pca_model.components_.shape[1])  # placeholder; use a real SPD

budget = 0.5
cfg = sd_search.SearchConfig(max_results=5)
results = sd_search.query_index(index_handle, query_spd, budget, config=cfg)
print(results)
```
Notes:

- The script `ISMB_notebook/spindle_xenium_single.py` contains a convenient `create_index` wrapper that builds the index and runs a sanity check; use it for quick experiments.
- Saved indexes are written as `spindle.pkl` and can be loaded with `spindle_dev.index.load_index`, which returns a `DatasetIndex` bundle containing a `dag_dict` of `IndexHandle` objects.
- `sd_search.query_index` expects an `IndexHandle`, an SPD matrix (`numpy.ndarray`), and a numeric `budget`.
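The search example above uses a placeholder identity for `query_spd`. In the real pipeline, tile covariances come from `preprocessing.build_tile_covs_full_serial` (which takes an `eps=1e-6` jitter); as a self-contained sketch of building a valid SPD query from a toy (cells × genes) tile, where the `tile_covariance` helper is hypothetical:

```python
import numpy as np

def tile_covariance(expr, eps=1e-6):
    """Covariance of a (cells x genes) tile, with a small diagonal
    jitter so the result is strictly positive definite."""
    cov = np.cov(expr, rowvar=False)
    return cov + eps * np.eye(cov.shape[0])

rng = np.random.default_rng(0)
expr = rng.poisson(2.0, size=(50, 8)).astype(float)  # toy count matrix
query_spd = tile_covariance(expr)

# strictly positive eigenvalues => valid SPD query
print(np.linalg.eigvalsh(query_spd).min() > 0)  # True
```

The jitter matters: a sample covariance is only positive *semi*-definite when tiles have fewer cells than genes, and a singular query would break any log-based SPD distance.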
## Detailed: create_index steps

Entry point: `create_index` (defined in `ISMB_notebook/spindle_xenium_single.py`).

1. Load coordinates: extract `coords = adata.obsm['spatial']` and build spatial tiles with `preprocessing.build_quadtree_tiles(coords, ...)`.
2. Filter & reindex tiles: remove tiny tiles, then call `preprocessing.reindex_tiles(tiles)`.
3. Select genes: choose `top_genes` (or all genes) via `preprocessing.topvar_genes(adata, G=num_genes)`, producing `genes_work, gene_idx`.
4. Compute per-tile covariances: `preprocessing.build_tile_covs_full_serial(adata, tiles, gene_idx, eps=1e-6)` returns `tile_covs`, used to create `index.ProcessedData(tiles, tile_covs, genes_work, adata.n_obs)`.
5. Dimensionality reduction: if PCA/UMAP features are not present, call `data.reduce_dim(num_pca_components=30, n_components=2, do_umap=True)` to compute latent features and store `pca_model`.
6. Clustering: call `data.cluster_spds(cluster_distance='tree', cluster_method='leiden', resolution=resolution)` to assign `data.labels` and compute per-cluster consensus trees and permutations.
7. Assign spot labels: `data.assign_label_to_spots()` maps original spot indices to cluster labels.
8. Cluster means / correlation means: `data.get_corr_mean_by_cluster()` computes `data.R_mean_list`, used for block detection.
9. Adaptive block detection: `data.get_adaptive_runs(find_blocks=True, with_size_guard=True, min_final_size=min_final_size, max_final_size=100)` returns candidate block runs.
10. Per-cluster epsilon selection: for each cluster, call `index.choose_adaptive_epsilons(data, cluster_id, k_target_per_block=64)` to get `eps_per_block, eps_elbow_per_block, eps` and populate the `IndexConfig` fields `epsilon_dict` and `epsilon_block_wise_dict`.
11. Index construction: call `index.index_spds(data, config=config)` to produce `dag_dict, stat, dist_list` (the block-DAG index).
12. Serialize the index: `index.save_index(data, dag_dict, index_path + '/spindle.pkl')` writes the dataset bundle.
13. Sanity test & artifacts: run `test.run_sanity_search(data, dag_dict, config, search_cfg, max_queries=max_queries)`, write `index_stats.txt` with timing, and save `sanity_test_results.csv` in the index folder.

What to inspect after building:

- `index_path/spindle.pkl` — load with `spindle_dev.index.load_index` to get a `DatasetIndex` bundle containing `dag_dict` and `IndexHandle` objects.
- `index_path/index_stats.txt` — index build time.
- `index_path/sanity_test_results.csv` — sanity-check search records.
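The tiling step above is handled by `preprocessing.build_quadtree_tiles`; as a rough, self-contained sketch of what capacity-based quadtree tiling looks like (simplified, and not the library's actual implementation):

```python
import numpy as np

def quadtree_tiles(coords, max_points=25, min_size=1e-9):
    """Recursively split the bounding box into quadrants until each
    tile holds at most `max_points` points; returns per-tile index arrays."""
    def split(idx, lo, hi):
        if len(idx) <= max_points or np.all(hi - lo < min_size):
            return [idx] if len(idx) else []   # drop empty quadrants
        mid = (lo + hi) / 2.0
        tiles = []
        for qx in (False, True):
            for qy in (False, True):
                # each point falls in exactly one quadrant
                mask = ((coords[idx, 0] >= mid[0]) == qx) & \
                       ((coords[idx, 1] >= mid[1]) == qy)
                sub_lo = np.where([qx, qy], mid, lo)
                sub_hi = np.where([qx, qy], hi, mid)
                tiles += split(idx[mask], sub_lo, sub_hi)
        return tiles
    idx = np.arange(len(coords))
    return split(idx, coords.min(axis=0), coords.max(axis=0))

rng = np.random.default_rng(0)
coords = rng.uniform(0, 100, size=(200, 2))  # stand-in for adata.obsm['spatial']
tiles = quadtree_tiles(coords, max_points=25)
print(len(tiles), sum(len(t) for t in tiles))  # every point lands in one tile
```

Capacity-based splitting keeps tiles small in dense regions and large in sparse ones, so each tile has roughly comparable cell counts for the per-tile covariance step.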