Spindle

Installation

Create and activate a virtual environment, then install the runtime dependencies and, optionally, the documentation tooling:

python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
# To build the documentation locally:
pip install sphinx sphinx-rtd-theme myst-parser
# Optional (for automatic notebook → HTML conversion in CI):
pip install nbconvert jupyter

To build the Sphinx site locally:

python -m sphinx -b html docs_src docs/_build/html -a

Background

Spindle is a library for indexing and searching symmetric positive definite (SPD) sub-matrices derived from spatial-omics datasets. See the documentation pages for full details.

The core idea is to build a block-structured DAG index over SPD matrices (or their correlation equivalents) and then perform budget-pruned search over this graph for fast matching of query sub-matrices.
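The budget-pruned traversal idea can be illustrated with a toy best-first search over an explicit DAG. The node names, edge distances, and graph shape below are invented for illustration and do not reflect Spindle's internal structures:

```python
# Hedged sketch: budget-pruned best-first traversal over a toy DAG.
# Everything here (graph, distances, function name) is illustrative.
from heapq import heappush, heappop

def budget_pruned_search(dag, dist, root, budget):
    """Return (node, distance) pairs reachable from `root` whose
    accumulated distance stays within `budget`, cheapest paths first."""
    best = {root: 0.0}
    heap = [(0.0, root)]
    matches = []
    while heap:
        d, node = heappop(heap)
        if d > best.get(node, float("inf")):
            continue  # stale heap entry
        matches.append((node, d))
        for child in dag.get(node, []):
            nd = d + dist[(node, child)]
            # Prune any branch whose accumulated distance exceeds the budget.
            if nd <= budget and nd < best.get(child, float("inf")):
                best[child] = nd
                heappush(heap, (nd, child))
    return matches

dag = {"root": ["a", "b"], "a": ["c"], "b": ["c"]}
dist = {("root", "a"): 0.4, ("root", "b"): 0.9,
        ("a", "c"): 0.3, ("b", "c"): 0.1}
# "b" is pruned (0.9 > budget), so "c" is only reached through "a".
print(budget_pruned_search(dag, dist, "root", 0.8))
```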

Modules

  • src/spindle_dev/preprocessing.py – interfaces only; define how raw spatial data is converted into a SpatialDataset with points and SPD matrices.

  • src/spindle_dev/index.py – builds a DAG index over block clusters of SPD matrices according to the spec in .github/copilot-instructions.md.

  • src/spindle_dev/search.py – traverses the index given a query SPD sub-matrix and a distance budget, returning matching SPD IDs and paths.

  • src/spindle_dev/metrics.py – implements log-Euclidean distance and SPD ↔ correlation conversions.

  • src/spindle_dev/utils.py – serialization helpers (save_index / load_index), deterministic config, and logging utilities.
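For reference, the log-Euclidean distance between SPD matrices A and B is ||log(A) − log(B)||_F. A minimal stand-alone sketch in pure NumPy; the function names are illustrative, not metrics.py's actual signatures:

```python
# Hedged sketch of a log-Euclidean distance between SPD matrices.
import numpy as np

def spd_log(M):
    # Matrix logarithm via eigendecomposition; valid for symmetric
    # positive definite input (all eigenvalues > 0).
    w, V = np.linalg.eigh(M)
    return (V * np.log(w)) @ V.T

def log_euclidean_distance(A, B):
    """d(A, B) = ||log(A) - log(B)||_F for SPD matrices A and B."""
    return np.linalg.norm(spd_log(A) - spd_log(B), ord="fro")

A = 2.0 * np.eye(2)
B = np.eye(2)
# log(A) = ln(2) * I and log(B) = 0, so the distance is sqrt(2) * ln(2)
print(log_euclidean_distance(A, B))
```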

Public API

The intended “front door” for users and higher-level code is:

  • build_index(spatial_data, config) -> IndexHandle (from index.py)

  • query_index(index_handle, query_spd, budget, config) -> SearchResults (from search.py)

All other functions and data structures are considered internal implementation details.
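Assuming the signatures above, the contract can be sketched with illustrative stand-in types; none of the field names or bodies below are confirmed parts of Spindle's API:

```python
# Hedged sketch of the public-API contract; types, fields, and return
# values are stand-ins, not Spindle's actual classes.
from dataclasses import dataclass, field

@dataclass
class IndexHandle:
    dag_dict: dict  # the block-DAG produced at build time (assumed layout)

@dataclass
class SearchResults:
    spd_ids: list = field(default_factory=list)  # matching SPD IDs
    paths: list = field(default_factory=list)    # traversal paths taken

def build_index(spatial_data, config) -> IndexHandle:
    # The real implementation builds a block-DAG over SPD clusters;
    # here we just wrap the input to show the shape of the contract.
    return IndexHandle(dag_dict={"root": list(spatial_data)})

def query_index(handle, query_spd, budget, config) -> SearchResults:
    # The real implementation performs budget-pruned traversal.
    return SearchResults(spd_ids=handle.dag_dict["root"], paths=[["root"]])

res = query_index(build_index(["spd0", "spd1"], {}), None, 1.0, {})
print(res.spd_ids)
```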

Detailed: create_index steps

  • Entry point: create_index (defined in ISMB_notebook/spindle_xenium_single.py).

  • Load coordinates: extract coords = adata.obsm['spatial'] and build spatial tiles with preprocessing.build_quadtree_tiles(coords, ...).

  • Filter & reindex tiles: remove tiny tiles, then call preprocessing.reindex_tiles(tiles).

  • Select genes: choose top_genes (or all genes) via preprocessing.topvar_genes(adata, G=num_genes) producing genes_work, gene_idx.

  • Compute per-tile covariances: preprocessing.build_tile_covs_full_serial(adata, tiles, gene_idx, eps=1e-6) returns tile_covs used to create index.ProcessedData(tiles, tile_covs, genes_work, adata.n_obs).

  • Dimensionality reduction: if PCA/UMAP features are not already present, call data.reduce_dim(num_pca_components=30, n_components=2, do_umap=True) to compute latent features and store pca_model.

  • Clustering: call data.cluster_spds(cluster_distance='tree', cluster_method='leiden', resolution=resolution) to assign data.labels and compute per-cluster consensus trees and permutations.

  • Assign spot labels: data.assign_label_to_spots() maps original spot indices to cluster labels.

  • Cluster means / correlation means: data.get_corr_mean_by_cluster() computes data.R_mean_list used for block detection.

  • Adaptive block detection: data.get_adaptive_runs(find_blocks=True, with_size_guard=True, min_final_size=min_final_size, max_final_size=100) returns candidate block runs.

  • Per-cluster epsilon selection: for each cluster call index.choose_adaptive_epsilons(data, cluster_id, k_target_per_block=64) to get eps_per_block, eps_elbow_per_block, eps and populate IndexConfig fields epsilon_dict and epsilon_block_wise_dict.

  • Index construction: call index.index_spds(data, config=config) to produce dag_dict, stat, dist_list (the block-DAG index).

  • Serialize index: index.save_index(data, dag_dict, index_path + '/spindle.pkl') writes the dataset bundle.

  • Sanity test & artifacts: run test.run_sanity_search(data, dag_dict, config, search_cfg, max_queries=max_queries), write index_stats.txt with timing, and save sanity_test_results.csv in the index folder.

  • What to inspect after building:

    • index_path/spindle.pkl — load with spindle_dev.index.load_index to get a DatasetIndex bundle containing dag_dict and IndexHandle objects.

    • index_path/index_stats.txt — records the index build time.

    • index_path/sanity_test_results.csv — sanity-check search records.
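The save_index / load_index helpers presumably round-trip the bundle through pickle; a self-contained sketch of that pattern, with the bundle layout assumed rather than taken from Spindle:

```python
# Hedged sketch of a save/load round trip for the index bundle.
# The dict layout ("data" / "dag_dict" keys) is an assumption.
import os
import pickle
import tempfile

def save_index(data, dag_dict, path):
    with open(path, "wb") as f:
        pickle.dump({"data": data, "dag_dict": dag_dict}, f)

def load_index(path):
    with open(path, "rb") as f:
        return pickle.load(f)

with tempfile.TemporaryDirectory() as d:
    p = os.path.join(d, "spindle.pkl")
    save_index({"n_obs": 10}, {"root": []}, p)
    bundle = load_index(p)
    print(bundle["dag_dict"])
```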