Proteomics Tutorial with SpatialData blobs

This tutorial shows two equivalent ways to build a Vitessce proteomics config:

  1. proteomics_from_split_sources: image, labels, and table are passed as separate paths.

  2. proteomics_from_spatialdata: layers are resolved from a SpatialData object and can use coordinate systems.

Both workflows use the same underlying data so you can compare behavior directly.

%load_ext autoreload
%autoreload 2
import harpy_vitessce as hpv
import tempfile
from pathlib import Path

tmp_dir = Path(tempfile.mkdtemp(prefix="spatialdata_blobs"))

import scanpy as sc
from spatialdata.datasets import blobs
from spatialdata.models import TableModel

sdata = blobs()

adata = sdata["table"]

# add leiden clusters using a dummy scanpy pipeline
sc.pp.scale(adata, max_value=10)
sc.pp.pca(
    adata,
    n_comps=2,
    svd_solver="arpack",
)
sc.pp.neighbors(
    adata,
    use_rep="X_pca",
    n_neighbors=10,
)
sc.tl.leiden(adata, resolution=0.6, key_added="leiden")
sc.tl.umap(adata, min_dist=0.3)

# Uncomment the lines below to convince yourself that Vitessce (when using SpatialDataWrapper)
# falls back to the table index if the table has no instance/region key.
# import uuid
# del adata.obs["instance_id"]
# del adata.uns[TableModel.ATTRS_KEY]
# adata.obs.index = [f"segmentation_{uuid.uuid4()}" for _ in range(len(adata.obs))]

spatialdata_path = tmp_dir / "sdata.zarr"
sdata.write(
    spatialdata_path,
    overwrite=True,
)

Why index alignment matters for split sources

proteomics_from_split_sources uses separate wrappers for image/labels/table. For cell-level linking, segmentation IDs in labels_source should match the AnnData observation IDs (adata.obs_names, i.e. the table index). If they do not match, selections in spatial and feature views cannot be synchronized correctly.

import dask.array as da

display(sdata["table"].obs.index)
# should match the IDs in the labels element
# (label 0 is background and has no row in the table)
display(da.unique(sdata["blobs_labels"].data).compute())
Index(['1', '2', '3', '4', '5', '6', '8', '9', '10', '11', '12', '13', '15',
       '16', '17', '18', '19', '20', '22', '23', '24', '25', '26', '27', '29',
       '30'],
      dtype='object')
array([ 0,  1,  2,  3,  4,  5,  6,  8,  9, 10, 11, 12, 13, 15, 16, 17, 18,
       19, 20, 22, 23, 24, 25, 26, 27, 29, 30], dtype=int16)
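
The visual comparison above can also be turned into a programmatic check. The following is a minimal sketch, not part of harpy_vitessce: the helper name check_label_table_alignment and the toy inputs are assumptions made for illustration. It casts the integer label IDs to strings (the table index stores strings), ignores the background label 0, and reports IDs that appear on only one side.

```python
def check_label_table_alignment(label_ids, obs_names, background=0):
    """Compare segmentation IDs with table index entries.

    Returns two sorted lists of strings:
    label IDs without a table row, and table rows without a label.
    """
    # Label IDs are integers in the mask; the table index holds strings.
    label_set = {str(int(v)) for v in label_ids if int(v) != background}
    obs_set = {str(name) for name in obs_names}
    labels_without_row = sorted(label_set - obs_set, key=int)
    rows_without_label = sorted(obs_set - label_set)
    return labels_without_row, rows_without_label


# Toy values (hypothetical). In the tutorial these would come from
# da.unique(sdata["blobs_labels"].data).compute() and sdata["table"].obs.index.
label_ids = [0, 1, 2, 3, 5]
obs_names = ["1", "2", "3", "4"]

labels_without_row, rows_without_label = check_label_table_alignment(
    label_ids, obs_names
)
print(labels_without_row)  # ['5']
print(rows_without_label)  # ['4']
```

Two empty lists mean every segmented object has a table row and vice versa, which is the situation shown in the output above.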

Build a Vitessce config from split sources

This call reads the image, labels, and table from separate paths and builds a linked multi-view layout (spatial view plus marker and cluster views). This is useful when you want to store the AnnData table as a chunked Zarr store (for example, via adata.write_zarr(..., chunks=...)), or when your image, labels, and table live in different locations.

from IPython.display import HTML, display

vc = hpv.proteomics_from_split_sources(
    image_source=spatialdata_path
    / "images"
    / "blobs_multiscale_image",  # the image must have dimensions ("c", "y", "x")
    labels_source=spatialdata_path
    / "labels"
    / "blobs_labels",  # the segmentation mask must have dimensions ("y", "x")
    microns_per_pixel_image=0.5,  # physical pixel size; adjust to your data
    microns_per_pixel_mask=0.5,
    channels=[0, 1, 2],
    adata_source=spatialdata_path / "tables" / "table",
    visualize_feature_matrix=True,
    visualize_heatmap=False,
    embedding_key="X_umap",
    embedding_key_display_name="UMAP",
    spatial_key=None,
    cluster_key="leiden",
    cluster_key_display_name="Leiden clusters",
)

url = vc.web_app()
display(HTML(f'<a href="{url}" target="_blank">Open in Vitessce</a>'))

SpatialData-native workflow

With proteomics_from_spatialdata, linkage is derived from SpatialData table annotations. The table should annotate the labels element using SpatialData table attributes (region/instance semantics), rather than relying on index matching alone.

sdata["table"].uns[TableModel.ATTRS_KEY]  # -> annotated by blobs_labels
{'region': 'blobs_labels',
 'region_key': 'region',
 'instance_key': 'instance_id'}
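
To make the region/instance semantics concrete, here is a toy sketch of how these three attributes can be read to link table rows to label IDs. It is an illustration only and not SpatialData's or Vitessce's actual implementation: plain dictionaries stand in for adata.uns[TableModel.ATTRS_KEY] and adata.obs, and the helper resolve_instances is a hypothetical name.

```python
# Toy stand-ins (assumptions): `attrs` mirrors the dictionary printed above,
# `obs` mimics two columns of adata.obs as plain Python lists.
attrs = {
    "region": "blobs_labels",
    "region_key": "region",
    "instance_key": "instance_id",
}
obs = {
    "region": ["blobs_labels", "blobs_labels", "blobs_labels"],
    "instance_id": [1, 2, 3],
}


def resolve_instances(obs, attrs, element_name):
    """Return (row position, label ID) pairs for rows annotating element_name."""
    regions = obs[attrs["region_key"]]  # column naming the annotated element
    instances = obs[attrs["instance_key"]]  # column holding the label ID
    return [
        (row, instance)
        for row, (region, instance) in enumerate(zip(regions, instances))
        if region == element_name
    ]


links = resolve_instances(obs, attrs, "blobs_labels")
print(links)  # [(0, 1), (1, 2), (2, 3)]
```

Because the label ID is read from the instance_key column, the table index itself does not need to equal the segmentation IDs in this workflow.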

Coordinate systems and transformations

For the SpatialData-based API, view-space scaling and reorientation are controlled through named coordinate systems.

Below we add:

  • micron: isotropic scaling from pixels to microns.

  • rotation: an affine rotation in the (x, y) plane.

  • global: required for OME-NGFF compatibility.

import numpy as np
from spatialdata.transformations import Affine, Identity, Scale, set_transformation

microns_per_pixel = 10
rotation_degrees = 20
rotation_radians = np.deg2rad(rotation_degrees)
rotation_matrix = [
    [np.cos(rotation_radians), -np.sin(rotation_radians), 0.0],
    [np.sin(rotation_radians), np.cos(rotation_radians), 0.0],
    [0.0, 0.0, 1.0],
]

transformations = {
    "micron": Scale(axes=("x", "y"), scale=[microns_per_pixel, microns_per_pixel]),
    "rotation": Affine(
        matrix=rotation_matrix, input_axes=("x", "y"), output_axes=("x", "y")
    ),
    "global": Identity(),  # a global coordinate system is needed for OME-NGFF
}

set_transformation(
    sdata["blobs_multiscale_image"],
    transformation=transformations,
    set_all=True,
    write_to_sdata=sdata,
)

set_transformation(
    sdata["blobs_labels"],
    transformation=transformations,
    set_all=True,
    write_to_sdata=sdata,
)
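
To get a feel for what the micron and rotation transformations above do, the following sketch applies the same two matrices to a single pixel coordinate using only the standard library. It does not call SpatialData; the helper apply_affine and the sample point are assumptions made for illustration.

```python
import math


def apply_affine(matrix, x, y):
    """Apply a 3x3 homogeneous matrix to the point (x, y)."""
    new_x = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    new_y = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    return new_x, new_y


microns_per_pixel = 10
theta = math.radians(20)

# Same values as the Scale and Affine transformations defined above.
scale_matrix = [
    [microns_per_pixel, 0.0, 0.0],
    [0.0, microns_per_pixel, 0.0],
    [0.0, 0.0, 1.0],
]
rotation_matrix = [
    [math.cos(theta), -math.sin(theta), 0.0],
    [math.sin(theta), math.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
]

pixel = (10.0, 0.0)  # hypothetical pixel coordinate (x, y)
micron_xy = apply_affine(scale_matrix, *pixel)
rotated_xy = apply_affine(rotation_matrix, *pixel)

print(micron_xy)  # (100.0, 0.0)
print(rotated_xy)  # roughly (9.40, 3.42)
```

Note that the rotation matrix has no translation component, so it rotates around the pixel origin (0, 0) rather than around the image center.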

Build a Vitessce config from SpatialData

We render the same data twice, changing only to_coordinate_system (micron vs rotation), so you can see how coordinate-system selection affects the spatial view while preserving feature-level linking.

from IPython.display import HTML, display

vc = hpv.proteomics_from_spatialdata(
    sdata_path=spatialdata_path,
    labels_name="blobs_labels",
    image_name="blobs_multiscale_image",
    table_name="table",
    channels=[0, 1, 2],
    visualize_feature_matrix=True,
    to_coordinate_system="micron",  # render in the micron coordinate system
    visualize_heatmap=True,
    embedding_key="X_umap",
    cluster_key="leiden",
    cluster_key_display_name="Leiden",
)

url = vc.web_app()
display(HTML(f'<a href="{url}" target="_blank">Open in Vitessce</a>'))


vc = hpv.proteomics_from_spatialdata(
    sdata_path=spatialdata_path,
    labels_name="blobs_labels",
    image_name="blobs_multiscale_image",
    table_name="table",
    channels=[0, 1, 2],
    visualize_feature_matrix=True,
    to_coordinate_system="rotation",  # same data, rendered in the rotation coordinate system
    visualize_heatmap=True,
    embedding_key="X_umap",
    cluster_key="leiden",
    cluster_key_display_name="Leiden",
)

url = vc.web_app()
display(HTML(f'<a href="{url}" target="_blank">Open in Vitessce</a>'))
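
Since the two calls above differ only in to_coordinate_system, the shared arguments can be factored out. This is a sketch under assumptions: the keyword names are copied from the calls above, the sdata_path value is a placeholder string, and the actual call to hpv.proteomics_from_spatialdata is left commented out so the snippet runs without the library or the data.

```python
# Arguments shared by both renderings (names taken from the calls above).
shared_kwargs = {
    "sdata_path": "sdata.zarr",  # placeholder; the tutorial uses spatialdata_path
    "labels_name": "blobs_labels",
    "image_name": "blobs_multiscale_image",
    "table_name": "table",
    "channels": [0, 1, 2],
    "visualize_feature_matrix": True,
    "visualize_heatmap": True,
    "embedding_key": "X_umap",
    "cluster_key": "leiden",
    "cluster_key_display_name": "Leiden",
}

# One keyword dictionary per coordinate system; only one entry changes.
per_system_kwargs = {
    name: {**shared_kwargs, "to_coordinate_system": name}
    for name in ("micron", "rotation")
}

for name, kwargs in per_system_kwargs.items():
    # In the notebook, uncomment the next two lines to build and open each config:
    # vc = hpv.proteomics_from_spatialdata(**kwargs)
    # url = vc.web_app()
    print(name, kwargs["to_coordinate_system"])
```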