Virgo
DAG-based preprocessing for Constellation’s research stack.
Virgo handles composable, cached, distributed preprocessing of Constellation data. It reads from Ursa, writes derived assets back to Ursa, and ships a versioned standard pipeline that runs automatically on every newly ingested recording.
Transforms are Pydantic classes: the class itself is the transform, its fields are its config, and its class name is its transform name. ClassVar attributes carry the transform's Resources, CacheKey, and version, so they stay out of the config. The DAG runner dispatches each node to the appropriate hardware (CPU, GPU, etc.) via SkyPilot (cloud) or Slurm (Polaris).
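The "class is the transform, fields are the config" pattern can be illustrated with a minimal sketch. It is not Virgo's code: to stay dependency-free it uses a stdlib frozen dataclass in place of Pydantic's BaseModel, and Resources, Transform, Bandpass, and target_hardware are hypothetical names chosen for illustration. CacheKey is omitted here, and the routing rule is an assumed stand-in for whatever the real DAG runner does before handing a node to SkyPilot or Slurm.

```python
from dataclasses import asdict, dataclass
from typing import ClassVar


# Hypothetical stand-in for Virgo's Resources type.
@dataclass(frozen=True)
class Resources:
    cpus: int = 1
    gpus: int = 0


@dataclass(frozen=True)
class Transform:
    # ClassVar attributes are class-level metadata, NOT config fields.
    resources: ClassVar[Resources] = Resources()
    version: ClassVar[str] = "0.0.0"

    @classmethod
    def name(cls) -> str:
        # The class name doubles as the transform name.
        return cls.__name__

    def config(self) -> dict:
        # Only instance fields end up in the config.
        return asdict(self)

    def __call__(self, data):
        raise NotImplementedError


@dataclass(frozen=True)
class Bandpass(Transform):  # hypothetical example transform
    resources: ClassVar[Resources] = Resources(cpus=4)
    version: ClassVar[str] = "1.2.0"

    low_hz: float = 1.0
    high_hz: float = 40.0

    def __call__(self, data):
        # Placeholder body: real filtering is out of scope for this sketch.
        return data


def target_hardware(t: Transform) -> str:
    # Hypothetical routing rule a DAG runner might apply before submitting
    # the node to a scheduler backend.
    return "gpu" if t.resources.gpus > 0 else "cpu"
```

For example, Bandpass(high_hz=30.0).config() returns {'low_hz': 1.0, 'high_hz': 30.0}, while version and resources never appear in it. With Pydantic v2 the same shape uses BaseModel, model_dump() takes the place of asdict, and ClassVar-annotated attributes are likewise excluded from the model's fields.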
The cache is content-addressable along three axes: input data identity × code version × config. Metaxy provides field-level invalidation. Time-locality declarations let Virgo reprocess only the affected time windows and stitch the results into the existing output.
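The caching and partial-reprocess ideas can be sketched as below, under loudly stated assumptions: cache_key, dirty_windows, and stitch are hypothetical helpers rather than Virgo's or Metaxy's API, field-level invalidation is not modeled, and the time-locality declaration is assumed to take the simplest possible form (fixed-size output windows plus a symmetric context_s of extra input read on each side).

```python
import hashlib
import json
import math


def cache_key(inputs: dict[str, str], transform: str, version: str, config: dict) -> str:
    """Hash the three cache axes into a single content address.

    inputs  -- axis 1: input name -> content hash of the upstream asset
    version -- axis 2: code version of the transform
    config  -- axis 3: the transform's config fields
    """
    payload = json.dumps(
        {"inputs": inputs, "transform": transform, "version": version, "config": config},
        sort_keys=True,  # canonical key order: dict ordering never changes the key
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def dirty_windows(changed: tuple[float, float], window_s: float,
                  context_s: float, total_s: float) -> list[int]:
    """Indices of output windows affected by a change to input [start, end).

    Window i covers [i*window_s, (i+1)*window_s) but reads context_s of extra
    input on each side (hypothetical time-locality declaration), so it is
    dirty whenever that widened span overlaps the changed range.
    """
    start, end = changed
    lo = max(0.0, start - context_s)
    hi = min(total_s, end + context_s)
    return list(range(math.floor(lo / window_s), math.ceil(hi / window_s)))


def stitch(cached: list, fresh: dict[int, object]) -> list:
    """Splice recomputed windows into the cached per-window output."""
    return [fresh.get(i, old) for i, old in enumerate(cached)]
```

Changing any one axis (an upstream content hash, the version string, or a config value) yields a different key, while reordering config keys does not. With 60 s windows and 10 s of context, a change to input seconds [125, 130) dirties only windows 1 and 2; stitch then keeps every other window from cache.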
Where this fits
Virgo is one of three packages in Constellation’s research stack:
Ursa — database / data layer
Virgo (this site) — DAG-based preprocessing
Orion — research / training / benchmarking
Full architecture: Research Stack Architecture (Notion).
Status
🌱 Early bootstrap. Implementation is tracked in the Virgo project in Linear.
Phasing (mirrors the Linear project milestones):
M1 — Foundations (in progress)
M2 — MVP (Phase 2)
M3 — Standard Pipeline & Partial Reprocess
M4 — Module Version Routing & Scaling (Phase 4)
M5 — Polish