M5: parallel enrichment fetch + deps/rdeps dependency-graph queries #1
Summary
Implements the two M5 stretch goals: parallelize Tier-2 enrichment and add
dependency-graph queries. No semantic-search work — the core stays lexical FTS.
Parallel enrichment (lparallel)
Split enrich-system into a network-only fetch-enrichment (no DB) plus a serial DB write. Fetches (ocicl list + oras pull + .asd parse + README) now run concurrently while cl-sqlite writes stay serialized on the main thread.
enrich-rows fans fetches across an lparallel kernel of --jobs workers, writing each result as it arrives off a channel.
--jobs 1 (or a single row) falls back to the sequential path.
*registry* and the output streams are propagated into workers.
with-temp-dir used (random …) on the shared *random-state*, which is not thread-safe; it now uses an atomic counter so parallel workers never collide on temp-dir names.
Added lparallel to :depends-on (ocicl.csv lockfile updated).
Dependency-graph queries
db: deps-rows / rdeps-rows via SQLite json_each over the stored deps JSON.
deps left-joins back to the catalog so indexed dependencies show their version/description; unindexed ones still list by name.
cli: new deps NAME and rdeps NAME subcommands, both supporting --names-only and --json, with proper exit codes (missing system → 1, missing arg → 2). Shared emit-rows output helper.
$ lispsearch deps drakma
$ lispsearch rdeps alexandria --names-only
$ lispsearch deps cl+ssl --json
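The json_each approach behind deps and rdeps can be sketched roughly as follows. This is a Python illustration, not the project's Lisp code: the `systems` table, its column names, and the sample rows are assumptions made up for the example. Only the use of json_each over a stored deps JSON array and the left join back to the catalog come from the description above.

```python
import sqlite3

# Hypothetical schema: one row per system, deps stored as a JSON array of names.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE systems (name TEXT PRIMARY KEY, version TEXT,"
    " description TEXT, deps TEXT)"
)
conn.executemany(
    "INSERT INTO systems VALUES (?, ?, ?, ?)",
    [
        ("alexandria", "1.4", "Utility library", "[]"),
        ("drakma", "2.0", "HTTP client", '["alexandria", "cl+ssl"]'),
        ("app", "0.1", "Demo", '["drakma", "alexandria"]'),
    ],
)


def deps_rows(conn, name):
    """Direct dependencies of `name`, sorted by name.

    The LEFT JOIN keeps dependencies that are not indexed in the catalog;
    their version/description columns come back as None.
    """
    return conn.execute(
        """
        SELECT j.value, d.version, d.description
        FROM systems AS s
        JOIN json_each(s.deps) AS j
        LEFT JOIN systems AS d ON d.name = j.value
        WHERE s.name = ?
        ORDER BY j.value
        """,
        (name,),
    ).fetchall()


def rdeps_rows(conn, name):
    """Names of systems whose deps array contains `name`, sorted."""
    return conn.execute(
        """
        SELECT s.name
        FROM systems AS s
        JOIN json_each(s.deps) AS j
        WHERE j.value = ?
        ORDER BY s.name
        """,
        (name,),
    ).fetchall()
```

With the sample data, `deps_rows(conn, "drakma")` lists alexandria with its catalog columns and cl+ssl by name only, and an unknown system yields an empty list, which a CLI could map to exit code 1.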
Tests
test/m5-stretch.lisp: deterministic, network-free coverage of deps/rdeps (incl. json_each, registry-name lookup, sorting, empty/unknown cases) and both the parallel and sequential enrich paths (15 checks).
test/m4-cli.sh: deps/rdeps wiring (--help) and error-path checks.
Note on the parallelism benefit
Early benchmarking looked like parallel was slower (jobs=8 591s vs jobs=1 510s).
That turned out to be a measurement artifact, not a regression:
Cold ocicl list is ~7–20s, warm ~1s (the warm state expires after minutes). Whichever arm ran first warmed the cache for the
second, so back-to-back A/B runs on the same systems are meaningless.
A controlled test on two disjoint, equally-cold 8-system sets — biased toward
serial (serial ran second, with connections already warm) — confirms parallel wins:
Arms compared: --jobs 8 (cold) and --jobs 1 (cold).
Per-system latency is network/cold-cache bound (~1s → 100s+) and dwarfs the
parallel-vs-serial axis; the parallel total tracks the slowest single chain, as
expected.
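The fan-out described in the parallel enrichment section (concurrent fetches, writes serialized on the calling thread, sequential fallback for one job or one row) can be sketched in Python as below. This is an analogy only: the PR uses an lparallel kernel and channel, while this sketch substitutes `ThreadPoolExecutor` and `as_completed`. The `enrichment` table, the sleep-based `fetch_enrichment` stand-in, and the delays are assumptions for illustration.

```python
import sqlite3
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE enrichment (name TEXT PRIMARY KEY, readme TEXT)")


def fetch_enrichment(name, delay):
    """Stand-in for the network-only step; it never touches the database."""
    time.sleep(delay)  # simulated network latency
    return name, f"readme for {name}"


def enrich_rows(conn, rows, jobs):
    """Fetch rows (concurrently when jobs > 1) and write on the calling thread.

    Returns the names in the order they were written.
    """
    written = []

    def write(result):
        name, readme = result
        conn.execute(
            "INSERT OR REPLACE INTO enrichment VALUES (?, ?)", (name, readme)
        )
        written.append(name)

    if jobs <= 1 or len(rows) <= 1:
        # Sequential path: fetch and write one row at a time.
        for name, delay in rows:
            write(fetch_enrichment(name, delay))
    else:
        with ThreadPoolExecutor(max_workers=jobs) as pool:
            futures = [pool.submit(fetch_enrichment, n, d) for n, d in rows]
            # Results are written as they complete, not in submission order.
            for future in as_completed(futures):
                write(future.result())
    conn.commit()
    return written


rows = [("a", 0.2), ("b", 0.0), ("c", 0.1)]
parallel_order = enrich_rows(conn, rows, jobs=8)
serial_order = enrich_rows(conn, rows, jobs=1)
```

In the parallel call the wall time is roughly that of the slowest fetch rather than the sum of all three, which mirrors the observation that the parallel total tracks the slowest single chain.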
Notes
program-op build still SBCL-only (uses sb-ext:atomic-incf, consistent with the existing build target).
deps/rdeps rely on SQLite's JSON1 (json_each), present in the linked libsqlite3.
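The temp-dir fix relies on a counter that each worker increments atomically, so no two workers derive the same name. A minimal Python sketch of that idea follows. It uses a lock-guarded integer as a stand-in for sb-ext:atomic-incf, and the `next_temp_name` helper, the prefix, and the naming scheme are hypothetical rather than taken from the PR.

```python
import os
import tempfile
import threading

_counter_lock = threading.Lock()
_counter = 0  # process-wide sequence number (hypothetical)


def next_temp_name(prefix="enrich"):
    """Return a temp-dir path that is unique within this process.

    The lock makes increment-and-read a single atomic step, so concurrent
    callers always observe distinct counter values.
    """
    global _counter
    with _counter_lock:
        _counter += 1
        n = _counter
    return os.path.join(tempfile.gettempdir(), f"{prefix}-{os.getpid()}-{n}")


def collect_names(workers=8, per_worker=100):
    """Generate names from several threads and return them all."""
    names = []
    names_lock = threading.Lock()

    def work():
        local = [next_temp_name() for _ in range(per_worker)]
        with names_lock:
            names.extend(local)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return names
```

Unlike drawing from a shared random state, the counter gives uniqueness by construction. Including the process id also keeps names apart across concurrent processes on the same machine.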
catalog.lisp fetches the authoritative ocicl system list over HTTP (dexador), parses it into deduped (registry . decoded-system) pairs, and hashes it (catalog-sha) for change detection.
index.lisp build-index bulk-loads every name in one transaction and records catalog_sha/last_refresh; Tier-2 enrichment is stubbed until M3.
search.lisp adds run-search with a GLOB name path and the FTS path.
test/m2-catalog.lisp loads the full live catalog (2753 systems) and verifies name search ("trivial" --names-only) and idempotent re-build; all checks pass.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>