- docs/serena.md: Living ledger, phase map, invariants, milestone table
- docs/milestones/M00/: M00_plan, preflight, e2e_baseline, ci_inventory, toolcalls
- scripts/dev/: run_m00_baseline_e2e.ps1, .sh (thin verification helpers)
- Baseline tag baseline-pre-refactor created on 82a973c0
No runtime/structural changes. Behavior-preserving docs and verification only.
Made-with: Cursor
25 KiB
Codebase Audit: Stable Diffusion Web UI (serena)
Auditor: CodeAuditorGPT (staff-plus posture)
Repo: stable-diffusion-webui (workspace: c:\coding\refactoring\serena)
Commit: 82a973c04367123ae98bd9abdf80d9eda9b910e2
Audit date: 2025-03-05
Mode: Snapshot (one-shot); some inputs inferred from repo (see Input Contract note below).
Input Contract note: Not all required inputs were provided (e.g. test results/coverage summary, security scan output, SLAs, team size). Findings are based on repo structure, CI configs, dependency manifests, and code inspection. Where blocked, recommendations request the minimal next artifact.
1. Executive Summary
Strengths
- Clear entry points and layering:
webui.py/launch.pydelegate tomodules/; API and UI are separated (modules/api/,modules/ui.py). Evidence:webui.py:19-24,launch.pydelegates tolaunch_utils. - CI runs both Python (ruff) and JS (eslint) lint, plus a full pytest suite against a live server with coverage collection and artifact upload on failure. Evidence:
.github/workflows/on_pull_request.yaml,.github/workflows/run_tests.yaml. - Python lint and test tooling is modern: Ruff and pytest with
pyproject.tomlandpytest-base-urlfor API tests. Evidence:pyproject.toml,requirements-test.txt.
Biggest opportunities
- No test tiers or coverage enforcement: All tests run in one job; no smoke vs quality vs nightly; no
--cov-fail-under. CI can be slow and flaky without a fast gate; coverage can regress silently. Evidence:run_tests.yaml:58-61(single pytest run, no threshold). - Dependency and CI hygiene gaps:
requirements.txtmixes pinned and unpinned deps; npm lockfile is gitignored; CI actions use tags (@v4,@v5) not immutable SHAs;npm i --ciused instead ofnpm ci. Evidence:requirements.txt(e.g.fastapi>=0.90.1),.gitignore:40(/package-lock.json),on_pull_request.yaml:36,run_tests.yaml:14. - Tight coupling and shared global state:
modules/shared.pyandmodules/ui.pyare hubs;processing.pyandapi/api.pyimport many modules. Refactors and testing are harder. Evidence:shared.py:6-8,ui.py:16-17,processing.py:19-31.
Overall score (weighted): 2.6 / 5
Heatmap:
| Area | Score | Note |
|---|---|---|
| Architecture | 3 | Clear entry points; hub modules |
| Modularity | 2 | Heavy coupling to shared/ui |
| Code health | 3 | Ruff; some ignores; big files |
| Tests & CI | 2 | No tiers, no coverage gate |
| Security & supply | 2 | No dep/secret scan in CI |
| Performance | 3 | Not primary focus; CPU path |
| DX | 2 | No single “new dev” path |
| Docs | 2 | README/wiki; no CONTRIBUTING |
2. Codebase Map
flowchart TB
subgraph Entry
launch["launch.py"]
webui["webui.py"]
end
subgraph Core["modules/"]
launch_utils["launch_utils"]
initialize["initialize"]
shared["shared.py\n(global state)"]
cmd["shared_cmd_options\ncmd_args"]
ui["ui.py"]
api["api/"]
processing["processing.py"]
sd_models["sd_models\nsd_hijack*"]
scripts["scripts\nscript_callbacks"]
end
subgraph Ext["extensions-builtin/"]
Lora["Lora"]
LDSR["LDSR"]
etc["..."]
end
subgraph Test["test/"]
conftest["conftest.py"]
test_txt2img["test_txt2img"]
test_img2img["test_img2img"]
end
launch --> launch_utils
webui --> initialize
webui --> shared
webui --> ui
api --> shared
api --> processing
ui --> shared
ui --> processing
processing --> shared
processing --> sd_models
Test --> webui
Ext --> scripts
Drift vs intended architecture: The repo is a single Gradio/FastAPI app with a large modules/ package. There is no documented “intended” architecture; the map reflects actual structure. Observation: shared.py and ui.py act as facades and global state holders; many modules depend on them, which matches a typical “script-style” web UI rather than a layered service architecture. Evidence: modules/shared.py:1-80, modules/ui.py:16-23.
3. Modularity & Coupling
Score: 2 / 5
Top 3 tight couplings (impact + surgical decouplings)
-
sharedas global state hub- Impact: Many modules import
sharedforopts,state,sd_model, etc. Changes to options or state affect many call sites; unit testing requires heavy mocking. - Evidence:
modules/shared.py:44-47(opts,state), and grep offrom modules import shared/import modules.sharedacrossmodules/(dozens of files). - Surgical decoupling: Introduce a small
options/stateaccessor layer used by UI and API; keepsharedas the single place that writes options/state, but have core logic (e.g.processing) take options as arguments in new functions. One PR: addprocessing.run_txt2img(opts_snapshot, ...)that reads from a struct instead ofshared.opts.
- Impact: Many modules import
-
ui.pyimporting most of the app- Impact:
ui.pypulls in script_callbacks, sd_models, processing, ui_*, etc. Any UI change risks pulling in unrelated code; startup and test load are high. - Evidence:
modules/ui.py:16-17(long import list). - Surgical decoupling: Split UI into tab-specific modules that are loaded on demand (e.g. lazy import for “Extras” tab). One PR: move one tab’s UI construction into
ui_extras.pyand import it only when that tab is selected.
- Impact:
-
processing.pyand model/script coupling- Impact:
processing.pyimportssd_hijack,sd_models,scripts,ldm.*, etc. Hard to test processing logic without loading models or scripts. - Evidence:
modules/processing.py:19-31. - Surgical decoupling: Introduce interfaces (e.g. “sampler provider”, “model loader”) and inject them into processing. One PR: add a thin
ProcessingRunnerthat takes a model interface and callprocess_imageswith it so tests can pass a mock.
- Impact:
4. Code Quality & Health
Observations (facts):
- Ruff is configured in
pyproject.tomlwith select rules (B, C, I, W) and per-file ignores; E501, E721, E731, and others are ignored. Evidence:pyproject.toml:1-35. .pylintrcdisables all message groups (C, R, W, E, I). Evidence:.pylintrc:2-3.- Large files exist (e.g.
processing.py,ui.pyin the thousands of lines), which Ruff’s C901 is explicitly ignored for. Evidence:pyproject.toml:26.
Interpretation: Lint is focused on a subset of issues; complexity and line length are not enforced. This is common for a UI-heavy codebase but increases the risk of subtle bugs in hot paths.
Anti-pattern (example): Broad except or re-export of many symbols from a single module can hide errors. Evidence (re-exports): shared.py re-exports cmd_opts, OptionInfo, etc., and is imported widely.
Before/After (≤15 lines) – optional defensive guard:
Before (hypothetical optional dependency):
# In some module
from optional_otel import trace_span
def do_work():
with trace_span("do_work"):
...
After (defensive import + guard):
try:
from optional_otel import trace_span
except ImportError:
from contextlib import nullcontext
trace_span = lambda n: nullcontext()
def do_work():
with trace_span("do_work"):
...
(No OTEL was found in this repo; the pattern is recommended if optional observability is added.)
5. Docs & Knowledge
- Onboarding path: README describes installation (Windows/Linux), running via
webui-user.bat/webui.sh, and links to wiki for dependencies and features. Evidence:README.md:94-120. - Single biggest doc gap: There is no CONTRIBUTING.md or equivalent that explains: how to run tests locally, how to run lint, branch policy (e.g. “dev” vs “master”), and what CI checks are required. New contributors cannot reliably reproduce CI or know the bar for merging. Recommendation: Add a short CONTRIBUTING.md with: “Run
ruff .andnpm run lint; runpytest test/(see README for server prerequisites); open PRs againstdev.”
6. Tests & CI/CD Hygiene
Coverage: CI runs pytest ... --cov . --cov-report=xml and uploads htmlcov on failure; no --cov-fail-under or threshold is set. Evidence: .github/workflows/run_tests.yaml:61,65-69. No statement/branch threshold or tool is configured in the repo.
Flakiness: Tests depend on a live server started in the same job (launch.py --test-server then wait-for-it then pytest). Any startup delay or port conflict can cause flakiness. Evidence: run_tests.yaml:44-61.
Test pyramid: All tests are integration-style (HTTP against the running app). There are no unit tests that run without the server, and no explicit smoke vs full suite. Evidence: test/test_txt2img.py:42-43, conftest.py:34-36 (session fixture imports webui).
3-tier architecture assessment: Not present. There is a single “run everything” test job. Recommendation: Introduce Tier 1 (smoke): small subset of tests or a health endpoint, low threshold (e.g. 5%), required; Tier 2 (quality): main API tests with a coverage margin (e.g. current −2%); Tier 3 (nightly): full suite + coverage report, non-blocking with alerting.
Required checks: Linter and Tests run on push/PR; warns_merge_master.yml fails PRs that target master. There is no branch protection evidence in the repo (lives in GitHub settings). Evidence: on_pull_request.yaml:3-5, run_tests.yaml:3-5, warns_merge_master.yml:9-12.
Caches: Pip cache key includes **/requirements*txt and launch.py; models cached with key 2023-12-30. Evidence: run_tests.yaml:19-28.
Artifacts: output.txt and htmlcov uploaded when the job runs (including on failure). Evidence: run_tests.yaml:70-80.
Guardrail gaps: (1) No coverage threshold. (2) CI uses npm i --ci instead of npm ci; (3) lockfile not in repo. (4) Actions use @v4/@v5 (tags), not immutable SHAs. (5) No explicit test paths/markers for tiers (only test directory).
7. Security & Supply Chain
- Secret hygiene: API uses
secrets.compare_digestfor auth. Evidence:modules/api/api.py:17. No evidence of secrets in repo; no dedicated secret scan in CI. - Dependency risk/pinning:
requirements.txthas mixed pinning (e.g.fastapi>=0.90.1,gradio==3.41.2).requirements_versions.txtpins many deps. CI does not install from a single locked manifest;run_tests.yamlusesrequirements-test.txtandlaunch.py --skip-prepare-environmentwithTORCH_INDEX_URLfor CPU. Evidence:requirements.txt,requirements_versions.txt,run_tests.yaml:29-40. - SBOM: No SBOM or dependency export found in repo or workflows.
- CI trust boundaries: Workflows checkout code and run pip/npm; no explicit restriction to approved actions. Pinning to SHAs would reduce supply-chain risk from action updates.
Recommendations: (1) Add a scheduled or PR job that runs pip-audit (and optionally npm audit) and fails or warns on known vulns. (2) Use a single pinned manifest for CI (e.g. requirements_versions.txt or a generated lockfile) and pin CI actions to full SHAs.
8. Performance & Scalability
- Hot paths: Image generation flows through
processing.process_imagesand model forward passes;modules/processing.pyand samplers are central. Evidence:processing.py,modules/sd_samplers*.py. - IO/N+1: Not analyzed in depth; no ORM. File I/O for models and images is expected to be significant.
- Caching: Models and configs are loaded and cached in memory;
diskcacheis in requirements. Evidence:requirements.txt(diskcache),modules/cache.py. - Parallelism: Gradio queue (e.g. 64) for UI; API uses a queue lock. Evidence:
webui.py:69,modules/call_queue.py. - Perf budgets: No SLOs or perf budgets documented in repo. Performance is not the stated primary concern in README (features and compatibility are).
- Concrete profiling plan: (1) Run a single txt2img request with
python -m cProfile -o trace.stats launch.py ...(or equivalent) and inspecttrace.statsfor hotspots. (2) Add a simple “health” or “timing” endpoint that returns startup time and last-request latency. (3) If GPU is in scope, use PyTorch profiler for one batch.
9. Developer Experience (DX)
-
15-minute new-dev journey (measured steps + blockers):
(1) Clone repo. (2) Install Python 3.10.x and (for full app) GPU stack per README. (3) Runwebui-user.batorwebui.sh→ first run installs deps vialaunch.py. (4) Runruff .andnpm run lint(need Node for eslint). (5) Run tests: must start environment (e.g.launch.py --skip-torch-cuda-test --exitthenlaunch.py --test-server ...in background) thenpytest test/. Blockers: No single “run tests” script; CONTRIBUTING missing;package-lock.jsongitignored sonpm ciis not possible for reproducible JS deps. -
5-minute single-file change: Edit file → run
ruff .(and eslint if JS) → run relevant pytest (e.g.pytest test/test_txt2img.py -v). Blocker: Tests require a running server; no unit-only fast path. -
3 immediate wins: (1) Add CONTRIBUTING.md with “run ruff, eslint, pytest” and branch policy. (2) Add a script (e.g.
scripts/run_tests_local.sh) that starts the test server in background, runs pytest, then stops server. (3) Commitpackage-lock.jsonand switch CI tonpm cifor reproducible lint.
10. Refactor Strategy (Two Options)
Option A — Iterative (phased PRs, low blast radius)
- Rationale: Reduces risk; each PR is reviewable and revertible.
- Goals: Introduce test tiers, add coverage gate with margin, pin deps and actions, improve docs and DX.
- Migration steps: Phase 0: add minimal smoke check and pin one high-impact action to SHA. Phase 1: add CONTRIBUTING and marker/path discipline for tests. Phase 2: add coverage threshold (current −2%) to a “quality” job; make smoke required. Phase 3: optional nightly job and dependency audit.
- Risks: Slow; some changes (e.g. decoupling
shared) need many small steps. - Rollback: Each PR is independently revertible; keep feature flags for any new CI jobs.
Option B — Strategic (structural)
- Rationale: Address coupling and testability in a coordinated way.
- Goals: Introduce a clear service/business layer between API/UI and
processing; add interfaces for model/sampler; split UI into load-on-demand tabs; introduce 3-tier CI and coverage enforcement. - Migration steps: (1) Add interfaces and one “runner” that uses them; (2) Migrate one API endpoint to use runner; (3) Split one UI tab; (4) Add smoke + quality + nightly jobs and coverage threshold.
- Risks: Larger PRs; possible breakage for extensions that rely on current
shared/import structure. - Rollback: Feature flags for new paths; keep old paths until stable; branch-based rollout.
Tools: Ruff, pytest, coverage, pip-audit; optional: pre-commit for ruff/eslint.
11. Future-Proofing & Risk Register
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Dependency vuln in PyTorch/Gradio | Medium | High | pip-audit in CI; pin major deps |
| Flaky CI (server startup) | Medium | Medium | Smoke tier with health endpoint; retries or longer wait-for-it |
| Coverage regression | High | Medium | Add --cov-fail-under with 2% margin |
| Action/plugin compromise | Low | High | Pin actions to full SHA |
| Breaking extension API | Low | High | Document extension contract; deprecation path for shared/opts |
ADRs to lock decisions: (1) “We use a 3-tier test strategy (smoke / quality / nightly).” (2) “CI installs from pinned manifests and uses SHA-pinned actions.” (3) “New code in modules/ should prefer dependency injection over importing shared in hot paths where feasible.”
12. Phased Plan & Small Milestones (PR-sized)
Each milestone is ≤60 minutes and mergeable as its own PR.
Phase 0 — Fix-First & Stabilize (0–1 day)
| ID | Milestone | Category | Acceptance Criteria | Risk | Rollback | Est | Owner |
|---|---|---|---|---|---|---|---|
| P0-1 | Add smoke gate: single health or txt2img test, run first, required | CI | New job or step runs 1 test; required on PR | Low | Remove step | 1h | - |
| P0-2 | Pin actions/checkout and actions/setup-python to full SHA in both workflows |
CI | Both workflows use actions/checkout@<sha> (and same for setup-python) |
Low | Revert to @v4/@v5 | 0.5h | - |
| P0-3 | Upload test artifacts on failure only (already “always” in run_tests; add lint job logs on fail) | CI | Artifacts uploaded when job fails | Low | Revert | 0.5h | - |
| P0-4 | Add pip-audit (or pip install pip-audit && pip-audit) as non-blocking step in test workflow |
Security | Step runs; fails on known vulns (or warn-only) | Low | Remove step | 0.5h | - |
Phase 1 — Document & Guardrail (1–3 days)
| ID | Milestone | Category | Acceptance Criteria | Risk | Rollback | Est | Owner |
|---|---|---|---|---|---|---|---|
| P1-1 | Add CONTRIBUTING.md with lint, test, branch policy | Docs | File exists; links from README | Low | Delete file | 1h | - |
| P1-2 | Add pytest markers (e.g. @pytest.mark.smoke) to 2–3 tests; document in CONTRIBUTING |
Tests | pytest -m smoke runs subset |
Low | Remove markers | 1h | - |
| P1-3 | Use explicit test path in CI: pytest test/ (already) and add -m "not slow" or marker for smoke |
CI | Smoke tier runs only marked tests | Low | Revert | 0.5h | - |
| P1-4 | Pin Ruff in workflow to exact version (already 0.3.3); add same for pytest/coverage in requirements-test.txt | Deps | requirements-test.txt pins pytest, pytest-cov, pytest-base-url | Low | Revert | 0.5h | - |
| P1-5 | Commit package-lock.json; change lint workflow to npm ci |
CI/Deps | Lockfile in repo; workflow uses npm ci |
Low | Revert commit, use npm i --ci |
0.5h | - |
Phase 2 — Harden & Enforce (3–7 days)
| ID | Milestone | Category | Acceptance Criteria | Risk | Rollback | Est | Owner |
|---|---|---|---|---|---|---|---|
| P2-1 | Add coverage threshold: pytest ... --cov . --cov-fail-under=X where X = (current_cov − 2)% |
CI | Build fails if coverage below X | Medium | Remove flag | 1h | - |
| P2-2 | Make smoke job required in branch protection (document in CONTRIBUTING) | CI | README or CONTRIBUTING states smoke is required | Low | Doc change | 0.5h | - |
| P2-3 | Run full test suite in same job but after smoke; or add separate “quality” job with threshold | CI | Two tiers (smoke + full) or one job with ordered steps | Low | Revert | 1h | - |
| P2-4 | Add coverage variance alert (e.g. comment on PR with coverage diff) — optional | CI | Optional step posts comment | Low | Remove step | 1h | - |
Phase 3 — Improve & Scale (weekly cadence)
| ID | Milestone | Category | Acceptance Criteria | Risk | Rollback | Est | Owner |
|---|---|---|---|---|---|---|---|
| P3-1 | Add ADR or doc: “CI Architecture Guardrails” (3-tier, pinning, coverage margin) | Docs | Doc in repo | Low | Delete | 1h | - |
| P3-2 | Introduce ProcessingRunner (or similar) with one endpoint using it |
Arch | One API path uses injected dependency | Medium | Revert | 2h | - |
| P3-3 | Add health/timing endpoint for smoke and profiling | Perf | GET returns 200 and startup/latency info | Low | Remove endpoint | 1h | - |
| P3-4 | Schedule weekly or nightly pip-audit and dependency update issue (optional) |
Security | Scheduled workflow runs | Low | Disable | 0.5h | - |
13. Machine-Readable Appendix (JSON)
{
"issues": [
{
"id": "ARC-001",
"title": "Introduce application/service layer between API and processing",
"category": "architecture",
"path": "modules/api/api.py:1-35",
"severity": "medium",
"priority": "medium",
"effort": "medium",
"impact": 4,
"confidence": 0.8,
"ice": 3.2,
"evidence": "api.py imports processing, sd_models, scripts, etc. directly; no abstraction layer.",
"fix_hint": "Add ProcessingRunner that takes opts and model interface; use in one endpoint first."
},
{
"id": "MOD-001",
"title": "Reduce shared.py global state coupling",
"category": "modularity",
"path": "modules/shared.py:1-80",
"severity": "high",
"priority": "high",
"effort": "high",
"impact": 5,
"confidence": 0.9,
"ice": 4.5,
"evidence": "Dozens of modules import shared for opts, state, sd_model.",
"fix_hint": "Pass opts/state snapshot into processing functions in one module first."
},
{
"id": "CI-001",
"title": "Add coverage threshold with 2% safety margin",
"category": "tests_ci",
"path": ".github/workflows/run_tests.yaml:61",
"severity": "medium",
"priority": "high",
"effort": "low",
"impact": 4,
"confidence": 1.0,
"ice": 4.0,
"evidence": "pytest runs with --cov but no --cov-fail-under.",
"fix_hint": "Add --cov-fail-under=(current-2) after measuring current coverage."
},
{
"id": "CI-002",
"title": "Pin GitHub Actions to immutable SHAs",
"category": "security",
"path": ".github/workflows/on_pull_request.yaml:14",
"severity": "medium",
"priority": "medium",
"effort": "low",
"impact": 3,
"confidence": 1.0,
"ice": 3.0,
"evidence": "uses: actions/checkout@v4",
"fix_hint": "Replace with actions/checkout@<full SHA> for checkout, setup-python, cache, upload-artifact."
},
{
"id": "SEC-001",
"title": "Add dependency audit in CI",
"category": "security",
"path": ".github/workflows/run_tests.yaml",
"severity": "medium",
"priority": "medium",
"effort": "low",
"impact": 4,
"confidence": 0.9,
"ice": 3.6,
"evidence": "No pip-audit or npm audit in workflows.",
"fix_hint": "Add step: pip install pip-audit && pip-audit (and optionally npm audit)."
},
{
"id": "DOC-001",
"title": "Add CONTRIBUTING.md with lint, test, branch policy",
"category": "docs",
"path": "README.md",
"severity": "low",
"priority": "high",
"effort": "low",
"impact": 3,
"confidence": 1.0,
"ice": 3.0,
"evidence": "No CONTRIBUTING or local test/lint instructions.",
"fix_hint": "Create CONTRIBUTING.md: ruff, eslint, pytest, open PRs to dev."
}
],
"scores": {
"architecture": 3,
"modularity": 2,
"code_health": 3,
"tests_ci": 2,
"security": 2,
"performance": 3,
"dx": 2,
"docs": 2,
"overall_weighted": 2.6
},
"phases": [
{
"name": "Phase 0 — Fix-First & Stabilize",
"milestones": [
{
"id": "P0-1",
"milestone": "Add smoke gate with low threshold and artifacts on fail",
"acceptance": ["One smoke test runs first", "Artifacts uploaded on fail"],
"risk": "low",
"rollback": "Remove job or step",
"est_hours": 1
},
{
"id": "P0-2",
"milestone": "Pin actions/checkout and setup-python to full SHA",
"acceptance": ["Workflows use @<sha> not @v4/@v5"],
"risk": "low",
"rollback": "Revert to tags",
"est_hours": 0.5
}
]
},
{
"name": "Phase 1 — Document & Guardrail",
"milestones": [
{
"id": "P1-1",
"milestone": "Add CONTRIBUTING.md",
"acceptance": ["File exists", "Documents lint, test, branch"],
"risk": "low",
"rollback": "Delete file",
"est_hours": 1
},
{
"id": "P1-2",
"milestone": "Add pytest markers for smoke subset",
"acceptance": ["pytest -m smoke runs subset"],
"risk": "low",
"rollback": "Remove markers",
"est_hours": 1
}
]
},
{
"name": "Phase 2 — Harden & Enforce",
"milestones": [
{
"id": "P2-1",
"milestone": "Add --cov-fail-under with 2% margin",
"acceptance": ["CI fails if coverage below threshold"],
"risk": "medium",
"rollback": "Remove flag",
"est_hours": 1
}
]
},
{
"name": "Phase 3 — Improve & Scale",
"milestones": [
{
"id": "P3-1",
"milestone": "Document CI Architecture Guardrails",
"acceptance": ["ADR or doc in repo"],
"risk": "low",
"rollback": "Delete doc",
"est_hours": 1
}
]
}
],
"metadata": {
"repo": "stable-diffusion-webui (serena)",
"commit": "82a973c04367123ae98bd9abdf80d9eda9b910e2",
"languages": ["py", "js"],
"workspace_path": "c:\\coding\\refactoring\\serena"
}
}
End of audit. For questions or to unblock: provide (1) current coverage report output (coverage report -i or XML summary), and (2) whether branch protection is enabled and which checks are required.