stable-diffusion-webui/docs/sdwebuirefactoraudit.md
Michael Cahill 0a8ade1a9f M00: Program kickoff, baseline freeze, phase map, E2E verification
- docs/serena.md: Living ledger, phase map, invariants, milestone table
- docs/milestones/M00/: M00_plan, preflight, e2e_baseline, ci_inventory, toolcalls
- scripts/dev/: run_m00_baseline_e2e.ps1, .sh (thin verification helpers)
- Baseline tag baseline-pre-refactor created on 82a973c0

No runtime/structural changes. Behavior-preserving docs and verification only.

Made-with: Cursor
2026-03-06 19:17:49 -08:00


Pre-Refactor Audit: Stable Diffusion WebUI

Auditor: CodeAuditorGPT (staff-plus, architecture-first)
Repository: AUTOMATIC1111/stable-diffusion-webui
Workspace: c:\coding\refactoring\serena
Commit: 82a973c04367123ae98bd9abdf80d9eda9b910e2
Goal: Produce the best possible pre-refactor audit for a full-repo transformation to a 5/5 score.

All findings are grounded in the codebase with file paths and line ranges. For each major section: Observations = directly evidenced; Inferences = reasoned conclusions; Recommendations = proposed changes.


0. Scoring Rubric (Used Consistently)

Score Meaning
0 Catastrophic (actively dangerous / unusable)
1 Fragile (frequent breakage, no guardrails)
2 Poor (works, but hard to change safely)
3 Acceptable (works, some guardrails, clear pain points)
4 Strong (well-structured, predictable, maintainable)
5 Exemplary (clear architecture, guardrails, docs, observability)

1. Executive Summary

Overall score: 2.4 / 5

| Category | Score | Category | Score |
| --- | --- | --- | --- |
| Architecture | 2.5 | Performance | 3 |
| Modularity | 2 | DX | 2 |
| Code health | 2.5 | Docs | 2 |
| Tests & CI | 2 | Extensions | 2.5 |
| Security | 2 | Overall | 2.4 |

Strengths

  • Clear entry points (webui.py, launch.py) and a single core package (modules/). Evidence: webui.py:1-24, launch.py delegates to launch_utils.
  • Rich extension and script callback system (script_callbacks, extensions, scripts) enabling hooks without forking. Evidence: modules/script_callbacks.py:219-243, modules/extensions.py:226-300.
  • CI runs lint (ruff, eslint) and a full pytest suite against a live server with coverage and artifact upload. Evidence: .github/workflows/on_pull_request.yaml, .github/workflows/run_tests.yaml:61-80.
  • API and UI both funnel into the same processing pipeline (process_images), so behavior is consistent. Evidence: modules/api/api.py:479-482, modules/txt2img.py:104-108.

Critical weaknesses

  • Global state hub: shared.opts, shared.state, shared.sd_model are defined in shared.py and written in shared_init.py and processing.py; dozens of modules read them. Testability and determinism suffer. Evidence: modules/shared.py:14-46, modules/shared_init.py:19,46, processing.py:823-833,885-886.
  • No test tiers or coverage gate: Single test job; no smoke/quality/nightly; no --cov-fail-under. Evidence: run_tests.yaml:58-61.
  • God modules and tight coupling: processing.py (~1793 LOC), ui.py (~984 LOC), api/api.py (~929 LOC) import many modules and rely on shared. Evidence: modules/processing.py:18-31, modules/ui.py:16-31.
  • Dependency and CI hygiene: Mixed pinning in requirements.txt; package-lock.json gitignored; CI uses npm i --ci and action tags (@v4). Evidence: requirements.txt, .gitignore:40, on_pull_request.yaml:36, run_tests.yaml:14.
  • No CONTRIBUTING or extension API contract: Onboarding and extension stability rely on wiki/tribal knowledge. Evidence: No CONTRIBUTING.md; extension hooks in script_callbacks not versioned.

Architectural posture

  • Current: Single Gradio/FastAPI app with a large procedural modules/ package; shared and ui act as hubs; processing, API, and UI are intertwined via global state.
  • Intended (from repo): None explicitly documented; structure suggests “one app, script-style, extend via callbacks.”
  • One-sentence description: A monolithic Gradio/FastAPI app whose core is a single modules package with shared global state, a central processing pipeline, and a callback-based extension system.

2. Architecture & System Map

Text-based architecture map

  • Entrypoints

    • launch.py: Parses args, prepares environment, calls launch_utils.start() → webui.start(). Evidence: launch.py:25-43, modules/launch_utils.py.
    • webui.py: Imports timer/initialize, exposes create_api() and webui(); initialize.initialize() loads options and model state. Evidence: webui.py:1-50, modules/initialize.py.
  • Core packages

    • modules/: Core logic (processing, models, samplers, UI, API, extensions, paths, options). Evidence: Directory layout; 150+ Python files.
    • extensions-builtin/: Lora, LDSR, SwinIR, etc.; loaded via extensions.list_extensions(), scripts via script_loading. Evidence: modules/extensions.py:226-300, modules/script_loading.py:10-16.
    • scripts/: Built-in scripts (xyz_grid, outpainting, etc.); discovered and run via modules.scripts. Evidence: scripts/xyz_grid.py:15-18, modules/scripts.py.
  • Surfaces

    • API: FastAPI routes under /sdapi/v1/*; handlers in modules/api/api.py build StableDiffusionProcessing* and call process_images(p). Evidence: modules/api/api.py:211-251,432-490.
    • UI: Gradio built in modules/ui.py; tabs and controls call into txt2img.py, img2img.py, which create p and call scripts.run / process_images. Evidence: modules/ui.py:16-31, modules/txt2img.py:19-55,101-108.
    • Runtime: No separate “runtime” package; generation lives inside processing.py and sampler modules.
    • Extension surface: Extensions register callbacks via script_callbacks.add_callback; scripts extend scripts.Script and are loaded from scripts/ and extension dirs. Evidence: modules/script_callbacks.py:127-147, modules/scripts.py:51-120.

Layers as they actually exist

  1. Entry / bootstrap: launch.py, webui.py, initialize.py, shared_init.py.
  2. Configuration / CLI: shared_cmd_options, cmd_args, options, shared_options → populate shared.opts and cmd_opts.
  3. Global state: shared.py (opts, state, sd_model, device, etc.), shared_state.State.
  4. Orchestration: processing.process_images → process_images_inner; scripts run before/after via p.scripts.
  5. Model/sampler: sd_models, sd_samplers, sd_vae, sd_hijack*; LDM/diffusion in modules/models/.
  6. UI / API: ui.py, api/api.py, txt2img.py, img2img.py — all depend on shared and processing.

Hub modules

  • shared.py: Defines and re-exports cmd_opts, opts, state, sd_model, device, and many other globals; read by almost every feature module. Evidence: modules/shared.py:14-95.
  • ui.py: Builds the Gradio UI; imports script_callbacks, sd_models, processing, ui_*, shared; central for all UI tabs. Evidence: modules/ui.py:16-31.

Cross-cutting concerns

  • Logging: Standard logging; modules/logging_config.py; no structured/observability stack observed.
  • Config: options.Options in shared.opts; loaded/saved via shared_options and UI; overrides applied in process_images. Evidence: modules/options.py, processing.py:823-833.
  • State: shared_state.State (job, interrupted, sampling_step, etc.); mutated in processing, API, call_queue, progress. Evidence: modules/shared_state.py:11-80, grep of state. across modules.
  • Error handling: modules/errors.report(); callbacks wrapped with try/except in script_callbacks. Evidence: modules/script_callbacks.py:15-16,253-259.
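The report-and-continue error handling described above can be sketched as follows. This is a minimal illustration of the pattern, not the code in modules/script_callbacks.py: `CallbackRegistry`, `add`, and `fire` are hypothetical names, and standard logging stands in for errors.report().

```python
from __future__ import annotations

import logging
from collections import defaultdict
from typing import Any, Callable

logger = logging.getLogger("callbacks_sketch")


class CallbackRegistry:
    """Hypothetical registry: one failing callback must not stop the others."""

    def __init__(self) -> None:
        self._callbacks: dict[str, list[Callable[..., Any]]] = defaultdict(list)
        self.errors: list[tuple[str, Exception]] = []

    def add(self, event: str, fn: Callable[..., Any]) -> None:
        self._callbacks[event].append(fn)

    def fire(self, event: str, *args: Any) -> list[Any]:
        results = []
        for fn in self._callbacks[event]:
            try:
                results.append(fn(*args))
            except Exception as exc:  # deliberately broad: report and continue
                self.errors.append((event, exc))
                logger.exception("callback %r failed for event %r", fn, event)
        return results
```

The trade-off is the one noted under anti-patterns: a broad except keeps a buggy extension from aborting a generation, but it also makes failures easy to miss unless the recorded errors are surfaced somewhere.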

Drift analysis

  • The repo does not claim a “clean layered” architecture. Observation: Layers are implicit (bootstrap → config → state → orchestration → model → UI/API). Drift: Orchestration and model code are mixed in processing.py; UI and API both depend directly on shared and processing with no abstraction layer. Reaching a clean layered design would require extracting a runtime layer (a generation pipeline with explicit inputs/outputs) and introducing dependency injection for opts/state/model.

Score: architecture 2.5 / 5


3. Runtime Pipeline Analysis

End-to-end generation pipelines

txt2img

  • Request handling: API: api.text2imgapi(txt2imgreq) builds StableDiffusionProcessingTxt2Img from request, sets p.script_args, then scripts.scripts_txt2img.run(p, *p.script_args) or process_images(p). UI: txt2img_create_processing() builds p from Gradio args, then scripts.scripts_txt2img.run(p, *p.script_args) or process_images(p). Evidence: modules/api/api.py:432-490, modules/txt2img.py:14-55,101-108.
  • Processing: process_images(p) applies override_settings to opts, reloads model/VAE if needed, then process_images_inner(p). Evidence: modules/processing.py:819-858.
  • Inner loop: process_images_inner(p) fixes the seed, sets job_count, calls p.init(), then calls p.sample() for each iteration (which creates the sampler, runs sampler.sample(...), and optionally runs a hires pass). Evidence: modules/processing.py:863-934,1307-1371.
  • Sampler: sd_samplers.create_sampler(p.sampler_name, p.sd_model); samplers sample(p, x, conditioning, unconditional_conditioning, ...) produces latents; then decode_first_stage (or batch decode) and image save. Evidence: modules/processing.py:1307-1345, modules/sd_samplers_common.py:73, modules/sd_samplers_kdiffusion.py:190.
  • Model loading: shared.sd_model is set by sd_models.reload_model_weights(); used inside process_images and in sampler. Evidence: processing.py:828-830,885-886, modules/sd_models.py.

img2img / inpainting

  • Same orchestration: API or UI builds StableDiffusionProcessingImg2Img (with init_image, mask, etc.), then process_images(p). p.init() and p.sample() are overridden in img2img subclass; init latent comes from VAE encode of image. Evidence: modules/img2img.py:10-17, modules/processing.py (img2img subclass).

Orchestration

  • Orchestration layer: Effectively process_images + process_images_inner + p.init() / p.sample(). Scripts hook via p.scripts.before_process, process, process_before_every_sampling. Evidence: processing.py:819-821,912-914,1336-1343.
  • Sampler orchestration: One sampler per p; created inside p.sample() (e.g. sd_samplers.create_sampler(self.sampler_name, self.sd_model)). Evidence: processing.py:1307-1308,1384.
  • Model loading and selection: sd_models.reload_model_weights() / get_closet_checkpoint_match; override in p.override_settings['sd_model_checkpoint']. Evidence: processing.py:828-836, modules/sd_models.py.
  • Seed handling: get_fixed_seed(p.seed); p.all_seeds/p.all_subseeds set in process_images_inner; p.rng used in sample. Evidence: processing.py:871-907, processing.py:1335,1759-1760.
  • Batching: p.n_iter outer iterations; p.batch_size per iteration; loop in process_images_inner over batches. Evidence: processing.py:929-934 and following.
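The seed and batching bullets above can be illustrated with a small sketch. It assumes that a seed of -1, None, or an empty string means "pick one at random", and that each image gets the base seed plus its index unless subseed blending is active. Both rules are assumptions made for illustration and should be checked against processing.py:871-907; `get_fixed_seed_sketch` and `plan_seeds` are hypothetical names.

```python
import random


def get_fixed_seed_sketch(seed):
    """Return a concrete seed; -1, None or '' are treated as 'random'."""
    if seed is None or seed == "" or seed == -1:
        return random.randrange(4294967294)
    return int(seed)


def plan_seeds(seed, n_iter, batch_size, subseed_strength=0.0):
    """Return one list of seeds per outer iteration (n_iter lists of batch_size)."""
    base = get_fixed_seed_sketch(seed)
    total = n_iter * batch_size
    # Assumption: without subseed blending, image i uses base + i;
    # with blending, every image shares the base seed.
    all_seeds = [base + (i if subseed_strength == 0 else 0) for i in range(total)]
    return [all_seeds[n * batch_size:(n + 1) * batch_size] for n in range(n_iter)]
```

For example, `plan_seeds(100, n_iter=2, batch_size=3)` yields two batches, `[100, 101, 102]` and `[103, 104, 105]`.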

Control flow

  • Tangled/duplicated: Override application and model/VAE reload are in process_images; seed/prompt setup in process_images_inner; script hooks at multiple points. Some logic (e.g. hires) is in StableDiffusionProcessingTxt2Img.sample and sample_hr_pass (large methods). Evidence: processing.py:819-858,863-934,1307-1393.
  • Seams for a “runtime” layer: (1) Everything after p.init() and before image save could be a pure function run_sampling(p, sampler, model, rng). (2) Override application could be a function that returns an opts snapshot and restores it. (3) Script hooks could be a formal pipeline stage interface.
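Seam (2) above, applying overrides and restoring them afterwards, could look roughly like the context manager below. It is a sketch under the assumption that the options object supports plain attribute access; `temporary_overrides` is a hypothetical helper, and the real opts object in modules/options.py may need its own setter.

```python
from contextlib import contextmanager

_MISSING = object()


@contextmanager
def temporary_overrides(opts, overrides):
    """Apply overrides to opts for the duration of the block, then restore."""
    previous = {key: getattr(opts, key, _MISSING) for key in overrides}
    try:
        for key, value in overrides.items():
            setattr(opts, key, value)
        yield opts
    finally:
        # Restore even if the body raised, so one failed run cannot leak
        # its settings into the next one.
        for key, old in previous.items():
            if old is _MISSING:
                if hasattr(opts, key):
                    delattr(opts, key)
            else:
                setattr(opts, key, old)
```

Wrapping a run in `with temporary_overrides(opts, p.override_settings): ...` would replace the manual try/finally, although it still mutates a shared object and therefore does not make concurrent runs safe on its own.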

Reproducibility

  • Exact inputs for reproducible output: Seed(s), subseed, subseed_strength, prompt, negative_prompt, sampler, steps, cfg_scale, dimensions, model (checkpoint), VAE, and all options that affect sampling (e.g. clip_skip). Override_settings applied in process_images mutate opts for the duration of the run. Evidence: processing.py:823-833,871-907, StableDiffusionProcessing dataclass fields.
  • Inherent vs avoidable nondeterminism: Inherent: none if seed and hardware are fixed. Avoidable: (1) opts and state are global, so concurrent or re-entrant calls can interfere. (2) Model/VAE loaded from shared so any change elsewhere affects the run. Passing opts/state/model explicitly would make runs deterministic given the same inputs.

4. Global State & State Model

Global state inventory

| Variable | Definition | Writers | Readers (representative) |
| --- | --- | --- | --- |
| shared.cmd_opts | shared_cmd_options.cmd_opts | Parsed at startup | Many (paths, options, extensions, api) |
| shared.opts | options.Options(...) in shared_init | shared_init.py:19; opts.set() in processing, options UI | processing, api, ui, sd_models, sd_samplers, images, etc. |
| shared.state | shared_state.State() in shared_init | shared_init.py:46; state.begin(), .skip(), .interrupt(), job_count/sampling_step in processing, progress, api | processing, progress, api, ui_toprow, call_queue, sd_samplers_cfg_denoiser |
| shared.sd_model | shared.py:46 | sd_models (load/unload) | processing, api, ui, sd_samplers, sd_hijack, etc. |
| shared.device | shared.py:25 | initialization | processing, models, samplers |
| shared.demo | shared.py:23 | ui.py (create_ui) | webui, ui |
| shared.hypernetworks, loaded_hypernetworks | shared.py:31-33 | hypernetwork loading | sd_hijack, api |
| shared.sd_upscalers | shared.py:63 | upscaler registration | api, extras |
| shared.face_restorers | shared.py:41 | face_restoration_utils | api, processing |
| shared.prompt_styles, interrogator, total_tqdm, mem_mon | shared.py:37-39,71,73,74 | ui/init / progress | ui, progress, etc. |

State mutation map (who mutates what)

  • opts: Set at startup from config; mutated in process_images for override_settings; restored in finally if override_settings_restore_afterwards; also mutated by options UI. Evidence: processing.py:823-833,851-854, modules/options.py.
  • state: state.begin(job=...) at API/UI entry; state.job_count, state.sampling_step, state.current_image, etc. set during processing; state.interrupt(), state.skip() from API. Evidence: modules/shared_state.py, processing.py:927-928, api/api.py:475.
  • sd_model: Loaded/unloaded by sd_models.reload_model_weights(), called from processing and API. Evidence: modules/sd_models.py, processing.py:828-836.

Classification

  • Configuration: cmd_opts, opts (with override_settings applied per run).
  • Runtime execution: state (job, interrupted, sampling_step, current_image, etc.).
  • Model registry: sd_model, clip_model, sd_upscalers, face_restorers, hypernetworks, loaded_hypernetworks.
  • UI/session: demo, settings_components, tab_names, gradio_theme, prompt_styles.
  • Extension-owned: Extensions register callbacks and scripts; extension list in extensions.extensions; no single “extension state” object.

Testability impact: Unit-testing any code that reads shared.opts or shared.state or shared.sd_model requires patching globals or starting the full app. Determinism impact: Concurrent or sequential runs can affect each other via shared opts/state/model. Extension impact: Extensions that read or mutate shared are tied to the current layout; any refactor of shared state can break them.
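The testability cost can be made concrete with a toy comparison. The `shared` object below is a stand-in built from `SimpleNamespace`, and `max_batch_size` is a hypothetical option name; neither is taken from the repository.

```python
from types import SimpleNamespace
from unittest import mock

# Stand-in for the real modules.shared; only the shape matters here.
shared = SimpleNamespace(opts=SimpleNamespace(max_batch_size=8))


def clamp_batch_global(requested: int) -> int:
    """Reads the limit from module-level state."""
    return min(requested, shared.opts.max_batch_size)


def clamp_batch_injected(requested: int, opts) -> int:
    """Same logic, but the options object is an explicit argument."""
    return min(requested, opts.max_batch_size)


def test_global_needs_patching():
    # The global has to be patched and restored around the assertion.
    with mock.patch.object(shared.opts, "max_batch_size", 2):
        assert clamp_batch_global(5) == 2
    assert shared.opts.max_batch_size == 8


def test_injected_needs_no_patching():
    assert clamp_batch_injected(5, SimpleNamespace(max_batch_size=2)) == 2
```

The injected variant needs no patching and cannot be affected by another test or request changing the global in between.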

Score: modularity 2 / 5 (reflects global-state risk)


5. Dependency Graph & Coupling

Top 20 most imported modules (by number of files importing)
(Derived from grep of from modules.* import / import modules.* in repo.)

  1. shared / modules.shared
  2. paths_internal (paths, script_path, models_path, etc.)
  3. processing (Processed, process_images, StableDiffusionProcessing*)
  4. options / OptionInfo, options_section
  5. script_callbacks
  6. ui_components
  7. sd_samplers / sd_models
  8. infotext_utils
  9. images
  10. scripts
  11. shared_cmd_options / cmd_opts
  12. sd_hijack / model_hijack
  13. errors
  14. devices
  15. extensions
  16. paths
  17. upscaler / Upscaler, UpscalerData
  18. util
  19. sd_vae
  20. ui_common

Top 10 hub modules (inbound references)

  1. shared — re-exports and global state; used by almost every feature module.
  2. paths_internal — paths used by options, shared, extensions, config, images.
  3. processing — API, UI, scripts all call process_images and use Processed.
  4. script_callbacks — samplers, scripts, extensions register and call callbacks.
  5. options / shared_options — UI and shared depend on OptionInfo/options_section.
  6. ui_components — ui_*, scripts use FormRow, ToolButton, etc.
  7. sd_samplers / sd_models — processing, api, scripts, ui.
  8. infotext_utils — ui, processing, api, scripts.
  9. images — ui, processing, api, extras.
  10. scripts — ui, api, txt2img, img2img, extensions.

Cyclic dependencies

  • No strict import cycles detected at module level (a top-level cycle that used names before they were defined would fail at import time). Observation: shared imports shared_cmd_options, options, shared_items, etc.; those do not import shared at top level (some import it inside functions and use it at runtime). So there is no cycle in the static top-level graph. Inference: Cycles could appear once deferred imports are counted (e.g. script_callbacks → shared → options → …). Not fully traced here.
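One way to trace this more fully is to build the import graph with the standard `ast` module, once from top-level imports only and once including imports nested inside functions, and then look for cycles in each. The sketch below works on a mapping of module name to source text; all function names are hypothetical, and relative imports are ignored for brevity.

```python
from __future__ import annotations

import ast


def imported_modules(source: str, top_level_only: bool) -> set[str]:
    tree = ast.parse(source)
    # tree.body holds only module-level statements (imports nested in
    # if/try blocks are missed); ast.walk also visits function bodies.
    nodes = tree.body if top_level_only else ast.walk(tree)
    found: set[str] = set()
    for node in nodes:
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            found.add(node.module)
            # `from modules import shared` also depends on modules.shared
            found.update(f"{node.module}.{alias.name}" for alias in node.names)
    return found


def build_graph(sources: dict[str, str], top_level_only: bool) -> dict[str, set[str]]:
    known = set(sources)
    return {
        name: (imported_modules(src, top_level_only) & known) - {name}
        for name, src in sources.items()
    }


def find_cycle(graph: dict[str, set[str]]) -> list[str] | None:
    """Depth-first search; returns one cycle as a path, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {node: WHITE for node in graph}
    stack: list[str] = []

    def visit(node: str) -> list[str] | None:
        color[node] = GREY
        stack.append(node)
        for dep in sorted(graph[node]):
            if color[dep] == GREY:
                return stack[stack.index(dep):] + [dep]
            if color[dep] == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        color[node] = BLACK
        return None

    for node in sorted(graph):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None
```

Running it with `top_level_only=True` and again with `False` separates cycles that would break imports from those that only exist through deferred, function-level imports.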

God modules

  • ui.py: ~984 LOC; imports 16+ modules; builds entire Gradio UI. Evidence: modules/ui.py:16-31, file size.
  • processing.py: ~1793 LOC; imports 15+ modules; contains processing classes and the full sampling loop. Evidence: modules/processing.py:18-31, line count.
  • api/api.py: ~929 LOC; many routes and handlers; imports shared, processing, scripts, sd_models, etc. Evidence: modules/api/api.py:19-34, file size.

God functions

  • process_images_inner — long loop, seed/prompt setup, batch iteration, script hooks. Evidence: processing.py:863-~1100+.
  • StableDiffusionProcessingTxt2Img.sample and sample_hr_pass — large methods with hires and decode logic. Evidence: processing.py:1307-1393.

Per major module (summary)

  • shared: Inbound: almost all; outbound: shared_cmd_options, options, paths_internal, util, shared_items, shared_gradio_themes. Reliance on global state: is the state holder.
  • processing: Inbound: api, img2img, txt2img, scripts (many). Outbound: shared, sd_models, sd_samplers, sd_vae, devices, scripts, images, etc. Heavy reliance on shared.opts, shared.state, shared.sd_model.
  • api/api: Inbound: webui (create_api). Outbound: shared, processing, scripts, sd_models, images, progress, etc. Reliance on shared and process_images.

Import centrality vs runtime criticality: shared is central both in imports and at runtime (opts/state/sd_model). processing is runtime-critical and highly imported. paths_internal is central for imports but less “hot” at runtime.

Surgical decouplings (3-5, PR-sized)

  1. Pass opts snapshot into process_images: Add a helper that builds a dict or small struct from opts (and override_settings) and pass it into a new process_images_with_opts(p, opts_snapshot) used by one API endpoint first; keep reading from snapshot instead of global inside that path. Evidence to address: processing.py:823-833.
  2. Extract “sampler runner”: Move the call self.sampler.sample(self, x, conditioning, ...) and the immediate decode into a function run_sampler_step(p, sampler, x, conditioning, uc, image_cond) in a new module; call it from StableDiffusionProcessingTxt2Img.sample. Reduces god-method size and gives a seam for testing. Evidence: processing.py:1345.
  3. UI tab registry: Replace the single ui.create_ui() with a list of “tab builders”; each tab is a function that returns (name, blocks). Register txt2img, img2img, settings, etc. from their modules. One PR: move one tab into a function and register it. Evidence: modules/ui.py (single create_ui).
  4. API handler → processing adapter: Introduce Txt2ImgRunner.run(request) -> Processed that builds p, calls process_images(p), returns Processed; have text2imgapi call Txt2ImgRunner.run(txt2imgreq). Keeps API thin and gives a single place to swap implementation later. Evidence: api/api.py:432-490.
  5. Extension callback types: In script_callbacks, add a small module that defines dataclasses or protocols for each callback param (e.g. ImageSaveParams already exists). Document and version the callback signatures; add a “supported callback API version” constant. Evidence: script_callbacks.py:19-109,219-243.
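Decoupling 4 above might take a shape like the following. The request and result classes are simplified stand-ins, and the two injected callables represent "build p from the request" and process_images; none of these names or signatures come from the repository.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Txt2ImgRequestSketch:
    """Hypothetical stand-in for the API request model."""
    prompt: str
    steps: int = 20
    seed: int = -1


@dataclass
class ProcessedSketch:
    """Hypothetical stand-in for processing.Processed."""
    images: list[Any] = field(default_factory=list)
    info: str = ""


class Txt2ImgRunner:
    """Thin adapter: the only place that knows how a request becomes a run."""

    def __init__(
        self,
        build_processing: Callable[[Txt2ImgRequestSketch], Any],
        process: Callable[[Any], ProcessedSketch],
    ) -> None:
        self._build_processing = build_processing
        self._process = process

    def run(self, request: Txt2ImgRequestSketch) -> ProcessedSketch:
        p = self._build_processing(request)
        return self._process(p)
```

Because both collaborators are injected, the API handler can be tested with fakes, and the implementation behind `process` can later be swapped without touching the route.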

Score: modularity 2 / 5


6. Code Health & Maintainability

File size distribution (top 20 by LOC)

Path LOC
modules/processing.py 1793
modules/models/diffusion/ddpm_edit.py 1236
modules/ui.py 984
modules/scripts.py 790
modules/models/diffusion/uni_pc/uni_pc.py 752
modules/sd_models.py 750
modules/api/api.py 750
modules/images.py 673
modules/deepbooru_model.py 668
modules/ui_extra_networks.py 662
scripts/xyz_grid.py 643
modules/hypernetworks/hypernetwork.py 633
modules/textual_inversion/textual_inversion.py 564
modules/ui_extensions.py 544
modules/models/sd3/mmdit.py 528
modules/sd_hijack_optimizations.py 501
modules/script_callbacks.py 437
modules/models/sd3/other_impls.py 417
modules/infotext_utils.py 400
modules/shared_options.py 385

Complexity hotspots (top functions by scope and branches)

  • process_images_inner — long loop, many branches, script hooks. Evidence: processing.py:863-~1100.
  • StableDiffusionProcessingTxt2Img.sample / sample_hr_pass — hires logic, decode paths. Evidence: processing.py:1307-1393.
  • ui.create_ui — builds all tabs and controls. Evidence: modules/ui.py (single large function/flow).
  • Sampler sample methods (e.g. k-diffusion, timesteps) — steps, conditioning. Evidence: sd_samplers_kdiffusion.py:190, sd_samplers_timesteps.py:141.
  • api.text2imgapi / img2imgapi — request parsing, script args, process_images. Evidence: api/api.py:432-565.

Lint configuration

  • Ruff: pyproject.toml: select B, C, I, W; ignore E501, E721, E731, I001, C901, C408, W605; per-file ignore E402 in webui.py. Evidence: pyproject.toml:1-35.
  • Pylint: .pylintrc disables C, R, W, E, I. Evidence: .pylintrc:2-3.
  • Observation: Line length and complexity (C901) are ignored; many long files and long functions.

Anti-patterns

  • Broad imports: from modules import shared then use of shared.opts, shared.state everywhere. Evidence: grep results across modules.
  • Re-exports: shared.py re-exports cmd_opts, OptionInfo, natural_sort_key, list_checkpoint_tiles, etc. Evidence: shared.py:75-95.
  • Dynamic imports: script_loading.load_module(path) for extensions; scripts loaded by importlib. Evidence: script_loading.py:10-16, extensions.py (preload).
  • Broad except: Callbacks wrapped with try/except that report and continue. Evidence: script_callbacks.py:254-259.

Dead code / unused abstractions

  • batch_cond_uncond in shared (“old field, unused now”). Evidence: shared.py:17.
  • No automated dead-code analysis run; inference: large files likely contain legacy or redundant paths.

Score: code_health 2.5 / 5


7. Tests, CI/CD & Reproducibility

Test pyramid

  • Unit: Almost none; a few tests in test_torch_utils.py, test_utils.py (e.g. parametrized URL/float checks). Evidence: test/test_torch_utils.py, test/test_utils.py.
  • Integration: Majority: tests start the app (via launch.py --test-server), then pytest hits HTTP endpoints (e.g. /sdapi/v1/txt2img). Evidence: test/test_txt2img.py:42-43, conftest.py:34-36, run_tests.yaml:44-61.
  • E2E: Same as integration (server + HTTP); no separate E2E layer.

Coverage

  • Collected: coverage run for server, pytest --cov . --cov-report=xml. Evidence: run_tests.yaml:46-61,65-69.
  • No --cov-fail-under or threshold in config. Evidence: grep for cov-fail-under / fail_under: none.

Flakiness risks

  • Server startup: wait-for-it --service 127.0.0.1:7860 -t 20; if startup is slow or port in use, tests fail. Evidence: run_tests.yaml:58-59.
  • Single job: server and pytest in one job; no retries or separate smoke step.
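For local runs, or as a replacement for the fixed 20-second wait, the startup check can be written as a small polling helper. This is a sketch using only the standard library; `wait_for_port` is a hypothetical name and the defaults simply mirror the host, port, and timeout quoted above.

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 7860,
                  timeout: float = 20.0, interval: float = 0.25) -> bool:
    """Poll until a TCP connection succeeds or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A test harness could call this before the first request and fail with a clear "server did not start" message instead of a connection error deep inside a test.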

CI job structure

  • Lint: ruff (Python), eslint (JS); on push/PR. Evidence: on_pull_request.yaml.
  • Tests: one job “tests on CPU with empty model”; install deps, launch server in background, pytest, upload artifacts. Evidence: run_tests.yaml.
  • Branch policy: warns_merge_master.yml fails PRs targeting master. Evidence: warns_merge_master.yml:9-12.

Reproducibility

  • Python: requirements.txt mixed pins; requirements_versions.txt has more pins; CI uses requirements-test.txt + launch.py with TORCH_INDEX_URL for CPU. No single lockfile. Evidence: requirements.txt, requirements_versions.txt, run_tests.yaml:29-40.
  • JS: package-lock.json in .gitignore; CI uses npm i --ci. Evidence: .gitignore:40, on_pull_request.yaml:36.
  • Models: CI caches models with key 2023-12-30; tests run with “empty model” (no download in test flow). Evidence: run_tests.yaml:24-28.

Action pinning

  • Uses tags: actions/checkout@v4, actions/setup-python@v5, actions/cache@v4, actions/upload-artifact@v4. Not SHA-pinned. Evidence: on_pull_request.yaml:14,15, run_tests.yaml:14,25,71,78.
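A simple check for tag-pinned actions can be scripted. The sketch below scans workflow text with regular expressions rather than parsing the YAML, so it is an approximation; `unpinned_actions` is a hypothetical helper, and local `./` actions are skipped.

```python
import re

# Matches `uses: owner/repo@ref`, with or without a leading list dash.
USES_RE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s#]+)", re.MULTILINE)
# A full commit SHA is 40 lowercase hex characters.
SHA_RE = re.compile(r"@[0-9a-f]{40}$")


def unpinned_actions(workflow_text: str) -> list[str]:
    """Return action references that are not pinned to a full commit SHA."""
    refs = USES_RE.findall(workflow_text)
    return [
        ref for ref in refs
        if not ref.startswith("./") and not SHA_RE.search(ref)
    ]
```

Run over each file in .github/workflows/, a non-empty result could fail a lint job until the reference is replaced with a SHA (keeping the tag in a trailing comment for readability).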

3-tier test strategy (recommended)

  • Tier 1 (smoke): Single health or minimal txt2img request; run first; required; low threshold (e.g. 5% coverage or none). Acceptance: Job completes in <2 min; required on PR.
  • Tier 2 (quality): Full test suite; coverage gate set about 2 percentage points below current coverage; required. Acceptance: All tests pass; coverage above threshold.
  • Tier 3 (nightly): Same suite + optional extras; non-blocking; alert on failure. Acceptance: Runs on schedule; artifacts and report.

Coverage threshold plan

  • Measure current coverage (e.g. coverage report -i after one run). Set --cov-fail-under=X where X = current coverage minus 2 percentage points. Enforce in Tier 2.
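Deriving the gate from a measured value rather than picking it by hand could be done as below. The sketch assumes the Cobertura-style XML that `--cov-report=xml` produces, where the root element carries a `line-rate` attribute between 0 and 1; `fail_under_from_xml` is a hypothetical helper and the margin is a parameter.

```python
import math
import xml.etree.ElementTree as ET


def fail_under_from_xml(xml_text: str, margin: float = 2.0) -> int:
    """Return an integer threshold: current line coverage minus a margin."""
    root = ET.fromstring(xml_text)
    current_percent = float(root.attrib["line-rate"]) * 100.0
    # Round down so the gate never sits above (current - margin).
    return max(0, math.floor(current_percent - margin))
```

With a measured line rate of 0.4375, this gives a threshold of 41, which would then be passed as `--cov-fail-under=41`.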

Reproducible environment plan

  • Single locked manifest for CI: e.g. generate requirements-ci.txt from current env with pins; use in CI. Commit package-lock.json and use npm ci for JS. Document model expectations (empty for CI; optional cache key for reproducibility).

Score: tests_ci 2 / 5


8. Security & Supply Chain

Dependency pinning

  • Observation: requirements.txt has mixed: some == (gradio, protobuf, transformers), some >= (fastapi). requirements_versions.txt pins many. No single source of truth for CI. Evidence: requirements.txt, requirements_versions.txt.
  • Inference: Supply-chain and build reproducibility are at risk without a single locked manifest.

Vulnerability exposure

  • No pip-audit or npm audit in CI. Evidence: Grep: no pip-audit/npm audit in workflows.
  • Known sensitive deps: protobuf==3.20.0 is an old pin with historical CVEs, some of which were fixed in later 3.20.x releases; other pinned versions in the repo may also have known issues. Recommend running pip-audit and npm audit to get the current list.

Secret handling

  • API auth uses secrets.compare_digest for HTTP basic. Evidence: modules/api/api.py:17 (import). No secrets in repo observed; no dedicated secret scan in CI.

CI trust boundaries

  • Workflows use checkout, setup-python, setup-node, cache, upload-artifact. Evidence: workflow files.
  • Recommendation: Pin all actions to full SHA to avoid action supply-chain risk.

SBOM

  • No SBOM or dependency export found in repo or workflows.

Recommendations

  • Add pip-audit (and optionally npm audit) as a CI step; fail or warn on known vulns.
  • Pin GitHub Actions to immutable SHAs.
  • Use locked manifests: one for Python (CI), commit and use package-lock.json with npm ci.

Score: security 2 / 5


9. Performance & Scalability

Hot paths

  • processing.py: process_images_inner, p.sample(), sampler sample(), decode_first_stage / batch decode. Evidence: processing.py:863-934, sd_samplers_common.py:73.
  • Model forward: Inside sampler and LDM/diffusion models. Evidence: modules/models/diffusion/, sd_samplers_*.py.

Model loading and caching

  • Models loaded via sd_models.reload_model_weights(); kept in shared.sd_model. VAE similarly. Evidence: modules/sd_models.py, modules/sd_vae.py.
  • Caching: diskcache in requirements; modules/cache.py used for extension git info. Evidence: requirements.txt, modules/cache.py, extensions.py:146.

Queueing

  • Gradio queue: shared.demo.queue(64). Evidence: webui.py:69.
  • API: queue lock in call_queue; wrap_gradio_gpu_call etc. Evidence: modules/call_queue.py, api/api.py (task_id, start_task, finish_task).

Performance risks

  • Repeated I/O: model load on first request; embedding reload when not disabled. Evidence: processing.py:909-910 (embedding load).
  • Unnecessary recomputation: no obvious redundant forward passes; some options (e.g. live preview) add work. Evidence: processing.py:923-924.

Profiling plan

  1. Run a single txt2img request with python -m cProfile -o trace.stats (or PyTorch profiler) and inspect hotspots in process_images_inner and sampler.
  2. Add a lightweight /sdapi/v1/health or /sdapi/v1/timing that returns startup time and (if stored) last-request latency for smoke and monitoring.
  3. Optionally: small load script (e.g. 10 sequential txt2img) to measure P95 latency.
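Step 3 of the plan needs little more than a timing loop and a percentile function. In the sketch below, `request_fn` is any zero-argument callable, for example one that posts a fixed payload to /sdapi/v1/txt2img; that callable is left to the caller. The percentile uses the nearest-rank definition, which is one of several common conventions.

```python
import math
import time
from typing import Callable, Sequence


def measure_latencies(request_fn: Callable[[], object], runs: int = 10) -> list[float]:
    """Call request_fn sequentially and return wall-clock durations in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        request_fn()
        latencies.append(time.perf_counter() - start)
    return latencies


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]
```

With only 10 sequential runs the nearest-rank P95 is simply the slowest request, so a larger sample is needed before the number is useful as a regression signal.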

Performance budget proposal

  • Not stated in repo. Recommendation: If performance is a goal, define e.g. “P95 txt2img (N steps) < X s on CPU test config” and “startup < Y s”; measure in CI or nightly and alert on regression.

Score: performance 3 / 5


10. Developer Experience (DX)

15-minute new-dev journey

  • Steps: Clone → install Python 3.10.x (and Node for lint) → run webui-user.bat or webui.sh (first run installs deps) → run ruff . and npm run lint → run tests (start server in background, then pytest test/). Evidence: README, workflow files.
  • Blockers: No single “run tests” script; CONTRIBUTING missing; lockfile gitignored so npm ci not possible; tests require full server.

Local test workflow

  • Lint: ruff . (Python), npm run lint (JS). Evidence: package.json, pyproject.toml.
  • Tests: Start server (launch.py --skip-torch-cuda-test --test-server ...), then pytest test/ (or pytest test/test_txt2img.py -v). Evidence: run_tests.yaml:44-61, conftest.py.
  • Single test: pytest test/test_txt2img.py::test_txt2img_simple_performed -v (with server running).

CONTRIBUTING

  • Observation: No CONTRIBUTING.md in repo. Evidence: No file found.
  • Recommendation: Add CONTRIBUTING.md with lint commands, test commands, branch policy (e.g. PR to dev), and link to extension docs.

Extension developer experience

  • Observation: Extension authors learn from wiki and by reading script_callbacks, scripts.Script, and built-in extensions. No single “Extension API” doc in repo. Evidence: CODEOWNERS comment about localizations and extensions wiki.
  • Recommendation: Document callback list and signatures, script lifecycle, and “supported API version”; provide a minimal extension template and test approach (e.g. run with one extension enabled).

Score: dx 2 / 5


11. Documentation

README

  • Observation: Installation (Windows/Linux), features, running, limitations (e.g. Python 3.10.6). Evidence: README.md:94-120, feature list.
  • Gaps: No “Development” or “Contributing” section; no local test/lint steps.

CONTRIBUTING

  • Observation: Absent. Evidence: No CONTRIBUTING.md.

Architecture docs

  • Observation: No ADRs or architecture diagrams in repo. Evidence: No docs in repo root or docs/.

Extension API docs

  • Observation: Callback names and param types exist in code (script_callbacks.py); no explicit “contract” doc or versioning. Evidence: script_callbacks.py:19-109,219-243.
  • Inference: Extension API is tribal knowledge plus code inspection.

Score: docs 2 / 5


12. Extension Ecosystem Stability

Extension loading

  • Discovery: list_extensions() scans extensions_builtin_dir and extensions_dir; builds Extension with ExtensionMetadata from metadata.ini. Evidence: extensions.py:226-300.
  • Import: Scripts under extension dirs loaded via script_loading.load_module() (e.g. preload.py); scripts list from extension.list_files('scripts', '.py'). Evidence: script_loading.py:10-16, extensions.py:178-189.
  • Lifecycle: Extensions listed at startup; enabled/disabled via opts; callbacks registered when scripts load. Evidence: extensions.active(), shared.opts.disabled_extensions.

Extension API surface

  • Hooks/callbacks: script_callbacks.callback_map (app_started, model_loaded, ui_tabs, before_image_saved, cfg_denoiser, etc.). Evidence: script_callbacks.py:219-243.
  • Stability: No version field in callback API; params are dataclasses (e.g. ImageSaveParams). Adding or changing params can break extensions. Evidence: script_callbacks.py:19-109.

Backwards compatibility risks

  • Extensions import modules.* (e.g. modules.ui_components, modules.scripts, modules.processing, modules.shared). Any rename or move of these breaks them. Evidence: extensions-builtin/Lora/network_lora.py:4, extensions-builtin/soft-inpainting/scripts/soft_inpainting.py:4-6.
  • Classification: Internal-but-relied-upon: modules.shared, modules.scripts, modules.processing, modules.ui_components, modules.paths_internal, modules.script_callbacks. Semi-private: callback param types (used by extensions but not clearly versioned). Stable: Only the existence of callback names and the Script base class; no formal stability guarantee.

Governance gaps

  • No extension API versioning; no deprecation policy; no compatibility matrix (e.g. “extensions built for API v1”).

Recommendations

  • Extension API contract: Publish a minimal doc listing callback names, param types, and “contract version” (e.g. 1.0); state that new fields may be added but existing ones will not be removed for that version.
  • Versioning: Add EXTENSION_API_VERSION = "1.0" and document what it covers; bump when breaking callback or Script interface changes.
  • Deprecation path: For breaking changes, add new callbacks or params, deprecate old ones with a comment and log warning, remove in next major version.

Score: extensions 2.5 / 5


13. Target Architecture Definition (What 5/5 Looks Like)

Clear separation

  • Runtime (generation pipelines): A dedicated package or module that takes (prompt, negative_prompt, sampler_name, steps, seed, model_ref, opts_snapshot, …) and returns (images, infotext). No global shared.opts or shared.sd_model inside this layer; model and sampler are injected or resolved from a registry interface.
  • API: HTTP layer that maps requests to runtime inputs and runtime outputs to responses; uses a runner/adapter that calls the runtime with explicit parameters.
  • UI: Gradio (or other) that builds controls and calls the same runner or runtime via a thin adapter; no direct access to shared.sd_model or processing internals for generation.
  • Extension system: Documented callback and Script API with a version; extensions register with a stated contract; core does not depend on extension internals.

Explicit dependency injection

  • Models: Runtime receives a “model provider” or “checkpoint loader” interface; API/UI obtain it from a registry (which may still wrap sd_models) and pass it in.
  • Samplers: Sampler creation behind an interface; runtime gets a sampler for the current model and step config.
  • Configuration: Options passed as a snapshot (or immutable view) into the runtime; no opts.set() inside the core pipeline.
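
One possible shape for the injected dependencies described above, as a sketch rather than a design decision. `ModelProvider`, `OptsSnapshot`, `StaticModelProvider`, `run_generation`, and the option key `clip_skip` are all hypothetical; the real registry would wrap sd_models and the real runtime would return images and infotext.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Mapping, Optional, Protocol


class ModelProvider(Protocol):
    """Hypothetical interface the runtime depends on instead of shared.sd_model."""

    def get_model(self, model_ref: str) -> Any: ...


@dataclass(frozen=True)
class OptsSnapshot:
    """Read-only view of options taken once, before the pipeline starts."""
    values: Mapping[str, Any]

    @classmethod
    def from_mapping(cls, opts: Mapping[str, Any],
                     overrides: Optional[Mapping[str, Any]] = None) -> "OptsSnapshot":
        merged = dict(opts)
        merged.update(overrides or {})
        # MappingProxyType rejects item assignment, so the pipeline cannot mutate it.
        return cls(MappingProxyType(merged))

    def get(self, key: str, default: Any = None) -> Any:
        return self.values.get(key, default)


class StaticModelProvider:
    """Test double: serves pre-built objects by reference name."""

    def __init__(self, models: Mapping[str, Any]):
        self._models = dict(models)

    def get_model(self, model_ref: str) -> Any:
        return self._models[model_ref]


def run_generation(prompt: str, seed: int, opts: OptsSnapshot,
                   provider: ModelProvider, model_ref: str) -> str:
    """Toy runtime entry point: uses only its arguments, no globals."""
    model = provider.get_model(model_ref)
    return f"{model}:{prompt}:{seed}:{opts.get('clip_skip', 1)}"
```

Because `run_generation` sees only its arguments, a test can pass `StaticModelProvider` with a fake model and never load a checkpoint.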

No critical global state in hot paths

  • Generation path uses only explicit arguments and injected dependencies; state (job, interrupted) can remain for progress/cancellation if accessed via a narrow interface (e.g. “execution context”) rather than raw global.

Deterministic artifact outputs

  • Same (seed, prompt, opts_snapshot, model version) → same output; runtime is pure modulo RNG and model weights.
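
To make the determinism property concrete, here is a toy sketch under stated assumptions: the inputs are JSON-serializable, and a local `random.Random` stands in for the real noise source. `run_fingerprint` and `toy_noise` are hypothetical helpers, not functions in the codebase.

```python
import hashlib
import json
import random
from typing import Any, Dict, List


def run_fingerprint(seed: int, prompt: str, opts_snapshot: Dict[str, Any],
                    model_version: str) -> str:
    """Stable identifier for one run's inputs (key order does not matter)."""
    payload = json.dumps(
        {"seed": seed, "prompt": prompt, "opts": opts_snapshot, "model": model_version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def toy_noise(seed: int, count: int = 4) -> List[float]:
    """Stand-in for sampling: a local RNG, so no global random state is touched."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]
```

Two runs with equal fingerprints should produce equal artifacts; a regression test could store the fingerprint next to a hash of the output and compare both.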

Reproducible CI

  • Pinned Python deps (lockfile or single requirements-ci.txt); committed package-lock.json and npm ci; SHA-pinned GitHub Actions; 3-tier tests with coverage gate and ≥2% margin.

Stable extension API

  • Documented callback and Script contract; version number; deprecation policy (new optional params allowed; removal only with version bump and notice).

14. Refactorability & Extraction Analysis

Architectural fault lines

  • Runtime vs rest: Boundary = “everything needed to produce images from (prompt, seed, opts, model, sampler).” Cut at: (1) entry to process_images_inner (caller supplies opts snapshot and model reference), (2) exit after Processed is built. Evidence: processing.py:863-934.
  • API vs shared: Boundary = API handlers should not read/write shared except via a narrow facade (e.g. “get current model,” “apply overrides”). Cut at: replace direct opts/sd_models usage in api.py with calls to an adapter. Evidence: api/api.py:471-472, opts.outdir_*.
  • UI vs processing: Boundary = UI should only build p and call a single entry point (e.g. run_txt2img(p) or script runner). Cut at: txt2img() / img2img() in txt2img.py/img2img.py already call process_images(p); further cut = move creation of p into an adapter that takes “request” and returns Processed.

Safe extraction seams

  • Seed/prompt setup: Logic in process_images_inner that sets p.all_seeds, p.all_prompts could move to a prepare_prompts_and_seeds(p) function in the same file. Evidence: processing.py:871-907.
  • Override apply/restore: The block in process_images that applies override_settings and restores in finally could be a context manager with temporary_opts(override_settings): .... Evidence: processing.py:823-857.
  • Script callbacks (params): script_callbacks already uses dataclasses; moving them to a callback_params.py (or keeping and documenting) is a small, safe move. Evidence: script_callbacks.py:19-109.
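
The override apply/restore seam could look roughly like the sketch below. It assumes options are plain attributes on an object; the real options object may need its own setter instead of `setattr`, so treat this as an illustration of the pattern only.

```python
from contextlib import contextmanager
from typing import Any, Dict, Iterator

_MISSING = object()  # marks options that did not exist before the override


@contextmanager
def temporary_opts(opts: Any, overrides: Dict[str, Any]) -> Iterator[Any]:
    """Apply overrides to `opts`, then restore the previous values on exit."""
    previous: Dict[str, Any] = {}
    try:
        for key, value in overrides.items():
            previous[key] = getattr(opts, key, _MISSING)
            setattr(opts, key, value)
        yield opts
    finally:
        # Runs on normal exit and on exceptions, like the existing finally block.
        for key, old_value in previous.items():
            if old_value is _MISSING:
                delattr(opts, key)
            else:
                setattr(opts, key, old_value)
```

Usage would be `with temporary_opts(opts, p.override_settings): ...` around the call into process_images_inner, which keeps the restore logic in one place.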

Minimal architectural cuts

  • Extract runtime layer: (1) Introduce runtime.run_txt2img(p, opts_snapshot, model_provider) that does not read shared.opts/shared.sd_model inside; call it from process_images with snapshot and current model. (2) Gradually move logic from process_images_inner into runtime and pass opts/model explicitly.
  • Decouple UI from processing: (1) Keep UI building p and calling scripts.run / process_images; (2) Introduce ProcessingRunner.run_txt2img(args) that returns Processed; UI and API both call the runner. No need to change UI internals in the first cut.
  • Decouple API from shared: (1) API builds p and calls a runner that takes p and (optionally) opts_snapshot; (2) Runner uses snapshot for paths/options instead of opts global; (3) Model still from registry/facade until a later phase.
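
A sketch of the runner cut described above, with the processing function injected so that it can be swapped later. `Txt2ImgRequest`, `ProcessedResult`, `api_txt2img`, and `ui_txt2img` are hypothetical stand-ins; in the first cut the injected function would build p and call process_images.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Txt2ImgRequest:
    """Hypothetical transport-neutral request built by both API and UI."""
    prompt: str
    seed: int = -1
    steps: int = 20
    override_settings: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ProcessedResult:
    """Stand-in for Processed: only the fields this sketch needs."""
    images: List[Any]
    infotext: str


class ProcessingRunner:
    """Single entry point for generation; callers never touch processing internals."""

    def __init__(self, process_fn: Callable[[Txt2ImgRequest], ProcessedResult]):
        self._process_fn = process_fn

    def run_txt2img(self, request: Txt2ImgRequest) -> ProcessedResult:
        return self._process_fn(request)


def api_txt2img(runner: ProcessingRunner, body: Dict[str, Any]) -> Dict[str, Any]:
    """API adapter: maps a JSON-like body to a request and the result to a response."""
    result = runner.run_txt2img(Txt2ImgRequest(**body))
    return {"images": result.images, "info": result.infotext}


def ui_txt2img(runner: ProcessingRunner, prompt: str, steps: int) -> List[Any]:
    """UI adapter: same runner, different input shape."""
    return runner.run_txt2img(Txt2ImgRequest(prompt=prompt, steps=steps)).images
```

Both adapters depend only on the runner, so moving the implementation from process_images to a runtime layer later would not change either caller.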

Recommended order of extractions

  1. Phase 0 (stabilize): Pin CI actions to SHA; add smoke test; add pip-audit; commit package-lock and use npm ci. No architectural change.
  2. Phase 1 (seams): Add CONTRIBUTING; document extension callback API version; add temporary_opts (or equivalent) and use it in process_images; add pytest markers for smoke.
  3. Phase 2 (runtime boundary): Introduce opts_snapshot type and build it in process_images from opts + override_settings; pass snapshot into process_images_inner and refactor inner to read from snapshot where possible (leave state and model for later).
  4. Phase 3 (runner): Add Txt2ImgRunner / Img2ImgRunner (or single ProcessingRunner) that builds p, applies overrides, calls process_images, returns Processed; switch API and then UI to use runner.
  5. Phase 4 (model injection): Introduce a model-provider interface; runtime gets model from provider instead of shared.sd_model; registry implementation wraps current sd_models. This makes it possible to run tests with a mock provider.
  6. Phase 5 (UI registry): Replace monolithic create_ui with a list of tab builders; move one tab at a time into a builder and register.
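
For Phase 5, a tab registry might be as small as the sketch below. `TabRegistry` and the builder signature are assumptions; here a builder returns a string so the example runs without Gradio, whereas real builders would construct UI components.

```python
from typing import Callable, List, Tuple

TabBuilder = Callable[[], str]  # hypothetical: real builders would return UI blocks


class TabRegistry:
    """Collects tab builders so create_ui can iterate instead of hard-coding tabs."""

    def __init__(self) -> None:
        self._entries: List[Tuple[int, str, TabBuilder]] = []

    def register(self, name: str, order: int = 0) -> Callable[[TabBuilder], TabBuilder]:
        """Decorator that records a builder under a display name and sort order."""
        def decorator(builder: TabBuilder) -> TabBuilder:
            self._entries.append((order, name, builder))
            return builder
        return decorator

    def build_all(self) -> List[Tuple[str, str]]:
        """Build every registered tab, sorted by order and then by name."""
        ordered = sorted(self._entries, key=lambda entry: (entry[0], entry[1]))
        return [(name, builder()) for _, name, builder in ordered]


registry = TabRegistry()


@registry.register("img2img", order=20)
def build_img2img_tab() -> str:
    return "img2img controls"


@registry.register("txt2img", order=10)
def build_txt2img_tab() -> str:
    return "txt2img controls"
```

Moving one tab at a time then means moving its code into a builder and adding one `register` call, while the remaining tabs stay in create_ui until their turn.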

15. Refactor Strategy (Goal: 5/5)

Option A — Iterative (low blast radius)

  • PR-sized steps, each ≤60 minutes; reversible.
  • Focus: CI guardrails, test tiers, pinning, small decouplings.

Phases

  • Phase 0 — Fix-first & stabilize (0-1 day): Add smoke test (one health or txt2img); pin checkout/setup-python to SHA; add pip-audit step; upload artifacts on fail. Risks: Low. Rollback: Revert workflow changes.
  • Phase 1 — Document & guardrail (1-3 days): CONTRIBUTING.md; pytest markers (smoke); explicit test path in CI; pin Ruff/pytest in requirements-test; commit package-lock, use npm ci. Risks: Low. Rollback: Revert doc and workflow.
  • Phase 2 — Harden (3-7 days): Add --cov-fail-under with 2% margin; make smoke required; add “quality” job or ordered steps. Risks: Medium (coverage may fluctuate). Rollback: Remove threshold.
  • Phase 3 — Small decouplings (ongoing): temporary_opts context manager; prepare_prompts_and_seeds extraction; one API endpoint via Txt2ImgRunner; extension API version constant + doc. Risks: Low per PR. Rollback: Revert individual PRs.
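
The 2% margin in Phase 2 is simple arithmetic: measure current coverage, subtract the margin, and round down so the gate does not fail on the day it is introduced. A sketch, assuming pytest-cov provides `--cov` and `--cov-fail-under` and that coverage is measured over modules/; the helper names are hypothetical.

```python
import math
from typing import List


def cov_fail_under(current_percent: float, margin: float = 2.0) -> int:
    """Threshold for --cov-fail-under: current coverage minus margin, floored at 0."""
    return max(0, math.floor(current_percent - margin))


def quality_tier_args(current_percent: float, package: str = "modules") -> List[str]:
    """Command line for the coverage-gated tier (package name is an assumption)."""
    return [
        "pytest",
        f"--cov={package}",
        f"--cov-fail-under={cov_fail_under(current_percent)}",
    ]
```

For example, a measured 41.7% gives a gate of 39; the threshold would be raised by hand as coverage improves rather than recomputed on every run.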

Milestone labels: Phase 0-1 = foundational; Phase 2 = hardening; Phase 3 = enabling (enables later architectural work).

Option B — Strategic (structural)

  • Introduce runtime/service layer: Extract generation into a module that accepts opts_snapshot and model provider; move sampling loop and decode there.
  • Decouple shared.py: Pass option/state snapshots into processing; introduce “execution context” for state if needed; reduce direct shared reads in hot path.
  • Modularize UI: Tab registry; one tab per module; lazy or explicit registration.
  • ProcessingRunner: API and UI call a runner that builds p, applies overrides, calls runtime, returns Processed.
  • 3-tier CI with coverage gates: Smoke (required), quality (required, coverage threshold), nightly (optional, alert).
  • Deterministic environment: Locked Python manifest for CI; npm ci; document model handling.

Phases

  • Phase 0: Same as Option A (stabilize). Goals: Reliable CI. Risks: Low. Rollback: Revert.
  • Phase 1: Runtime boundary + opts_snapshot. Goals: process_images_inner receives opts_snapshot; no opts.set in inner. Risks: Medium (large diff). Rollback: Feature-flag or branch; keep old path.
  • Phase 2: ProcessingRunner + API/UI switch. Goals: Single entry for generation; API and UI call runner. Risks: Medium. Rollback: Keep old API/UI paths until runner stable.
  • Phase 3: Model provider interface; 3-tier CI; extension API version and doc. Goals: Testable runtime with mock model; full guardrails; stable extension contract. Risks: Medium. Rollback: Per-component revert.

Milestone labels: Phase 0 = foundational; Phase 1-2 = architectural; Phase 3 = hardening.


16. Risk Register

| id | title | likelihood | impact | mitigation | residual risk |
|---|---|---|---|---|---|
| R1 | Dependency vuln (PyTorch/Gradio/etc.) | medium | high | pip-audit + npm audit in CI; pin major deps | low |
| R2 | Flaky CI (server startup / port) | medium | medium | Smoke tier with health endpoint; increase wait-for-it or retries | low |
| R3 | Coverage regression | high | medium | Add --cov-fail-under with 2% margin | low |
| R4 | Action/plugin compromise | low | high | Pin all actions to full SHA | low |
| R5 | Breaking extension API | medium | high | Document and version callback/Script API; deprecation path | medium |
| R6 | Refactor introduces bugs in generation | medium | high | Small PRs; feature flags; keep old path until new path validated | medium |
| R7 | Global state races (concurrent requests) | low | high | Queue/lock already in place; document single-worker assumption or add tests | low |

17. Machine-Readable Appendix (JSON)

{
  "issues": [
    {
      "id": "ARC-001",
      "title": "Extract runtime layer with explicit opts and model",
      "category": "architecture",
      "path": "modules/processing.py:863-934",
      "severity": "high",
      "priority": "high",
      "effort": "high",
      "impact": 5,
      "confidence": 0.9,
      "evidence": "process_images_inner and sample() read shared.opts, shared.state, shared.sd_model throughout.",
      "fix_hint": "Introduce opts_snapshot and pass into process_images_inner; add model_provider interface and use it in sample()."
    },
    {
      "id": "MOD-001",
      "title": "Reduce shared global state in hot path",
      "category": "modularity",
      "path": "modules/shared.py:14-46",
      "severity": "high",
      "priority": "high",
      "effort": "high",
      "impact": 5,
      "confidence": 0.95,
      "evidence": "opts, state, sd_model defined in shared; written in shared_init and processing; read by dozens of modules.",
      "fix_hint": "Pass opts/state snapshot into process_images; introduce execution context for state."
    },
    {
      "id": "CI-001",
      "title": "Add coverage threshold and 3-tier tests",
      "category": "tests_ci",
      "path": ".github/workflows/run_tests.yaml:61",
      "severity": "medium",
      "priority": "high",
      "effort": "medium",
      "impact": 4,
      "confidence": 1.0,
      "evidence": "Single test job; no --cov-fail-under; no smoke/quality/nightly.",
      "fix_hint": "Add smoke step; add --cov-fail-under=(current-2); document 3-tier strategy."
    },
    {
      "id": "SEC-001",
      "title": "Pin GitHub Actions to SHA; add pip-audit",
      "category": "security",
      "path": ".github/workflows/on_pull_request.yaml:14",
      "severity": "medium",
      "priority": "medium",
      "effort": "low",
      "impact": 4,
      "confidence": 1.0,
      "evidence": "Actions use @v4/@v5; no pip-audit or npm audit in CI.",
      "fix_hint": "Replace with actions/checkout@<sha> etc.; add pip install pip-audit && pip-audit."
    },
    {
      "id": "DOC-001",
      "title": "Add CONTRIBUTING and extension API contract",
      "category": "docs",
      "path": "README.md",
      "severity": "low",
      "priority": "high",
      "effort": "low",
      "impact": 3,
      "confidence": 1.0,
      "evidence": "No CONTRIBUTING.md; extension API is code-only, no version.",
      "fix_hint": "Create CONTRIBUTING.md; add EXTENSION_API_VERSION and callback/script doc."
    },
    {
      "id": "EXT-001",
      "title": "Version and document extension callback API",
      "category": "extensions",
      "path": "modules/script_callbacks.py:219-243",
      "severity": "medium",
      "priority": "medium",
      "effort": "medium",
      "impact": 4,
      "confidence": 0.9,
      "evidence": "callback_map and param types exist but are not versioned or documented as contract.",
      "fix_hint": "Add EXTENSION_API_VERSION; publish minimal doc of callbacks and params; deprecation policy."
    }
  ],
  "scores": {
    "architecture": 2.5,
    "modularity": 2,
    "code_health": 2.5,
    "tests_ci": 2,
    "security": 2,
    "performance": 3,
    "dx": 2,
    "docs": 2,
    "extensions": 2.5,
    "overall_weighted": 2.4
  },
  "phases": [
    {
      "name": "Phase 0 — Fix-First & Stabilize",
      "milestones": [
        {
          "id": "P0-1",
          "milestone": "Add smoke test and pin actions to SHA",
          "acceptance": ["Smoke step runs and is required", "Checkout/setup-python use full SHA"],
          "risk": "low",
          "rollback": "Revert workflow",
          "est_hours": 1
        },
        {
          "id": "P0-2",
          "milestone": "Add pip-audit and artifact upload on fail",
          "acceptance": ["pip-audit runs in CI", "Artifacts uploaded when job fails"],
          "risk": "low",
          "rollback": "Remove step",
          "est_hours": 0.5
        }
      ]
    },
    {
      "name": "Phase 1 — Document & Guardrail",
      "milestones": [
        {
          "id": "P1-1",
          "milestone": "CONTRIBUTING.md and pytest markers",
          "acceptance": ["CONTRIBUTING exists", "pytest -m smoke runs subset"],
          "risk": "low",
          "rollback": "Revert",
          "est_hours": 1
        },
        {
          "id": "P1-2",
          "milestone": "Commit package-lock and npm ci",
          "acceptance": ["package-lock.json in repo", "CI uses npm ci"],
          "risk": "low",
          "rollback": "Revert commit and workflow",
          "est_hours": 0.5
        }
      ]
    },
    {
      "name": "Phase 2 — Harden & Enforce",
      "milestones": [
        {
          "id": "P2-1",
          "milestone": "Coverage threshold with 2% margin",
          "acceptance": ["CI fails if coverage below threshold"],
          "risk": "medium",
          "rollback": "Remove --cov-fail-under",
          "est_hours": 1
        }
      ]
    }
  ],
  "dependency_graph": {
    "hub_modules": ["shared", "paths_internal", "processing", "script_callbacks", "options", "ui_components", "sd_samplers", "sd_models", "infotext_utils", "images"],
    "cycles": [],
    "top_imported_modules": ["shared", "paths_internal", "processing", "options", "script_callbacks", "ui_components", "sd_samplers", "sd_models", "infotext_utils", "images", "scripts", "shared_cmd_options", "sd_hijack", "errors", "devices", "extensions", "paths", "upscaler", "util", "sd_vae"]
  },
  "global_state": {
    "variables": ["cmd_opts", "opts", "state", "sd_model", "device", "demo", "hypernetworks", "loaded_hypernetworks", "sd_upscalers", "face_restorers", "prompt_styles", "interrogator", "total_tqdm", "mem_mon"],
    "writers": ["shared_init.py (opts, state)", "processing.py (opts override, state fields)", "options (opts)", "sd_models (sd_model)", "ui.py (demo)", "progress/call_queue (state)"],
    "readers": "Most of modules/ (shared.opts, shared.state, shared.sd_model)"
  },
  "largest_files": [
    {"path": "modules/processing.py", "loc": 1793},
    {"path": "modules/models/diffusion/ddpm_edit.py", "loc": 1236},
    {"path": "modules/ui.py", "loc": 984},
    {"path": "modules/scripts.py", "loc": 790},
    {"path": "modules/api/api.py", "loc": 750}
  ],
  "complexity_hotspots": [
    {"name": "process_images_inner", "file": "modules/processing.py", "rough_complexity": "high"},
    {"name": "StableDiffusionProcessingTxt2Img.sample / sample_hr_pass", "file": "modules/processing.py", "rough_complexity": "high"},
    {"name": "create_ui", "file": "modules/ui.py", "rough_complexity": "high"},
    {"name": "text2imgapi / img2imgapi", "file": "modules/api/api.py", "rough_complexity": "medium"}
  ],
  "metadata": {
    "repo": "AUTOMATIC1111/stable-diffusion-webui",
    "commit": "82a973c04367123ae98bd9abdf80d9eda9b910e2",
    "languages": ["py", "js"],
    "workspace_path": "c:\\coding\\refactoring\\serena"
  }
}
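
The appendix above is meant to be consumed by tooling. As a sketch, the snippet below ranks issues by impact × confidence; that scoring rule is an assumption, not something the audit prescribes. The inline sample copies four ids and their impact/confidence values from the appendix so the example runs without a file.

```python
import json
from typing import Any, Dict, List

# Excerpt of the appendix (ids, impact, confidence copied from the JSON above).
APPENDIX_EXCERPT = """
{
  "issues": [
    {"id": "ARC-001", "impact": 5, "confidence": 0.9},
    {"id": "MOD-001", "impact": 5, "confidence": 0.95},
    {"id": "CI-001", "impact": 4, "confidence": 1.0},
    {"id": "DOC-001", "impact": 3, "confidence": 1.0}
  ]
}
"""


def rank_issues(appendix: Dict[str, Any]) -> List[str]:
    """Return issue ids sorted by impact * confidence, highest first."""
    issues = appendix["issues"]
    ordered = sorted(
        issues,
        key=lambda issue: issue["impact"] * issue["confidence"],
        reverse=True,
    )
    return [issue["id"] for issue in ordered]


ranking = rank_issues(json.loads(APPENDIX_EXCERPT))
```

With the full appendix saved to a file, the same function could be called on `json.load(handle)` to feed a tracker or milestone plan.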

18. Top 10 Highest-Leverage Refactor Targets

| Rank | Target | What it unlocks | Track |
|---|---|---|---|
| 1 | Introduce opts_snapshot and pass into process_images_inner | Deterministic runs; testable pipeline; first step to runtime layer | Strategic |
| 2 | Add ProcessingRunner (or Txt2ImgRunner/Img2ImgRunner) | Single entry for API and UI; swap implementation later without touching callers | Strategic |
| 3 | 3-tier CI + coverage gate | Fast feedback; coverage regression guard; foundation for all other work | Iterative |
| 4 | Pin CI actions to SHA + pip-audit | Reproducibility and supply-chain safety; low effort | Iterative |
| 5 | CONTRIBUTING.md + extension API version doc | Onboarding and extension stability; unblocks contributors | Iterative |
| 6 | Model provider interface | Unit-test runtime with mock model; decouple from shared.sd_model | Strategic |
| 7 | temporary_opts context manager in process_images | Clean override/restore; smaller blast radius than full snapshot | Iterative |
| 8 | Extract prepare_prompts_and_seeds | Smaller process_images_inner; clearer seam for future runtime extraction | Iterative |
| 9 | UI tab registry | Modular UI; load tabs on demand; easier to add/remove features | Strategic |
| 10 | Extension callback contract + deprecation policy | Safe evolution of script_callbacks; fewer breaking changes for extensions | Iterative |

End of pre-refactor audit. All sections completed. Use this document as the basis for a 5/5 refactor plan.