mirror of https://github.com/AUTOMATIC1111/stable-diffusion-webui.git synced 2026-03-23 06:40:23 -07:00

Michael Cahill 0a8ade1a9f M00: Program kickoff, baseline freeze, phase map, E2E verification

- docs/serena.md: Living ledger, phase map, invariants, milestone table
- docs/milestones/M00/: M00_plan, preflight, e2e_baseline, ci_inventory, toolcalls
- scripts/dev/: run_m00_baseline_e2e.ps1, .sh (thin verification helpers)
- Baseline tag baseline-pre-refactor created on 82a973c0

No runtime/structural changes. Behavior-preserving docs and verification only.

Made-with: Cursor

2026-03-06 19:17:49 -08:00


Pre-Refactor Audit: Stable Diffusion WebUI

Auditor: CodeAuditorGPT (staff-plus, architecture-first)
Repository: AUTOMATIC1111/stable-diffusion-webui
Workspace: c:\coding\refactoring\serena
Commit: 82a973c04367123ae98bd9abdf80d9eda9b910e2
Goal: Produce the best possible pre-refactor audit for a full-repo transformation to a 5/5 score.

All findings are grounded in the codebase with file paths and line ranges. For each major section: Observations = directly evidenced; Inferences = reasoned conclusions; Recommendations = proposed changes.

0. Scoring Rubric (Used Consistently)

Score	Meaning
0	Catastrophic (actively dangerous / unusable)
1	Fragile (frequent breakage, no guardrails)
2	Poor (works, but hard to change safely)
3	Acceptable (works, some guardrails, clear pain points)
4	Strong (well-structured, predictable, maintainable)
5	Exemplary (clear architecture, guardrails, docs, observability)

1. Executive Summary

Overall score: 2.4 / 5

Category	Score	Category	Score
Architecture	2.5	Performance	3
Modularity	2	DX	2
Code health	2.5	Docs	2
Tests & CI	2	Extensions	2.5
Security	2	Overall	2.4

Strengths

Clear entry points (webui.py, launch.py) and a single core package (modules/). Evidence: webui.py:1-24, launch.py delegates to launch_utils.
Rich extension and script callback system (script_callbacks, extensions, scripts) enabling hooks without forking. Evidence: modules/script_callbacks.py:219-243, modules/extensions.py:226-300.
CI runs lint (ruff, eslint) and a full pytest suite against a live server with coverage and artifact upload. Evidence: .github/workflows/on_pull_request.yaml, .github/workflows/run_tests.yaml:61-80.
API and UI both funnel into the same processing pipeline (process_images), so behavior is consistent. Evidence: modules/api/api.py:479-482, modules/txt2img.py:104-108.

Critical weaknesses

Global state hub: shared.opts, shared.state, shared.sd_model are defined in shared.py and written in shared_init.py and processing.py; dozens of modules read them. Testability and determinism suffer. Evidence: modules/shared.py:14-46, modules/shared_init.py:19,46, processing.py:823-833,885-886.
No test tiers or coverage gate: Single test job; no smoke/quality/nightly; no --cov-fail-under. Evidence: run_tests.yaml:58-61.
God modules and tight coupling: processing.py (~1793 LOC), ui.py (~984 LOC), api/api.py (~929 LOC) import many modules and rely on shared. Evidence: modules/processing.py:18-31, modules/ui.py:16-31.
Dependency and CI hygiene: Mixed pinning in requirements.txt; package-lock.json gitignored; CI uses npm i --ci and action tags (@v4). Evidence: requirements.txt, .gitignore:40, on_pull_request.yaml:36, run_tests.yaml:14.
No CONTRIBUTING or extension API contract: Onboarding and extension stability rely on wiki/tribal knowledge. Evidence: No CONTRIBUTING.md; extension hooks in script_callbacks not versioned.

Architectural posture

Current: Single Gradio/FastAPI app with a large procedural modules/ package; shared and ui act as hubs; processing, API, and UI are intertwined via global state.
Intended (from repo): None explicitly documented; structure suggests “one app, script-style, extend via callbacks.”
One-sentence description: A monolithic Gradio/FastAPI app whose core is a single modules package with shared global state, a central processing pipeline, and a callback-based extension system.

2. Architecture & System Map

Text-based architecture map

Entrypoints
- launch.py: Parses args, prepares environment, calls launch_utils.start() → webui.webui(). Evidence: launch.py:25-43, modules/launch_utils.py.
- webui.py: Imports timer/initialize, exposes create_api() and webui(); initialize.initialize() loads options and model state. Evidence: webui.py:1-50, modules/initialize.py.
Core packages
- modules/: Core logic (processing, models, samplers, UI, API, extensions, paths, options). Evidence: Directory layout; 150+ Python files.
- extensions-builtin/: Lora, LDSR, SwinIR, etc.; loaded via extensions.list_extensions(), scripts via script_loading. Evidence: modules/extensions.py:226-300, modules/script_loading.py:10-16.
- scripts/: Built-in scripts (xyz_grid, outpainting, etc.); discovered and run via modules.scripts. Evidence: scripts/xyz_grid.py:15-18, modules/scripts.py.
Surfaces
- API: FastAPI routes under /sdapi/v1/*; handlers in modules/api/api.py build StableDiffusionProcessing* and call process_images(p). Evidence: modules/api/api.py:211-251,432-490.
- UI: Gradio built in modules/ui.py; tabs and controls call into txt2img.py, img2img.py, which create p and call scripts.run / process_images. Evidence: modules/ui.py:16-31, modules/txt2img.py:19-55,101-108.
- Runtime: No separate “runtime” package; generation lives inside processing.py and sampler modules.
- Extension surface: Extensions register callbacks via script_callbacks.add_callback; scripts extend scripts.Script and are loaded from scripts/ and extension dirs. Evidence: modules/script_callbacks.py:127-147, modules/scripts.py:51-120.

Layers as they actually exist

Entry / bootstrap: launch.py, webui.py, initialize.py, shared_init.py.
Configuration / CLI: shared_cmd_options, cmd_args, options, shared_options → populate shared.opts and cmd_opts.
Global state: shared.py (opts, state, sd_model, device, etc.), shared_state.State.
Orchestration: processing.process_images → process_images_inner; scripts run before/after via p.scripts.
Model/sampler: sd_models, sd_samplers, sd_vae, sd_hijack*; LDM/diffusion in modules/models/.
UI / API: ui.py, api/api.py, txt2img.py, img2img.py — all depend on shared and processing.

Hub modules

shared.py: Defines and re-exports cmd_opts, opts, state, sd_model, device, and many other globals; read by almost every feature module. Evidence: modules/shared.py:14-95.
ui.py: Builds the Gradio UI; imports script_callbacks, sd_models, processing, ui_*, shared; central for all UI tabs. Evidence: modules/ui.py:16-31.

Cross-cutting concerns

Logging: Standard logging; modules/logging_config.py; no structured/observability stack observed.
Config: options.Options in shared.opts; loaded/saved via shared_options and UI; overrides applied in process_images. Evidence: modules/options.py, processing.py:823-833.
State: shared_state.State (job, interrupted, sampling_step, etc.); mutated in processing, API, call_queue, progress. Evidence: modules/shared_state.py:11-80, grep of state. across modules.
Error handling: modules/errors.report(); callbacks wrapped with try/except in script_callbacks. Evidence: modules/script_callbacks.py:15-16,253-259.

Drift analysis

The repo does not claim a “clean layered” architecture. Observation: Layers are implicit (bootstrap → config → state → orchestration → model → UI/API). Drift: Orchestration and model code are mixed in processing.py; UI and API both depend directly on shared and processing with no abstraction layer. Reaching a clean layered design would require extracting a runtime layer (generation pipeline with explicit inputs/outputs) and injecting opts/state/model as dependencies.

Score: architecture 2.5 / 5

3. Runtime Pipeline Analysis

End-to-end generation pipelines

txt2img

Request handling: API: api.text2imgapi(txt2imgreq) builds StableDiffusionProcessingTxt2Img from request, sets p.script_args, then scripts.scripts_txt2img.run(p, *p.script_args) or process_images(p). UI: txt2img_create_processing() builds p from Gradio args, then scripts.scripts_txt2img.run(p, *p.script_args) or process_images(p). Evidence: modules/api/api.py:432-490, modules/txt2img.py:14-55,101-108.
Processing: process_images(p) applies override_settings to opts, reloads model/VAE if needed, then process_images_inner(p). Evidence: modules/processing.py:819-858.
Inner loop: process_images_inner(p) fixes the seed, sets job_count, and calls p.init(); then, for each iteration, it calls p.sample(), which creates the sampler, runs sampler.sample(...), and optionally runs the hires pass. Evidence: modules/processing.py:863-934,1307-1371.
Sampler: sd_samplers.create_sampler(p.sampler_name, p.sd_model); sampler’s sample(p, x, conditioning, unconditional_conditioning, ...) produces latents; then decode_first_stage (or batch decode) and image save. Evidence: modules/processing.py:1307-1345, modules/sd_samplers_common.py:73, modules/sd_samplers_kdiffusion.py:190.
Model loading: shared.sd_model is set by sd_models.reload_model_weights(); used inside process_images and in sampler. Evidence: processing.py:828-830,885-886, modules/sd_models.py.

img2img / inpainting

Same orchestration: API or UI builds StableDiffusionProcessingImg2Img (with init_image, mask, etc.), then process_images(p). p.init() and p.sample() are overridden in img2img subclass; init latent comes from VAE encode of image. Evidence: modules/img2img.py:10-17, modules/processing.py (img2img subclass).

Orchestration

Orchestration layer: Effectively process_images + process_images_inner + p.init() / p.sample(). Scripts hook via p.scripts.before_process, process, process_before_every_sampling. Evidence: processing.py:819-821,912-914,1336-1343.
Sampler orchestration: One sampler per p; created inside p.sample() (e.g. sd_samplers.create_sampler(self.sampler_name, self.sd_model)). Evidence: processing.py:1307-1308,1384.
Model loading and selection: sd_models.reload_model_weights() / get_closet_checkpoint_match; override in p.override_settings['sd_model_checkpoint']. Evidence: processing.py:828-836, modules/sd_models.py.
Seed handling: get_fixed_seed(p.seed); p.all_seeds/p.all_subseeds set in process_images_inner; p.rng used in sample. Evidence: processing.py:871-907, processing.py:1335,1759-1760.
Batching: p.n_iter outer iterations; p.batch_size per iteration; loop in process_images_inner over batches. Evidence: processing.py:929-934 and following.
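
The batching arithmetic above can be sketched as follows. This is a minimal illustration, not the repo's code: plan_batches is a hypothetical helper, and the rule that image i uses seed + i is an assumption made for the example (the audit only records that p.all_seeds is filled in process_images_inner).

```python
def plan_batches(seed, n_iter, batch_size):
    """Split n_iter * batch_size images into one seed list per outer iteration."""
    total_images = n_iter * batch_size
    # Assumption for illustration: image i uses seed + i.
    all_seeds = [seed + i for i in range(total_images)]
    return [
        all_seeds[start:start + batch_size]
        for start in range(0, total_images, batch_size)
    ]


# Two outer iterations (n_iter) of three images each (batch_size).
batches = plan_batches(seed=100, n_iter=2, batch_size=3)
```

Each inner list corresponds to one pass through the batch loop, so len(batches) equals n_iter and every list has batch_size entries.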

Control flow

Tangled/duplicated: Override application and model/VAE reload are in process_images; seed/prompt setup in process_images_inner; script hooks at multiple points. Some logic (e.g. hires) is in StableDiffusionProcessingTxt2Img.sample and sample_hr_pass (large methods). Evidence: processing.py:819-858,863-934,1307-1393.
Seams for a “runtime” layer: (1) Everything after p.init() and before image save could be a pure function run_sampling(p, sampler, model, rng). (2) Override application could be a function that returns an opts snapshot and restores it. (3) Script hooks could be a formal pipeline stage interface.
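
Seam (2) could look roughly like the context manager below. It is a sketch under stated assumptions: temporary_overrides is a hypothetical name, the options object is a SimpleNamespace stand-in rather than the real Options class, and the restore flag mirrors the role of override_settings_restore_afterwards.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def temporary_overrides(opts, overrides, restore=True):
    """Apply overrides to an options object and optionally undo them on exit."""
    # Remember the old value of every key before touching it.
    previous = {key: getattr(opts, key) for key in overrides}
    try:
        for key, value in overrides.items():
            setattr(opts, key, value)
        yield opts
    finally:
        # Runs even if the body raises, so a failed run cannot leak overrides.
        if restore:
            for key, value in previous.items():
                setattr(opts, key, value)


# Stand-in for shared.opts (assumption: attribute-style access).
opts = SimpleNamespace(clip_skip=1, sd_vae="auto")

with temporary_overrides(opts, {"clip_skip": 2}):
    inside = opts.clip_skip  # value seen by the run

after = opts.clip_skip  # value seen once the run is over
```

Keeping the apply and restore steps in one place makes the mutation window explicit, which is a precondition for later replacing the global with a snapshot.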

Reproducibility

Exact inputs for reproducible output: Seed(s), subseed, subseed_strength, prompt, negative_prompt, sampler, steps, cfg_scale, dimensions, model (checkpoint), VAE, and all options that affect sampling (e.g. clip_skip). Override_settings applied in process_images mutate opts for the duration of the run. Evidence: processing.py:823-833,871-907, StableDiffusionProcessing dataclass fields.
Inherent vs avoidable nondeterminism: Inherent: none expected if seed, hardware, and library versions are fixed (not verified in this audit; some GPU kernels are nondeterministic by default). Avoidable: (1) opts and state are global, so concurrent or re-entrant calls can interfere. (2) Model/VAE are loaded from shared, so any change elsewhere affects the run. Passing opts/state/model explicitly would remove both avoidable sources for runs with the same inputs.

4. Global State & State Model

Global state inventory

Variable	Definition	Writers	Readers (representative)
`shared.cmd_opts`	`shared_cmd_options.cmd_opts`	Parsed at startup	Many (paths, options, extensions, api)
`shared.opts`	`options.Options(...)` in shared_init	`shared_init.py:19`; `opts.set()` in processing, options UI	processing, api, ui, sd_models, sd_samplers, images, etc.
`shared.state`	`shared_state.State()` in shared_init	`shared_init.py:46`; `state.begin()`, `.skip()`, `.interrupt()`, job_count/sampling_step in processing, progress, api	processing, progress, api, ui_toprow, call_queue, sd_samplers_cfg_denoiser
`shared.sd_model`	`shared.py:46`	sd_models (load/unload)	processing, api, ui, sd_samplers, sd_hijack, etc.
`shared.device`	`shared.py:25`	initialization	processing, models, samplers
`shared.demo`	`shared.py:23`	ui.py (create_ui)	webui, ui
`shared.hypernetworks`, `loaded_hypernetworks`	`shared.py:31-33`	hypernetwork loading	sd_hijack, api
`shared.sd_upscalers`	`shared.py:63`	upscaler registration	api, extras
`shared.face_restorers`	`shared.py:41`	face_restoration_utils	api, processing
`shared.prompt_styles`, `interrogator`, `total_tqdm`, `mem_mon`	`shared.py:37-39,71,73,74`	ui/init / progress	ui, progress, etc.

State mutation map (who mutates what)

opts: Set at startup from config; mutated in process_images for override_settings; restored in finally if override_settings_restore_afterwards; also mutated by options UI. Evidence: processing.py:823-833,851-854, modules/options.py.
state: state.begin(job=...) at API/UI entry; state.job_count, state.sampling_step, state.current_image, etc. set during processing; state.interrupt(), state.skip() from API. Evidence: modules/shared_state.py, processing.py:927-928, api/api.py:475.
sd_model: Loaded/unloaded by sd_models.reload_model_weights(), called from processing and API. Evidence: modules/sd_models.py, processing.py:828-836.

Classification

Configuration: cmd_opts, opts (with override_settings applied per run).
Runtime execution: state (job, interrupted, sampling_step, current_image, etc.).
Model registry: sd_model, clip_model, sd_upscalers, face_restorers, hypernetworks, loaded_hypernetworks.
UI/session: demo, settings_components, tab_names, gradio_theme, prompt_styles.
Extension-owned: Extensions register callbacks and scripts; extension list in extensions.extensions; no single “extension state” object.

Testability impact: Unit-testing any code that reads shared.opts or shared.state or shared.sd_model requires patching globals or starting the full app.
Determinism impact: Concurrent or sequential runs can affect each other via shared opts/state/model.
Extension impact: Extensions that read or mutate shared are tied to the current layout; any refactor of shared state can break them.
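
To make the testability point concrete, the sketch below contrasts a function that reads a global with one that takes its options as an argument. The shared object here is a SimpleNamespace stand-in, not modules.shared, and both function names are hypothetical.

```python
from types import SimpleNamespace
from unittest import mock

# Stand-in for modules.shared; the real module holds far more state.
shared = SimpleNamespace(opts=SimpleNamespace(clip_skip=1))


def clip_skip_from_global():
    # Reads the global: a test must patch shared.opts to control the result.
    return shared.opts.clip_skip


def clip_skip_from_argument(opts):
    # Reads its argument: a test just passes the value it wants.
    return opts.clip_skip


with mock.patch.object(shared, "opts", SimpleNamespace(clip_skip=2)):
    patched = clip_skip_from_global()
restored = clip_skip_from_global()  # patch is undone after the with block
explicit = clip_skip_from_argument(SimpleNamespace(clip_skip=3))
```

The patched variant works, but every test has to know which globals the code under test touches; the explicit variant carries that information in its signature.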

Score: modularity 2 / 5 (reflects global-state risk)

5. Dependency Graph & Coupling

Top 20 most imported modules (by number of files importing)
(Derived from grep of from modules.* import / import modules.* in repo.)

shared / modules.shared
paths_internal (paths, script_path, models_path, etc.)
processing (Processed, process_images, StableDiffusionProcessing*)
options / OptionInfo, options_section
script_callbacks
ui_components
sd_samplers / sd_models
infotext_utils
images
scripts
shared_cmd_options / cmd_opts
sd_hijack / model_hijack
errors
devices
extensions
paths
upscaler / Upscaler, UpscalerData
util
sd_vae
ui_common

Top 10 hub modules (inbound references)

shared — re-exports and global state; used by almost every feature module.
paths_internal — paths used by options, shared, extensions, config, images.
processing — API, UI, scripts all call process_images and use Processed.
script_callbacks — samplers, scripts, extensions register and call callbacks.
options / shared_options — UI and shared depend on OptionInfo/options_section.
ui_components — ui_*, scripts use FormRow, ToolButton, etc.
sd_samplers / sd_models — processing, api, scripts, ui.
infotext_utils — ui, processing, api, scripts.
images — ui, processing, api, extras.
scripts — ui, api, txt2img, img2img, extensions.

Cyclic dependencies

No import cycle that breaks loading was detected at module level: the app imports cleanly, although Python tolerates a cycle unless a name is read from a partially initialized module. Observation: shared imports shared_cmd_options, options, shared_items, etc.; those do not import shared at top level (some use it at runtime). So no cycle was found in the static top-level graph. Inference: Cycles could still exist through deferred (function-level) imports (e.g. script_callbacks → shared → options → …). Not fully traced here.
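
A static check for top-level cycles could be sketched as below. This is an illustrative tool, not something found in the repo: it parses module sources with ast, ignores imports inside functions, and works on toy in-memory sources whose names only echo the observation above.

```python
import ast


def top_level_imports(source):
    """Collect module names imported by module-level statements only."""
    names = set()
    for node in ast.parse(source).body:  # function bodies are not visited
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module)
            # "from pkg import mod" may name a submodule, so record both forms.
            names.update(f"{node.module}.{alias.name}" for alias in node.names)
    return names


def build_graph(sources):
    """Map each known module to the known modules it imports at top level."""
    known = set(sources)
    return {name: sorted(top_level_imports(text) & known)
            for name, text in sources.items()}


def find_cycle(graph):
    """Depth-first search; return one cycle as a list of names, or None."""
    visiting, done, path = set(), set(), []

    def visit(node):
        visiting.add(node)
        path.append(node)
        for target in graph.get(node, ()):
            if target in visiting:  # back edge: target is on the current path
                return path[path.index(target):] + [target]
            if target not in done:
                cycle = visit(target)
                if cycle:
                    return cycle
        visiting.discard(node)
        done.add(node)
        path.pop()
        return None

    for start in graph:
        if start not in done:
            cycle = visit(start)
            if cycle:
                return cycle
    return None


# Toy sources: "options" imports "shared" only inside a function (deferred).
deferred = {"shared": "import options\n",
            "options": "def load():\n    import shared\n"}
# Same pair with both imports at module level.
static = {"shared": "import options\n", "options": "import shared\n"}
```

Running find_cycle(build_graph(...)) on the first mapping finds nothing, while the second reports the shared/options loop; deferred imports would need a separate pass that also walks function bodies.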

God modules

ui.py: ~984 LOC; imports 16+ modules; builds entire Gradio UI. Evidence: modules/ui.py:16-31, file size.
processing.py: ~1793 LOC; imports 15+ modules; contains processing classes and the full sampling loop. Evidence: modules/processing.py:18-31, line count.
api/api.py: ~929 LOC; many routes and handlers; imports shared, processing, scripts, sd_models, etc. Evidence: modules/api/api.py:19-34, file size.

God functions

process_images_inner — long loop, seed/prompt setup, batch iteration, script hooks. Evidence: processing.py:863-~1100+.
StableDiffusionProcessingTxt2Img.sample and sample_hr_pass — large methods with hires and decode logic. Evidence: processing.py:1307-1393.

Per major module (summary)

shared: Inbound: almost all; outbound: shared_cmd_options, options, paths_internal, util, shared_items, shared_gradio_themes. Reliance on global state: is the state holder.
processing: Inbound: api, img2img, txt2img, scripts (many). Outbound: shared, sd_models, sd_samplers, sd_vae, devices, scripts, images, etc. Heavy reliance on shared.opts, shared.state, shared.sd_model.
api/api: Inbound: webui (create_api). Outbound: shared, processing, scripts, sd_models, images, progress, etc. Reliance on shared and process_images.

Import centrality vs runtime criticality: shared is central both in imports and at runtime (opts/state/sd_model). processing is runtime-critical and highly imported. paths_internal is central for imports but less “hot” at runtime.

Surgical decouplings (3–5, PR-sized)

Pass opts snapshot into process_images: Add a helper that builds a dict or small struct from opts (and override_settings) and pass it into a new process_images_with_opts(p, opts_snapshot) used by one API endpoint first; keep reading from snapshot instead of global inside that path. Evidence to address: processing.py:823-833.
Extract “sampler runner”: Move the call self.sampler.sample(self, x, conditioning, ...) and the immediate decode into a function run_sampler_step(p, sampler, x, conditioning, uc, image_cond) in a new module; call it from StableDiffusionProcessingTxt2Img.sample. Reduces god-method size and gives a seam for testing. Evidence: processing.py:1345.
UI tab registry: Replace the single ui.create_ui() with a list of “tab builders”; each tab is a function that returns (name, blocks). Register txt2img, img2img, settings, etc. from their modules. One PR: move one tab into a function and register it. Evidence: modules/ui.py (single create_ui).
API handler → processing adapter: Introduce Txt2ImgRunner.run(request) -> Processed that builds p, calls process_images(p), returns Processed; have text2imgapi call Txt2ImgRunner.run(txt2imgreq). Keeps API thin and gives a single place to swap implementation later. Evidence: api/api.py:432-490.
Extension callback types: In script_callbacks, add a small module that defines dataclasses or protocols for each callback param (e.g. ImageSaveParams already exists). Document and version the callback signatures; add a “supported callback API version” constant. Evidence: script_callbacks.py:19-109,219-243.
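
As one example, the UI tab registry could start as small as the sketch below. All names are hypothetical, and the tab contents are plain dicts standing in for Gradio blocks so the example runs without the UI stack.

```python
from typing import Callable, Dict, List, Tuple

# A tab builder returns (tab name, tab contents). Contents are a plain dict
# here; in the real UI they would be Gradio blocks (assumption).
TabBuilder = Callable[[], Tuple[str, Dict[str, list]]]

_TAB_BUILDERS: List[TabBuilder] = []


def register_tab(builder: TabBuilder) -> TabBuilder:
    """Decorator that adds a builder to the registry in definition order."""
    _TAB_BUILDERS.append(builder)
    return builder


@register_tab
def txt2img_tab():
    return "txt2img", {"controls": ["prompt", "steps", "seed"]}


@register_tab
def settings_tab():
    return "settings", {"controls": ["clip_skip"]}


def create_ui():
    """Build every registered tab instead of one monolithic function body."""
    return [builder() for builder in _TAB_BUILDERS]


tab_names = [name for name, _ in create_ui()]
```

Moving a single tab behind register_tab is a PR-sized step: create_ui keeps working for the remaining tabs while the registry grows one builder at a time.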

Score: modularity 2 / 5

6. Code Health & Maintainability

File size distribution (top 20 by LOC)

Path	LOC
modules/processing.py	1793
modules/models/diffusion/ddpm_edit.py	1236
modules/ui.py	984
modules/scripts.py	790
modules/models/diffusion/uni_pc/uni_pc.py	752
modules/sd_models.py	750
modules/api/api.py	750
modules/images.py	673
modules/deepbooru_model.py	668
modules/ui_extra_networks.py	662
scripts/xyz_grid.py	643
modules/hypernetworks/hypernetwork.py	633
modules/textual_inversion/textual_inversion.py	564
modules/ui_extensions.py	544
modules/models/sd3/mmdit.py	528
modules/sd_hijack_optimizations.py	501
modules/script_callbacks.py	437
modules/models/sd3/other_impls.py	417
modules/infotext_utils.py	400
modules/shared_options.py	385
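
A table like the one above could be regenerated with a short script such as this sketch. It counts physical lines, which is an assumption: the audit does not state how LOC was measured, so results may differ from the figures listed.

```python
from pathlib import Path


def count_lines(path: Path) -> int:
    """Count physical lines, including blanks and comments."""
    with path.open("r", encoding="utf-8", errors="replace") as handle:
        return sum(1 for _ in handle)


def largest_python_files(root, top_n=20):
    """Return (line_count, relative_path) pairs, largest first."""
    root = Path(root)
    sizes = [
        (count_lines(path), path.relative_to(root).as_posix())
        for path in root.rglob("*.py")
    ]
    # Sort by size descending, then by path so ties are stable.
    sizes.sort(key=lambda item: (-item[0], item[1]))
    return sizes[:top_n]
```

Usage would be along the lines of largest_python_files("path/to/checkout"), run from a script at the repo root.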

Complexity hotspots (top functions by scope and branches)

process_images_inner — long loop, many branches, script hooks. Evidence: processing.py:863-~1100.
StableDiffusionProcessingTxt2Img.sample / sample_hr_pass — hires logic, decode paths. Evidence: processing.py:1307-1393.
ui.create_ui — builds all tabs and controls. Evidence: modules/ui.py (single large function/flow).
Sampler sample methods (e.g. k-diffusion, timesteps) — steps, conditioning. Evidence: sd_samplers_kdiffusion.py:190, sd_samplers_timesteps.py:141.
api.text2imgapi / img2imgapi — request parsing, script args, process_images. Evidence: api/api.py:432-565.

Lint configuration

Ruff: pyproject.toml: select B, C, I, W; ignore E501, E721, E731, I001, C901, C408, W605; per-file ignore E402 in webui.py. Evidence: pyproject.toml:1-35.
Pylint: .pylintrc disables C, R, W, E, I. Evidence: .pylintrc:2-3.
Observation: Line length and complexity (C901) are ignored; many long files and long functions.

Anti-patterns

Broad imports: from modules import shared then use of shared.opts, shared.state everywhere. Evidence: grep results across modules.
Re-exports: shared.py re-exports cmd_opts, OptionInfo, natural_sort_key, list_checkpoint_tiles, etc. Evidence: shared.py:75-95.
Dynamic imports: script_loading.load_module(path) for extensions; scripts loaded by importlib. Evidence: script_loading.py:10-16, extensions.py (preload).
Broad except: Callbacks wrapped with try/except that report and continue. Evidence: script_callbacks.py:254-259.

Dead code / unused abstractions

batch_cond_uncond in shared (“old field, unused now”). Evidence: shared.py:17.
No automated dead-code analysis run; inference: large files likely contain legacy or redundant paths.

Score: code_health 2.5 / 5

7. Tests, CI/CD & Reproducibility

Test pyramid

Unit: Almost none; a few tests in test_torch_utils.py, test_utils.py (e.g. parametrized URL/float checks). Evidence: test/test_torch_utils.py, test/test_utils.py.
Integration: Majority: tests start the app (via launch.py --test-server), then pytest hits HTTP endpoints (e.g. /sdapi/v1/txt2img). Evidence: test/test_txt2img.py:42-43, conftest.py:34-36, run_tests.yaml:44-61.
E2E: Same as integration (server + HTTP); no separate E2E layer.

Coverage

Collected: coverage run for server, pytest --cov . --cov-report=xml. Evidence: run_tests.yaml:46-61,65-69.
No --cov-fail-under or threshold in config. Evidence: grep for cov-fail-under / fail_under: none.

Flakiness risks

Server startup: wait-for-it --service 127.0.0.1:7860 -t 20; if startup is slow or port in use, tests fail. Evidence: run_tests.yaml:58-59.
Single job: server and pytest in one job; no retries or separate smoke step.

CI job structure

Lint: ruff (Python), eslint (JS); on push/PR. Evidence: on_pull_request.yaml.
Tests: one job “tests on CPU with empty model”; install deps, launch server in background, pytest, upload artifacts. Evidence: run_tests.yaml.
Branch policy: warns_merge_master.yml fails PRs targeting master. Evidence: warns_merge_master.yml:9-12.

Reproducibility

Python: requirements.txt mixed pins; requirements_versions.txt has more pins; CI uses requirements-test.txt + launch.py with TORCH_INDEX_URL for CPU. No single lockfile. Evidence: requirements.txt, requirements_versions.txt, run_tests.yaml:29-40.
JS: package-lock.json in .gitignore; CI uses npm i --ci. Evidence: .gitignore:40, on_pull_request.yaml:36.
Models: CI caches models with key 2023-12-30; tests run with “empty model” (no download in test flow). Evidence: run_tests.yaml:24-28.

Action pinning

Uses tags: actions/checkout@v4, actions/setup-python@v5, actions/cache@v4, actions/upload-artifact@v4. Not SHA-pinned. Evidence: on_pull_request.yaml:14,15, run_tests.yaml:14,25,71,78.
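
A check for tag-pinned actions could be sketched as below. It is a line-based heuristic written for illustration (it does not parse YAML, and it treats only a 40-character hex ref as pinned); the function name and the placeholder SHA are made up.

```python
import re

USES_LINE = re.compile(r"^\s*(?:-\s*)?uses:\s*([^\s#]+)")
FULL_SHA = re.compile(r"[0-9a-f]{40}")


def unpinned_actions(workflow_text):
    """Return action references whose ref is not a full commit SHA."""
    unpinned = []
    for line in workflow_text.splitlines():
        match = USES_LINE.match(line)
        if not match:
            continue
        reference = match.group(1)
        if reference.startswith(("./", "docker://")):
            continue  # local actions and container images have no git ref
        _, _, ref = reference.partition("@")
        if not FULL_SHA.fullmatch(ref):
            unpinned.append(reference)
    return unpinned


PLACEHOLDER_SHA = "0" * 40  # not a real commit; stands in for a pinned SHA
sample = "\n".join([
    "    steps:",
    "      - uses: actions/checkout@v4",
    f"      - uses: actions/setup-python@{PLACEHOLDER_SHA}  # v5",
    "      - name: Lint",
    "        uses: ./local-action",
])
flagged = unpinned_actions(sample)
```

Here only the checkout step is flagged; the trailing comment after a pinned SHA is a common way to keep the human-readable version next to the hash.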

3-tier test strategy (recommended)

Tier 1 (smoke): Single health or minimal txt2img request; run first; required; low threshold (e.g. 5% coverage or none). Acceptance: Job completes in <2 min; required on PR.
Tier 2 (quality): Full test suite; coverage gate with ≥2% margin below current; required. Acceptance: All tests pass; coverage above threshold.
Tier 3 (nightly): Same suite + optional extras; non-blocking; alert on failure. Acceptance: Runs on schedule; artifacts and report.

Coverage threshold plan

Measure current coverage (e.g. coverage report -i after one run). Set --cov-fail-under=X where X = current − 2%. Enforce in Tier 2.
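
The threshold arithmetic can be written down as a small helper. This is a sketch: coverage_gate and pytest_command are hypothetical names, and rounding down to a whole percent is a choice made here so that the gate never exceeds current − 2%.

```python
import math


def coverage_gate(current_percent, margin=2.0):
    """Return X = current - margin, rounded down to a whole percent, never below 0."""
    gate = math.floor(current_percent - margin)
    return max(gate, 0)


def pytest_command(current_percent):
    """Build the Tier 2 command line with the computed gate."""
    gate = coverage_gate(current_percent)
    return f"pytest --cov . --cov-report=xml --cov-fail-under={gate}"
```

For a measured coverage of, say, 41.7% (an invented figure), the gate would be 39, leaving a little more than the 2% margin for run-to-run noise.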

Reproducible environment plan

Single locked manifest for CI: e.g. generate requirements-ci.txt from current env with pins; use in CI. Commit package-lock.json and use npm ci for JS. Document model expectations (empty for CI; optional cache key for reproducibility).

Score: tests_ci 2 / 5

8. Security & Supply Chain

Dependency pinning

Observation: requirements.txt has mixed: some == (gradio, protobuf, transformers), some >= (fastapi). requirements_versions.txt pins many. No single source of truth for CI. Evidence: requirements.txt, requirements_versions.txt.
Inference: Supply-chain and build reproducibility are at risk without a single locked manifest.
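
The "mixed pinning" observation can be quantified with a sketch like the one below. It is a simplified reading of the requirements format (no URL or editable installs), the function name is hypothetical, and the version numbers in the sample are placeholders rather than the repo's actual pins.

```python
import re


def classify_requirements(text):
    """Split requirement names into exactly pinned (==) and everything else."""
    pinned, loose = [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        specifier = line.split(";", 1)[0]  # ignore environment markers
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]
        (pinned if "==" in specifier else loose).append(name)
    return pinned, loose


sample = """\
# version numbers below are placeholders
gradio==1.2.3
fastapi>=0.1.0
Pillow
-r other.txt
"""
pinned, loose = classify_requirements(sample)
```

A CI step could fail when loose is non-empty for the locked manifest while leaving the developer-facing requirements file permissive.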

Vulnerability exposure

No pip-audit or npm audit in CI. Evidence: Grep: no pip-audit/npm audit in workflows.
Known sensitive deps: protobuf==3.20.0 is pinned to a release with a historical CVE that later 3.20.x releases fixed; other pinned versions may also have known issues. Recommendation: Run pip-audit and npm audit to get the current list.

Secret handling

API auth uses secrets.compare_digest for HTTP basic. Evidence: modules/api/api.py:17 (import). No secrets in repo observed; no dedicated secret scan in CI.

CI trust boundaries

Workflows use checkout, setup-python, setup-node, cache, upload-artifact. Evidence: workflow files.
Recommendation: Pin all actions to full SHA to avoid action supply-chain risk.

SBOM

No SBOM or dependency export found in repo or workflows.

Recommendations

Add pip-audit (and optionally npm audit) as a CI step; fail or warn on known vulns.
Pin GitHub Actions to immutable SHAs.
Use locked manifests: one for Python (CI), commit and use package-lock.json with npm ci.

Score: security 2 / 5

9. Performance & Scalability

Hot paths

processing.py: process_images_inner, p.sample(), sampler sample(), decode_first_stage / batch decode. Evidence: processing.py:863-934, sd_samplers_common.py:73.
Model forward: Inside sampler and LDM/diffusion models. Evidence: modules/models/diffusion/, sd_samplers_*.py.

Model loading and caching

Models loaded via sd_models.reload_model_weights(); kept in shared.sd_model. VAE similarly. Evidence: modules/sd_models.py, modules/sd_vae.py.
Caching: diskcache in requirements; modules/cache.py used for extension git info. Evidence: requirements.txt, modules/cache.py, extensions.py:146.

Queueing

Gradio queue: shared.demo.queue(64). Evidence: webui.py:69.
API: queue lock in call_queue; wrap_gradio_gpu_call etc. Evidence: modules/call_queue.py, api/api.py (task_id, start_task, finish_task).

Performance risks

Repeated I/O: model load on first request; embedding reload when not disabled. Evidence: processing.py:909-910 (embedding load).
Unnecessary recomputation: no obvious redundant forward passes; some options (e.g. live preview) add work. Evidence: processing.py:923-924.

Profiling plan

Run a single txt2img request with python -m cProfile -o trace.stats (or PyTorch profiler) and inspect hotspots in process_images_inner and sampler.
Add a lightweight /sdapi/v1/health or /sdapi/v1/timing that returns startup time and (if stored) last-request latency for smoke and monitoring.
Optionally: small load script (e.g. 10 sequential txt2img) to measure P95 latency.
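
The first and third steps of the plan could be prototyped as below. This is a sketch using only the standard library: fake_workload stands in for a txt2img request, and the helper names are hypothetical.

```python
import cProfile
import io
import pstats
import statistics
import time


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report of top entries)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(5)
    return result, buffer.getvalue()


def p95(samples):
    """95th percentile: the last of the 19 cut points that split data in 20."""
    return statistics.quantiles(samples, n=20, method="inclusive")[-1]


def measure_latencies(func, runs=10):
    """Call func sequentially and collect wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return durations


def fake_workload(steps=1000):
    # Placeholder for a real request; does CPU work proportional to steps.
    return sum(i * i for i in range(steps))


result, report = profile_call(fake_workload, 2000)
latencies = measure_latencies(fake_workload, runs=10)
```

In a real run, func would issue the HTTP request against the test server, and p95(latencies) would be the number compared against a budget.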

Performance budget proposal

Not stated in repo. Recommendation: If performance is a goal, define e.g. “P95 txt2img (N steps) < X s on CPU test config” and “startup < Y s”; measure in CI or nightly and alert on regression.

Score: performance 3 / 5

10. Developer Experience (DX)

15-minute new-dev journey

Steps: Clone → install Python 3.10.x (and Node for lint) → run webui-user.bat or webui.sh (first run installs deps) → run ruff . and npm run lint → run tests (start server in background, then pytest test/). Evidence: README, workflow files.
Blockers: No single “run tests” script; CONTRIBUTING missing; lockfile gitignored so npm ci not possible; tests require full server.

Local test workflow

Lint: ruff . (Python), npm run lint (JS). Evidence: package.json, pyproject.toml.
Tests: Start server (launch.py --skip-torch-cuda-test --test-server ...), then pytest test/ (or pytest test/test_txt2img.py -v). Evidence: run_tests.yaml:44-61, conftest.py.
Single test: pytest test/test_txt2img.py::test_txt2img_simple_performed -v (with server running).

CONTRIBUTING

Observation: No CONTRIBUTING.md in repo. Evidence: No file found.
Recommendation: Add CONTRIBUTING.md with lint commands, test commands, branch policy (e.g. PR to dev), and link to extension docs.

Extension developer experience

Observation: Extension authors learn from wiki and by reading script_callbacks, scripts.Script, and built-in extensions. No single “Extension API” doc in repo. Evidence: CODEOWNERS comment about localizations and extensions wiki.
Recommendation: Document callback list and signatures, script lifecycle, and “supported API version”; provide a minimal extension template and test approach (e.g. run with one extension enabled).

Score: dx 2 / 5

11. Documentation

README

Observation: Installation (Windows/Linux), features, running, limitations (e.g. Python 3.10.6). Evidence: README.md:94-120, feature list.
Gaps: No “Development” or “Contributing” section; no local test/lint steps.

CONTRIBUTING

Observation: Absent. Evidence: No CONTRIBUTING.md.

Architecture docs

Observation: No ADRs or architecture diagrams in repo. Evidence: No docs in repo root or docs/.

Extension API docs

Observation: Callback names and param types exist in code (script_callbacks.py); no explicit “contract” doc or versioning. Evidence: script_callbacks.py:19-109,219-243.
Inference: Extension API is tribal knowledge plus code inspection.

Score: docs 2 / 5

12. Extension Ecosystem Stability

Extension loading

Discovery: list_extensions() scans extensions_builtin_dir and extensions_dir; builds Extension with ExtensionMetadata from metadata.ini. Evidence: extensions.py:226-300.
Import: Scripts under extension dirs loaded via script_loading.load_module() (e.g. preload.py); scripts list from extension.list_files('scripts', '.py'). Evidence: script_loading.py:10-16, extensions.py:178-189.
Lifecycle: Extensions listed at startup; enabled/disabled via opts; callbacks registered when scripts load. Evidence: extensions.active(), shared.opts.disabled_extensions.

Extension API surface

Hooks/callbacks: script_callbacks.callback_map (app_started, model_loaded, ui_tabs, before_image_saved, cfg_denoiser, etc.). Evidence: script_callbacks.py:219-243.
Stability: No version field in callback API; params are dataclasses (e.g. ImageSaveParams). Adding or changing params can break extensions. Evidence: script_callbacks.py:19-109.

Backwards compatibility risks

Extensions import modules.* (e.g. modules.ui_components, modules.scripts, modules.processing, modules.shared). Any rename or move of these breaks them. Evidence: extensions-builtin/Lora/network_lora.py:4, extensions-builtin/soft-inpainting/scripts/soft_inpainting.py:4-6.
Classification:
- Internal-but-relied-upon: modules.shared, modules.scripts, modules.processing, modules.ui_components, modules.paths_internal, modules.script_callbacks.
- Semi-private: callback param types (used by extensions but not clearly versioned).
- Stable: Only the existence of callback names and the Script base class; no formal stability guarantee.

Governance gaps

No extension API versioning; no deprecation policy; no compatibility matrix (e.g. “extensions built for API v1”).

Recommendations

Extension API contract: Publish a minimal doc listing callback names, param types, and “contract version” (e.g. 1.0); state that new fields may be added but existing ones will not be removed for that version.
Versioning: Add EXTENSION_API_VERSION = "1.0" and document what it covers; bump when breaking callback or Script interface changes.
Deprecation path: For breaking changes, add new callbacks or params, deprecate the old ones with a code comment and a logged warning, and remove them in the next major version.
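
The versioning and deprecation recommendations could be sketched as follows. EXTENSION_API_VERSION is the constant proposed above and does not exist in the repository yet. is_compatible and deprecated_callback are hypothetical helper names, and the compatibility rule (same major version, core minor at least the extension's minor) is an assumption, not an agreed policy.

```python
import functools
import warnings

# Proposed constant; not present in the codebase at the audited commit.
EXTENSION_API_VERSION = "1.0"


def is_compatible(required: str, provided: str = EXTENSION_API_VERSION) -> bool:
    """Return True when an extension built for `required` can run on `provided`.

    Assumed rule: identical major version, and the core's minor version is
    at least the minor version the extension was built against.
    """
    req_major, req_minor = (int(part) for part in required.split("."))
    cur_major, cur_minor = (int(part) for part in provided.split("."))
    return req_major == cur_major and req_minor <= cur_minor


def deprecated_callback(old_name: str, new_name: str, removed_in: str):
    """Decorator that emits a DeprecationWarning each time the old entry point is called."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{old_name} is deprecated; use {new_name} "
                f"(removal planned for extension API {removed_in})",
                DeprecationWarning,
                stacklevel=2,
            )
            return func(*args, **kwargs)
        return wrapper
    return decorate
```

An extension could declare the version it was built for in its metadata, and the loader could call is_compatible() before importing its scripts. Where that declaration would live is left open here.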

Score: extensions 2.5 / 5

13. Target Architecture Definition (What 5/5 Looks Like)

Clear separation

Runtime (generation pipelines): A dedicated package or module that takes (prompt, negative_prompt, sampler_name, steps, seed, model_ref, opts_snapshot, …) and returns (images, infotext). No global shared.opts or shared.sd_model inside this layer; model and sampler are injected or resolved from a registry interface.
API: HTTP layer that maps requests to runtime inputs and runtime outputs to responses; uses a runner/adapter that calls the runtime with explicit parameters.
UI: Gradio (or other) that builds controls and calls the same runner or runtime via a thin adapter; no direct access to shared.sd_model or processing internals for generation.
Extension system: Documented callback and Script API with a version; extensions register with a stated contract; core does not depend on extension internals.

Explicit dependency injection

Models: Runtime receives a “model provider” or “checkpoint loader” interface; API/UI obtain it from a registry (which may still wrap sd_models) and pass it in.
Samplers: Sampler creation behind an interface; runtime gets a sampler for the current model and step config.
Configuration: Options passed as a snapshot (or immutable view) into the runtime; no opts.set() inside the core pipeline.
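
A minimal sketch of the injection shape described above. Every name in it (ModelProvider, OptsSnapshot, FakeModelProvider, describe_run, the example_option key) is hypothetical and chosen for illustration; none is an existing module in the repository.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Mapping, Optional, Protocol


class ModelProvider(Protocol):
    """Hypothetical interface; a registry wrapping sd_models could implement it."""

    def get_model(self, model_ref: str) -> Any: ...


@dataclass(frozen=True)
class OptsSnapshot:
    """Read-only view of options captured at request time."""
    values: Mapping[str, Any]

    @classmethod
    def capture(cls, opts: Mapping[str, Any],
                overrides: Optional[Mapping[str, Any]] = None) -> "OptsSnapshot":
        # Copy first so later changes to the live options cannot leak in.
        merged = dict(opts)
        merged.update(overrides or {})
        return cls(MappingProxyType(merged))

    def get(self, key: str, default: Any = None) -> Any:
        return self.values.get(key, default)


class FakeModelProvider:
    """Test double that satisfies ModelProvider without loading weights."""

    def get_model(self, model_ref: str) -> Any:
        return f"fake-model:{model_ref}"


def describe_run(prompt: str, model_ref: str,
                 snapshot: OptsSnapshot, provider: ModelProvider) -> dict:
    """Stand-in for a runtime entry point: it reads only its arguments."""
    return {
        "prompt": prompt,
        "model": provider.get_model(model_ref),
        "example_option": snapshot.get("example_option", 1),
    }
```

Because the snapshot wraps a MappingProxyType, attempts to assign into it raise TypeError. That is one way to enforce "no opts.set() inside the core pipeline" mechanically rather than by convention.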

No critical global state in hot paths

Generation path uses only explicit arguments and injected dependencies; state (job, interrupted) can remain for progress/cancellation if accessed via a narrow interface (e.g. “execution context”) rather than raw global.

Deterministic artifact outputs

Same (seed, prompt, opts_snapshot, model version) → same output; runtime is pure modulo RNG and model weights.
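
One way to make this property testable is to key stored reference outputs on a digest of the inputs that are supposed to determine the result. This is a sketch only: run_fingerprint is a hypothetical helper, it assumes the snapshot is JSON-serializable, and it does not address nondeterminism in GPU kernels.

```python
import hashlib
import json
from typing import Any, Mapping


def run_fingerprint(seed: int, prompt: str,
                    opts_snapshot: Mapping[str, Any], model_version: str) -> str:
    """Stable SHA-256 digest of the inputs that should determine the output."""
    payload = json.dumps(
        {
            "seed": seed,
            "prompt": prompt,
            "opts": dict(opts_snapshot),
            "model": model_version,
        },
        sort_keys=True,  # key order must not change the digest
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

A regression test could then assert that two runs with equal fingerprints produce byte-identical images, and treat any mismatch as hidden state leaking into the runtime.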

Reproducible CI

Pinned Python deps (lockfile or single requirements-ci.txt); committed package-lock.json and npm ci; SHA-pinned GitHub Actions; 3-tier tests with coverage gate and ≥2% margin.

Stable extension API

Documented callback and Script contract; version number; deprecation policy (new optional params allowed; removal only with version bump and notice).

14. Refactorability & Extraction Analysis

Architectural fault lines

Runtime vs rest: Boundary = “everything needed to produce images from (prompt, seed, opts, model, sampler).” Cut at: (1) entry to process_images_inner (caller supplies opts snapshot and model reference), (2) exit after Processed is built. Evidence: processing.py:863-858.
API vs shared: Boundary = API handlers should not read/write shared except via a narrow facade (e.g. “get current model,” “apply overrides”). Cut at: replace direct opts/sd_models usage in api.py with calls to an adapter. Evidence: api/api.py:471-472, opts.outdir_*.
UI vs processing: Boundary = UI should only build p and call a single entry point (e.g. run_txt2img(p) or script runner). Cut at: txt2img() and img2img() (in txt2img.py and img2img.py) already call process_images(p); the further cut is to move creation of p into an adapter that takes a "request" and returns Processed.

Safe extraction seams

Seed/prompt setup: Logic in process_images_inner that sets p.all_seeds, p.all_prompts could move to a prepare_prompts_and_seeds(p) function in the same file. Evidence: processing.py:871-907.
Override apply/restore: The block in process_images that applies override_settings and restores in finally could be a context manager with temporary_opts(override_settings): .... Evidence: processing.py:823-857.
Script callbacks (params): script_callbacks already uses dataclasses; moving them to a callback_params.py (or keeping and documenting) is a small, safe move. Evidence: script_callbacks.py:19-109.
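
The override apply/restore seam could look roughly like this. It is a sketch under assumptions: a plain attribute-style object stands in for shared.opts, and the real options object's setter semantics (callbacks, validation) are deliberately not modelled.

```python
from contextlib import contextmanager

_MISSING = object()  # sentinel for "option did not exist before the override"


@contextmanager
def temporary_opts(opts, override_settings):
    """Apply override_settings to opts, then restore the previous values on exit."""
    previous = {key: getattr(opts, key, _MISSING) for key in override_settings}
    try:
        for key, value in override_settings.items():
            setattr(opts, key, value)
        yield opts
    finally:
        # Runs on both normal exit and exceptions, like the existing finally block.
        for key, old_value in previous.items():
            if old_value is _MISSING:
                if hasattr(opts, key):
                    delattr(opts, key)
            else:
                setattr(opts, key, old_value)
```

Usage would be `with temporary_opts(opts, p.override_settings): ...` around the call into process_images_inner. Behavior should stay unchanged while the restore logic becomes testable in isolation.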

Minimal architectural cuts

Extract runtime layer: (1) Introduce runtime.run_txt2img(p, opts_snapshot, model_provider) that does not read shared.opts/shared.sd_model inside; call it from process_images with snapshot and current model. (2) Gradually move logic from process_images_inner into runtime and pass opts/model explicitly.
Decouple UI from processing: (1) Keep UI building p and calling scripts.run / process_images; (2) Introduce ProcessingRunner.run_txt2img(args) that returns Processed; UI and API both call the runner. No need to change UI internals in the first cut.
Decouple API from shared: (1) API builds p and calls a runner that takes p and (optionally) opts_snapshot; (2) Runner uses snapshot for paths/options instead of opts global; (3) Model still from registry/facade until a later phase.
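
The runner cut described above might be sketched like this. Txt2ImgRequest, Result, and the injected process callable are hypothetical. In the first cut the callable would be a thin wrapper over the existing process_images, which is replaced here by a fake so the example runs without a model.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Txt2ImgRequest:
    """Illustrative request type shared by API and UI adapters."""
    prompt: str
    seed: int = -1
    override_settings: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Result:
    """Illustrative stand-in for Processed."""
    images: List[Any]
    infotext: str


class ProcessingRunner:
    """Single entry point that API and UI could both call."""

    def __init__(self, process: Callable[[Txt2ImgRequest, Dict[str, Any]], Result],
                 base_opts: Dict[str, Any]) -> None:
        self._process = process
        self._base_opts = dict(base_opts)  # private copy, never mutated

    def run_txt2img(self, request: Txt2ImgRequest) -> Result:
        # Build a per-request snapshot instead of mutating global options.
        snapshot = {**self._base_opts, **request.override_settings}
        return self._process(request, snapshot)


def fake_process(request: Txt2ImgRequest, snapshot: Dict[str, Any]) -> Result:
    """Test double: returns no images, only a description of its inputs."""
    return Result(images=[], infotext=f"{request.prompt} | outdir={snapshot['outdir']}")


runner = ProcessingRunner(fake_process, {"outdir": "outputs"})
result = runner.run_txt2img(Txt2ImgRequest("a cat", override_settings={"outdir": "tmp"}))
```

Injecting the process callable lets the API switch to the runner first while the UI keeps its current path, which matches the incremental order proposed in this section.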

Recommended order of extractions

Phase 0 (stabilize): Pin CI actions to SHA; add smoke test; add pip-audit; commit package-lock and use npm ci. No architectural change.
Phase 1 (seams): Add CONTRIBUTING; document extension callback API version; add temporary_opts (or equivalent) and use it in process_images; add pytest markers for smoke.
Phase 2 (runtime boundary): Introduce opts_snapshot type and build it in process_images from opts + override_settings; pass snapshot into process_images_inner and refactor inner to read from snapshot where possible (leave state and model for later).
Phase 3 (runner): Add Txt2ImgRunner / Img2ImgRunner (or single ProcessingRunner) that builds p, applies overrides, calls process_images, returns Processed; switch API and then UI to use runner.
Phase 4 (model injection): Introduce a model-provider interface; runtime gets model from provider instead of shared.sd_model; registry implementation wraps current sd_models. Tests can then optionally run with a mock provider.
Phase 5 (UI registry): Replace monolithic create_ui with a list of tab builders; move one tab at a time into a builder and register.
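
For Phase 5, a tab registry could be as small as the sketch below. TabRegistry and the builder functions are hypothetical names, and the string return values are placeholders for whatever Gradio components each builder would construct.

```python
from typing import Callable, List, Tuple


class TabRegistry:
    """Collects tab builders so create_ui can iterate instead of hard-coding tabs."""

    def __init__(self) -> None:
        self._builders: List[Tuple[int, str, Callable[[], object]]] = []

    def register(self, name: str, order: int = 0):
        """Decorator that records a builder under a display name and sort order."""
        def decorate(builder: Callable[[], object]) -> Callable[[], object]:
            self._builders.append((order, name, builder))
            return builder
        return decorate

    def build_all(self) -> List[Tuple[str, object]]:
        # Sort on (order, name) only, so builder functions are never compared.
        ordered = sorted(self._builders, key=lambda entry: (entry[0], entry[1]))
        return [(name, builder()) for _, name, builder in ordered]


tabs = TabRegistry()


@tabs.register("img2img", order=1)
def build_img2img_tab() -> object:
    return "img2img-ui"  # placeholder for real UI components


@tabs.register("txt2img", order=0)
def build_txt2img_tab() -> object:
    return "txt2img-ui"  # placeholder for real UI components
```

Moving one tab at a time then amounts to lifting its block out of create_ui into a registered builder, with build_all() preserving the visible tab order.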

15. Refactor Strategy (Goal: 5/5)

Option A — Iterative (low blast radius)

PR-sized steps, each ≤60 minutes; reversible.
Focus: CI guardrails, test tiers, pinning, small decouplings.

Phases

Phase 0 — Fix-first & stabilize (0–1 day): Add smoke test (one health or txt2img); pin checkout/setup-python to SHA; add pip-audit step; upload artifacts on fail. Risks: Low. Rollback: Revert workflow changes.
Phase 1 — Document & guardrail (1–3 days): CONTRIBUTING.md; pytest markers (smoke); explicit test path in CI; pin Ruff/pytest in requirements-test; commit package-lock, use npm ci. Risks: Low. Rollback: Revert doc and workflow.
Phase 2 — Harden (3–7 days): Add --cov-fail-under with 2% margin; make smoke required; add “quality” job or ordered steps. Risks: Medium (coverage may fluctuate). Rollback: Remove threshold.
Phase 3 — Small decouplings (ongoing): temporary_opts context manager; prepare_prompts_and_seeds extraction; one API endpoint via Txt2ImgRunner; extension API version constant + doc. Risks: Low per PR. Rollback: Revert individual PRs.

Milestone labels: Phase 0–1 = foundational; Phase 2 = hardening; Phase 3 = enabling (enables later architectural work).

Option B — Strategic (structural)

Introduce runtime/service layer: Extract generation into a module that accepts opts_snapshot and model provider; move sampling loop and decode there.
Decouple shared.py: Pass option/state snapshots into processing; introduce “execution context” for state if needed; reduce direct shared reads in hot path.
Modularize UI: Tab registry; one tab per module; lazy or explicit registration.
ProcessingRunner: API and UI call a runner that builds p, applies overrides, calls runtime, returns Processed.
3-tier CI with coverage gates: Smoke (required), quality (required, coverage threshold), nightly (optional, alert).
Deterministic environment: Locked Python manifest for CI; npm ci; document model handling.