diff --git a/README_ROCM.md b/README_ROCM.md
new file mode 100644
index 000000000..33c566d0a
--- /dev/null
+++ b/README_ROCM.md
@@ -0,0 +1,295 @@
+# ROCm Setup Guide for Stable Diffusion WebUI
+
+This guide helps you set up and optimize Stable Diffusion WebUI for AMD GPUs using ROCm 6.2.
+
+## Quick Start
+
+### 1. Copy the Optimized Launch Configuration
+
+```bash
+cp webui-user-rocm62.sh webui-user.sh
+```
+
+### 2. Launch the WebUI
+
+```bash
+./webui.sh
+```
+
+The launch script will automatically:
+- Install PyTorch with ROCm 6.2 support
+- Configure optimal VRAM settings for 16GB GPUs
+- Set up memory management to prevent fragmentation
+
+### 3. Configure WebUI Settings
+
+After the WebUI starts, navigate to **Settings → Optimizations** and configure:
+
+- **Cross attention optimization:** `Doggettx` (default)
+- **Enable quantization in K samplers:** ✓ Enabled
+- **Token merging ratio:** `0.5`
+
+See the [VRAM Optimization Guide](ROCM_VRAM_OPTIMIZATION.md) for detailed configuration instructions.
+
+---
+
+## System Requirements
+
+### Supported AMD GPUs
+
+- **RX 6000 Series** (Navi 2): RX 6700 XT, 6800, 6800 XT, 6900 XT
+- **RX 7000 Series** (Navi 3): RX 7600, 7700 XT, 7800 XT, 7900 XT, 7900 XTX
+- **RX 5000 Series** (Navi 1): RX 5700 XT (with limitations)
+
+### Recommended VRAM
+
+- **Minimum:** 8GB VRAM
+- **Recommended:** 16GB VRAM
+- **Optimal:** 24GB VRAM
+
+### Software Requirements
+
+- **ROCm:** 6.2 or newer
+- **Python:** 3.10 or 3.11
+- **Linux:** Ubuntu 22.04, Fedora 38+, or Arch Linux
+
+---
+
+## Installation
+
+### Option 1: Automatic Setup (Recommended)
+
+The `webui.sh` script automatically detects AMD GPUs and installs ROCm 6.2 support.
+
+```bash
+# Clone the repository (if not already done)
+git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
+cd stable-diffusion-webui
+
+# Copy the optimized configuration
+cp webui-user-rocm62.sh webui-user.sh
+
+# Launch (will install dependencies automatically)
+./webui.sh
+```
+
+### Option 2: Manual Setup
+
+If you need manual control over the installation:
+
+```bash
+# Create and activate virtual environment
+python3 -m venv venv
+source venv/bin/activate
+
+# Install PyTorch with ROCm 6.2
+pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
+
+# Install Stable Diffusion WebUI requirements
+pip install -r requirements_versions.txt
+
+# Set environment variables
+export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True
+
+# Launch with optimized flags
+python launch.py --skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae
+```
+
+---
+
+## Configuration Files
+
+### `webui-user-rocm62.sh`
+
+Pre-configured launch script with optimal settings for AMD GPUs with 16GB VRAM.
+
+**Key settings:**
+- PyTorch with ROCm 6.2
+- Memory fragmentation prevention
+- VRAM-optimized command-line arguments
+
+### `ROCM_VRAM_OPTIMIZATION.md`
+
+Comprehensive guide covering:
+- WebUI settings optimization
+- Generation settings for different VRAM amounts
+- ControlNet optimization
+- Workflows for best quality
+- Troubleshooting common issues
+
+---
+
+## Command-Line Arguments Explained
+
+The optimized configuration uses these flags:
+
+```bash
+--skip-torch-cuda-test  # Skip CUDA test (we're using ROCm/HIP)
+--medvram               # Optimized for 8-16GB VRAM
+--opt-split-attention   # Reduces VRAM usage during attention
+--no-half-vae           # Prevents VAE errors with full precision
+```
+
+### For Different VRAM Amounts
+
+**16GB VRAM (Recommended):**
+```bash
+--skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae
+```
+
+**8GB VRAM:**
+```bash
+--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae
+```
+
+**6GB VRAM or less:**
+```bash
+--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae --opt-channelslast
+```
+
+---
+
+## Recommended Generation Settings
+
+### For 16GB VRAM
+
+**Safe Mode (Fast, No Errors):**
+- Resolution: 512x512
+- Hires fix: OFF
+- Batch size: 1
+- VRAM usage: ~4-6GB
+
+**Quality Mode (Best Results):**
+- Resolution: 512x512
+- Hires fix: ON (1.5x upscale)
+- Hires steps: 10
+- VRAM usage: ~8-12GB
+
+**With ControlNet:**
+- Resolution: 512x512
+- Hires fix: OFF
+- ControlNet units: 1-2 maximum
+- Low VRAM mode: ON
+- VRAM usage: ~6-10GB
+
+See the [VRAM Optimization Guide](ROCM_VRAM_OPTIMIZATION.md) for detailed workflows and settings.
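The VRAM tiers above can also be selected programmatically. Below is a minimal sketch; `pick_args` is a hypothetical helper (not part of the WebUI), and the VRAM size in GB is assumed to be supplied by you, e.g. read manually from `rocm-smi --showmeminfo vram`:

```shell
#!/bin/bash
# Hypothetical helper: map a GPU's VRAM size (GB) to the flag sets
# recommended in this guide (16GB -> --medvram, 8GB -> --lowvram,
# 6GB or less -> --lowvram plus --opt-channelslast).
pick_args() {
    local vram_gb=$1
    if (( vram_gb >= 16 )); then
        echo "--skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae"
    elif (( vram_gb >= 8 )); then
        echo "--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae"
    else
        echo "--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae --opt-channelslast"
    fi
}

# Example: a 16GB card gets the --medvram flag set
pick_args 16
```

The result can be exported as `COMMANDLINE_ARGS` in `webui-user.sh`.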
+
+---
+
+## Verification
+
+### Check PyTorch ROCm Installation
+
+```bash
+source venv/bin/activate
+python -c "import torch; print('ROCm available:', torch.cuda.is_available()); print('ROCm version:', torch.version.hip)"
+```
+
+**Expected output:**
+```
+ROCm available: True
+ROCm version: 6.2.x
+```
+
+### Monitor VRAM Usage
+
+```bash
+watch -n 1 rocm-smi
+```
+
+Or check current usage:
+```bash
+rocm-smi --showmeminfo vram
+```
+
+---
+
+## Troubleshooting
+
+### Out of Memory Errors
+
+If you encounter OOM errors:
+
+1. **Reduce resolution:** 768x768 → 512x512
+2. **Disable Hires fix** or reduce upscale ratio
+3. **Use more aggressive flags:**
+   ```bash
+   export COMMANDLINE_ARGS="--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae"
+   ```
+
+### Black Images or Artifacts
+
+Ensure `--no-half-vae` is in your command-line arguments.
+
+### Slow Generation
+
+- Use `--medvram` instead of `--lowvram` for 16GB VRAM
+- Reduce sampling steps to 20
+- Try faster samplers: DPM++ 2M, Euler a
+
+### Model Loading Errors
+
+Verify PyTorch installation:
+```bash
+source venv/bin/activate
+python -c "import torch; print(torch.cuda.is_available())"
+```
+
+If it returns `False`, reinstall PyTorch:
+```bash
+pip uninstall torch torchvision torchaudio
+pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
+```
+
+For more troubleshooting, see the [VRAM Optimization Guide](ROCM_VRAM_OPTIMIZATION.md#troubleshooting).
+
+---
+
+## Performance Tips
+
+1. **Two-Phase Workflow:**
+   - Generate at 512x512 without Hires fix (fast)
+   - Upscale separately using img2img or Extras tab (best quality)
+
+2. **ControlNet Best Practices:**
+   - Use only 1-2 units at a time
+   - Enable Low VRAM mode
+   - Disable Hires fix when using ControlNet
+
+3. **Batch Processing:**
+   - Use `Batch count` instead of `Batch size`
+   - Keep resolution at 512x512 for batches
+
+4. **Memory Management:**
+   - Restart WebUI after 50-100 generations
+   - Use "Unload SD checkpoint" when switching models
+
+---
+
+## Additional Resources
+
+- **[VRAM Optimization Guide](ROCM_VRAM_OPTIMIZATION.md)** - Comprehensive optimization guide
+- **[ROCm Documentation](https://rocm.docs.amd.com/)** - Official AMD ROCm docs
+- **[SD WebUI Wiki](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki)** - Official WebUI documentation
+- **[AMD GPU Support](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs)** - WebUI AMD GPU guide
+
+---
+
+## Summary
+
+✅ **Key Points:**
+- Use ROCm 6.2 for best compatibility
+- Enable `expandable_segments:True` to prevent memory fragmentation
+- Use `--medvram` for 16GB VRAM
+- Start with 512x512, upscale separately for quality
+- Enable ControlNet Low VRAM mode
+
+❌ **Avoid:**
+- Batch size > 1 (use Batch count instead)
+- Hires fix with 2x upscale on 16GB VRAM
+- More than 2 ControlNet units simultaneously
+- Direct generation at resolutions > 768x768
+
+---
+
+**For detailed configuration and workflows, see [ROCM_VRAM_OPTIMIZATION.md](ROCM_VRAM_OPTIMIZATION.md)**
diff --git a/ROCM_VRAM_OPTIMIZATION.md b/ROCM_VRAM_OPTIMIZATION.md
new file mode 100644
index 000000000..d7b636bc5
--- /dev/null
+++ b/ROCM_VRAM_OPTIMIZATION.md
@@ -0,0 +1,539 @@
+# ROCm 6.2 VRAM Optimization Guide for AMD GPUs
+
+This guide provides comprehensive instructions for optimizing Stable Diffusion WebUI on AMD GPUs with ROCm 6.2, specifically targeting systems with 8-16GB VRAM.
+
+## Table of Contents
+
+1. [Quick Start](#quick-start)
+2. [Launch Configuration](#launch-configuration)
+3. [WebUI Settings Optimization](#webui-settings-optimization)
+4. [Generation Settings](#generation-settings)
+5. [ControlNet Optimization](#controlnet-optimization)
+6. [Recommended Workflows](#recommended-workflows)
+7. [Troubleshooting](#troubleshooting)
+
+---
+
+## Quick Start
+
+### 1. Setup Launch Script
+
+Copy the optimized ROCm 6.2 configuration:
+
+```bash
+cp webui-user-rocm62.sh webui-user.sh
+```
+
+### 2. Launch WebUI
+
+```bash
+./webui.sh
+```
+
+### 3. Configure WebUI Settings
+
+Navigate to **Settings → Optimizations** and apply the recommended settings (see below).
+
+---
+
+## Launch Configuration
+
+### Environment Variables
+
+The following environment variables are set in `webui-user-rocm62.sh`:
+
+#### PyTorch with ROCm 6.2
+
+```bash
+export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2"
+```
+
+**Purpose:** Installs PyTorch compiled with ROCm 6.2 support for AMD GPUs.
+
+#### HIP Memory Allocation
+
+```bash
+export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True
+```
+
+**Purpose:** Prevents memory fragmentation, which is critical for stable VRAM usage and avoiding OOM (Out of Memory) errors.
+
+### Command Line Arguments
+
+```bash
+export COMMANDLINE_ARGS="--skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae"
+```
+
+#### Flag Explanations
+
+| Flag | Purpose | VRAM Impact |
+|------|---------|-------------|
+| `--skip-torch-cuda-test` | Skip CUDA test (using ROCm/HIP instead) | N/A |
+| `--medvram` | Optimized for 8-16GB VRAM, moves models between GPU/CPU as needed | **High** - Critical for 16GB |
+| `--opt-split-attention` | Reduces VRAM usage during attention computation | **Medium** - Saves 1-2GB |
+| `--no-half-vae` | Uses full precision for VAE to prevent errors | **Low** - Prevents artifacts |
+
+#### Alternative Configurations
+
+**For GPUs with less than 8GB VRAM:**
+
+```bash
+export COMMANDLINE_ARGS="--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae"
+```
+
+**For maximum compatibility (slower but most stable):**
+
+```bash
+export COMMANDLINE_ARGS="--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae --opt-channelslast"
+```
+
+**With xformers (if installed):**
+
+```bash
+export COMMANDLINE_ARGS="--skip-torch-cuda-test --medvram --xformers --no-half-vae"
+```
+
+---
+
+## WebUI Settings Optimization
+
+Navigate to **Settings → Optimizations** in the WebUI and configure:
+
+### Recommended Settings
+
+| Setting | Value | Notes |
+|---------|-------|-------|
+| **Cross attention optimization** | `Doggettx` or `xformers` | Doggettx is default and works well |
+| **Enable quantization in K samplers** | ✓ Enabled | Reduces VRAM usage |
+| **Token merging ratio** | `0.5` | Merges similar tokens to save memory |
+| **Pad prompt/negative prompt** | ✓ Enabled | Recommended for consistency |
+
+### Optional Advanced Settings
+
+| Setting | Value | Effect |
+|---------|-------|--------|
+| **Token merging ratio for hires** | `0.5` | Saves VRAM during hires fix |
+| **Always discard next-to-last sigma** | ✓ Enabled | Minor VRAM savings |
+
+---
+
+## Generation Settings
+
+### Safe Mode (No Errors, 16GB VRAM)
+
+**Best for:** Testing prompts, finding good seeds, general use
+
+```
+Width: 512
+Height: 512
+Batch count: 1
+Batch size: 1
+Hires fix: ☐ Disabled
+Sampling steps: 20-30
+```
+
+**VRAM Usage:** ~4-6GB
+
+---
+
+### Quality Mode (With Hires Fix, 16GB VRAM)
+
+**Best for:** Final high-quality outputs
+
+```
+Width: 512
+Height: 512
+Batch count: 1
+Batch size: 1
+Hires fix: ✓ Enabled
+  Upscale by: 1.5 (avoid 2.0 on 16GB)
+  Hires steps: 10
+  Denoising strength: 0.4
+  Upscaler: Latent or R-ESRGAN 4x+
+Sampling steps: 20-30
+```
+
+**VRAM Usage:** ~8-12GB
+
+**⚠️ Warning:** Using `Upscale by: 2.0` may cause OOM errors on 16GB VRAM.
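The warning above is mostly pixel arithmetic: activation and VAE memory in the hires pass grows roughly with the output pixel count, so a 2.0x upscale is far heavier than 1.5x from the same 512x512 base. A quick check:

```shell
#!/bin/bash
# Rough scaling of hires-pass memory with output pixel count.
base=$((512 * 512))        # 262144 px at 512x512
hires_15=$((768 * 768))    # 1.5x upscale -> 589824 px (~2.25x the base)
hires_20=$((1024 * 1024))  # 2.0x upscale -> 1048576 px (4x the base)
echo "$base $hires_15 $hires_20"
```

So 2.0x processes nearly twice the pixels of 1.5x, which is why it tips a 16GB card into OOM while 1.5x stays within budget.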
+
+---
+
+### Portrait Mode (512x768)
+
+**Best for:** Character portraits
+
+```
+Width: 512
+Height: 768
+Batch count: 1
+Batch size: 1
+Hires fix: ☐ Disabled (enable only after finding good seed)
+Sampling steps: 20-30
+```
+
+**VRAM Usage:** ~6-8GB
+
+---
+
+### Landscape Mode (768x512)
+
+**Best for:** Scenery, backgrounds
+
+```
+Width: 768
+Height: 512
+Batch count: 1
+Batch size: 1
+Hires fix: ☐ Disabled (enable only after finding good seed)
+Sampling steps: 20-30
+```
+
+**VRAM Usage:** ~6-8GB
+
+---
+
+## ControlNet Optimization
+
+When using ControlNet extensions, additional VRAM optimizations are necessary.
+
+### ControlNet Settings
+
+Navigate to **Settings → ControlNet** and configure:
+
+| Setting | Value | Purpose |
+|---------|-------|---------|
+| **Low VRAM mode** | ✓ Enabled | Critical for 16GB VRAM |
+| **Pixel Perfect** | ☐ Disabled | Disable during testing to save VRAM |
+| **Control Mode** | `Balanced` | Default, good balance |
+
+### Recommended Usage
+
+```
+Width: 512
+Height: 512
+Hires fix: ☐ Disabled
+Active ControlNet units: 1-2 maximum
+Batch size: 1
+```
+
+**⚠️ Warning:** Using 3+ ControlNet units simultaneously may cause OOM errors.
+
+### ControlNet with Hires Fix
+
+**Not recommended for 16GB VRAM.** If necessary:
+
+```
+Width: 512
+Height: 512
+Hires fix: ✓ Enabled
+  Upscale by: 1.25 (minimum upscale)
+  Hires steps: 5 (minimum steps)
+Active ControlNet units: 1 maximum
+Low VRAM mode: ✓ Enabled
+```
+
+---
+
+## Recommended Workflows
+
+### Workflow 1: Prompt Development (Fast)
+
+**Goal:** Find the perfect prompt and seed quickly
+
+1. **Settings:**
+   - Size: 512x512
+   - Hires fix: OFF
+   - Steps: 20
+   - Batch count: 4-8 (generate multiple images)
+
+2. **Process:**
+   - Experiment with different prompts
+   - Test various seeds
+   - Adjust CFG scale and sampling method
+
+3. **VRAM Usage:** ~4-6GB per image
+
+---
+
+### Workflow 2: High-Quality Output (Two-Phase)
+
+**Goal:** Maximum quality without VRAM errors
+
+#### Phase 1: Generation
+
+```
+Size: 512x512
+Hires fix: OFF
+Steps: 30-40
+Sampler: DPM++ 2M Karras or Euler a
+CFG Scale: 7-8
+```
+
+**Find your perfect image** with the right prompt, seed, and composition.
+
+#### Phase 2: Upscaling
+
+**Option A: Using img2img**
+
+1. Send image to img2img
+2. Settings:
+   - Resize to: 1024x1024 or 768x1152
+   - Denoising: 0.3-0.5
+   - Steps: 20-30
+   - Sampler: Same as generation
+
+**Option B: Using Extras Tab**
+
+1. Send to Extras
+2. Upscaler: R-ESRGAN 4x+ or 4x-UltraSharp
+3. Scale: 2x or 4x
+4. Optional: GFPGAN or CodeFormer for face restoration
+
+**VRAM Usage:** Phase 1: ~4-6GB, Phase 2: ~6-10GB (depends on final resolution)
+
+---
+
+### Workflow 3: ControlNet Generation
+
+**Goal:** Use ControlNet without VRAM errors
+
+1. **Initial Setup:**
+   - Size: 512x512
+   - Hires fix: OFF
+   - ControlNet units: 1-2 maximum
+   - Low VRAM: ON
+
+2. **Generate base image:**
+   - Steps: 20-30
+   - Find good composition
+
+3. **Upscale separately:**
+   - Use img2img without ControlNet
+   - Or use Extras tab
+
+**VRAM Usage:** ~6-10GB (depends on ControlNet type)
+
+---
+
+### Workflow 4: Batch Processing
+
+**Goal:** Generate multiple images efficiently
+
+**Small batches (recommended):**
+
+```
+Size: 512x512
+Batch count: 4
+Batch size: 1
+Hires fix: OFF
+```
+
+**VRAM Usage:** ~4-6GB per image (sequential)
+
+**⚠️ Avoid:**
+- `Batch size > 1` (generates simultaneously, uses much more VRAM)
+- Hires fix with batch processing
+
+---
+
+## Troubleshooting
+
+### Issue: Out of Memory (OOM) Errors
+
+**Symptoms:**
+```
+RuntimeError: HIP out of memory
+```
+
+**Solutions:**
+
+1. **Reduce image resolution:**
+   - 768x768 → 512x512
+   - 512x768 → 512x512
+
+2. **Disable Hires fix or reduce upscale:**
+   - Turn OFF Hires fix
+   - Or change `Upscale by: 2.0` → `1.5` or `1.25`
+
+3. **Use more aggressive VRAM flags:**
+   ```bash
+   export COMMANDLINE_ARGS="--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae"
+   ```
+
+4. **Reduce ControlNet units:**
+   - Use only 1 ControlNet unit
+   - Ensure Low VRAM mode is enabled
+
+5. **Close other applications:**
+   - Close browsers, games, or other GPU-intensive apps
+   - Check `rocm-smi` to see VRAM usage
+
+---
+
+### Issue: Slow Generation Speed
+
+**Symptoms:**
+- Images take very long to generate
+- System feels sluggish
+
+**Solutions:**
+
+1. **Check if you're using the right flags:**
+   - Use `--medvram` not `--lowvram` for 16GB VRAM
+   - `--lowvram` is slower but uses less VRAM
+
+2. **Reduce sampling steps:**
+   - Try 20 steps instead of 40-50
+   - Use faster samplers: DPM++ 2M, Euler a
+
+3. **Increase Token Merging:**
+   - Settings → Optimizations → Token merging ratio: 0.5
+   - Token merging reduces the number of tokens processed, which saves VRAM and usually speeds up generation slightly (at a small quality cost)
+
+4. **Check PyTorch installation:**
+   ```bash
+   python -c "import torch; print(torch.version.hip)"
+   ```
+   Should output ROCm version (e.g., `6.2.x`)
+
+---
+
+### Issue: Black Images or Artifacts
+
+**Symptoms:**
+- Generated images are black
+- Strange artifacts or noise
+
+**Solutions:**
+
+1. **Enable `--no-half-vae`:**
+   ```bash
+   export COMMANDLINE_ARGS="--skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae"
+   ```
+
+2. **Try different VAE:**
+   - Settings → Stable Diffusion → SD VAE
+   - Select `None` or try a different VAE
+
+3. **Check cross attention optimization:**
+   - Settings → Optimizations → Cross attention optimization
+   - Try `Doggettx`, `sub-quadratic`, or `none`
+
+---
+
+### Issue: Model Loading Errors
+
+**Symptoms:**
+```
+Error loading model
+Couldn't load model
+```
+
+**Solutions:**
+
+1. **Verify PyTorch ROCm installation:**
+   ```bash
+   source venv/bin/activate
+   python -c "import torch; print(torch.cuda.is_available()); print(torch.version.hip)"
+   ```
+   Should output: `True` and ROCm version
+
+2. **Reinstall PyTorch with ROCm 6.2:**
+   ```bash
+   source venv/bin/activate
+   pip uninstall torch torchvision torchaudio
+   pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
+   ```
+
+3. **Check model file integrity:**
+   - Re-download the model
+   - Verify SHA256 hash if available
+
+---
+
+### Issue: Memory Fragmentation
+
+**Symptoms:**
+- VRAM usage increases over time
+- OOM errors after multiple generations
+
+**Solutions:**
+
+1. **Ensure expandable segments is enabled:**
+   ```bash
+   export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True
+   ```
+
+2. **Restart the WebUI periodically:**
+   - After 50-100 generations, restart the WebUI
+
+3. **Use the "Unload SD checkpoint" button:**
+   - Settings → Actions → Unload SD checkpoint to free VRAM
+   - Useful when switching between models
+
+---
+
+### Checking VRAM Usage
+
+**Monitor VRAM in real-time:**
+
+```bash
+watch -n 1 rocm-smi
+```
+
+**Check current VRAM usage:**
+
+```bash
+rocm-smi --showmeminfo vram
+```
+
+---
+
+## Performance Benchmarks
+
+Approximate generation times on AMD RX 6800/6900 XT (16GB VRAM):
+
+| Configuration | Resolution | Hires Fix | Steps | Time |
+|---------------|------------|-----------|-------|------|
+| Safe Mode | 512x512 | No | 20 | ~8-12s |
+| Safe Mode | 512x512 | No | 30 | ~12-18s |
+| Quality Mode | 512x512 → 768x768 | Yes (1.5x) | 20+10 | ~20-30s |
+| Quality Mode | 512x512 → 1024x1024 | Yes (2x) | 20+10 | ~35-50s |
+| Portrait | 512x768 | No | 20 | ~12-16s |
+| ControlNet | 512x512 | No | 20 | ~15-25s |
+
+*Times may vary based on model, sampler, and prompt complexity.*
+
+---
+
+## Additional Resources
+
+- **ROCm Documentation:** https://rocm.docs.amd.com/
+- **Stable Diffusion WebUI Wiki:** https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki
+- **AMD GPU Support:** https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs
+
+---
+
+## Summary of Key Points
+
+✅ **DO:**
+- Use `--medvram` for 16GB VRAM
+- Enable `expandable_segments:True` to prevent fragmentation
+- Start with 512x512 resolution
+- Use Hires fix with `1.5x` upscale maximum
+- Enable ControlNet Low VRAM mode
+- Generate at low resolution, upscale separately for best quality
+
+❌ **DON'T:**
+- Use `Batch size > 1` (use `Batch count` instead)
+- Use `Upscale by: 2.0` with Hires fix on 16GB VRAM
+- Enable 3+ ControlNet units simultaneously
+- Generate at 1024x1024 or higher directly
+- Forget to set `--no-half-vae` (prevents VAE errors)
+
+---
+
+**Last Updated:** 2025-11-15
+**ROCm Version:** 6.2
+**Target VRAM:** 8-16GB
diff --git a/webui-user-rocm62.sh b/webui-user-rocm62.sh
new file mode 100644
index 000000000..6a23a8b34
--- /dev/null
+++ b/webui-user-rocm62.sh
@@ -0,0 +1,127 @@
+#!/bin/bash
+##########################################################################################
+# ROCm 6.2 Optimized Launch Script for AMD GPUs with 16GB VRAM
+# Based on best practices for VRAM optimization and memory management
+##########################################################################################
+
+# Install directory without trailing slash
+#install_dir="/home/$(whoami)"
+
+# Name of the subdirectory
+#clone_dir="stable-diffusion-webui"
+
+# ============================================================================
+# ROCm 6.2 PyTorch Installation
+# ============================================================================
+# Install PyTorch with ROCm 6.2 support
+export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2"
+
+# ============================================================================
+# PyTorch HIP Memory Allocation Configuration
+# ============================================================================
+# Prevents memory fragmentation - CRITICAL for stable VRAM usage
+export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True
+
+# ============================================================================
+# Command Line Arguments for VRAM Optimization (16GB VRAM)
+# ============================================================================
+# Explanation of flags:
+#   --skip-torch-cuda-test : Skip CUDA test (we're using ROCm/HIP)
+#   --medvram              : Optimized for 8-16GB VRAM, moves models between GPU/CPU as needed
+#   --opt-split-attention  : Reduces VRAM usage during attention computation
+#   --no-half-vae          : Prevents VAE errors by using full precision for VAE
+#
+# Additional optional flags for extreme VRAM savings (uncomment if needed):
+#   --lowvram           : For GPUs with <8GB VRAM (use instead of --medvram)
+#   --xformers          : Use xformers for additional memory optimization (requires installation)
+#   --opt-sdp-attention : Alternative attention optimization
+export COMMANDLINE_ARGS="--skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae"
+
+# ============================================================================
+# Optional: Additional VRAM Optimization Flags
+# ============================================================================
+# Uncomment the line below for more aggressive VRAM savings:
+# export COMMANDLINE_ARGS="--skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae --opt-channelslast"
+
+# Uncomment for extreme low VRAM mode (<8GB):
+# export COMMANDLINE_ARGS="--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae"
+
+# ============================================================================
+# Python and Git Configuration
+# ============================================================================
+# python3 executable
+#python_cmd="python3"
+
+# git executable
+#export GIT="git"
+
+# python3 venv without trailing slash (defaults to ${install_dir}/${clone_dir}/venv)
+#venv_dir="venv"
+
+# script to launch to start the app
+#export LAUNCH_SCRIPT="launch.py"
+
+# ============================================================================
+# Package Configuration
+# ============================================================================
+# Requirements file to use for stable-diffusion-webui
+#export REQS_FILE="requirements_versions.txt"
+
+# Fixed git repos
+#export K_DIFFUSION_PACKAGE=""
+#export GFPGAN_PACKAGE=""
+
+# Fixed git commits
+#export STABLE_DIFFUSION_COMMIT_HASH=""
+#export CODEFORMER_COMMIT_HASH=""
+#export BLIP_COMMIT_HASH=""
+
+# ============================================================================
+# Performance Tuning
+# ============================================================================
+# Uncomment to enable accelerated launch
+#export ACCELERATE="True"
+
+# Uncomment to disable TCMalloc (Thread-Caching Malloc)
+# TCMalloc improves CPU memory allocation performance
+#export NO_TCMALLOC="True"
+
+##########################################################################################
+# Usage Instructions:
+#
+# 1. Copy this file to webui-user.sh:
+#      cp webui-user-rocm62.sh webui-user.sh
+#
+# 2. Launch the WebUI:
+#      ./webui.sh
+#
+# 3. In WebUI Settings → Optimizations, configure:
+#    - Enable quantization in K samplers: ✓
+#    - Token merging ratio: 0.5
+#    - Cross attention optimization: Doggettx (should be active)
+#
+# 4. Recommended Generation Settings for 16GB VRAM:
+#
+#    Safe Mode (no errors):
+#    - Size: 512x512
+#    - Hires fix: OFF
+#    - Batch size: 1
+#
+#    Quality Mode (with upscaling):
+#    - Size: 512x512
+#    - Hires fix: ON
+#    - Upscale by: 1.5 (not 2.0)
+#    - Hires steps: 10
+#    - Denoising: 0.4
+#
+#    With ControlNet:
+#    - Size: 512x512
+#    - Hires fix: OFF
+#    - ControlNet units: max 1-2 active
+#    - Low VRAM mode: ON in ControlNet settings
+#
+# 5. Workflow for Best Quality:
+#    Phase 1 - Generation: 512x512, no hires fix → find perfect seed/prompt
+#    Phase 2 - Upscaling: Use img2img or "Send to Extras" → R-ESRGAN 4x+
+#
+##########################################################################################
diff --git a/webui.sh b/webui.sh
index 89dae163a..b06d821f9 100755
--- a/webui.sh
+++ b/webui.sh
@@ -153,7 +153,7 @@ case "$gpu_info" in
     *"Navi 2"*) export HSA_OVERRIDE_GFX_VERSION=10.3.0
     ;;
     *"Navi 3"*) [[ -z "${TORCH_COMMAND}" ]] && \
-        export TORCH_COMMAND="pip install torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.7"
+        export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2"
     ;;
     *"Renoir"*) export HSA_OVERRIDE_GFX_VERSION=9.0.0
     printf "\n%s\n" "${delimiter}"
@@ -167,7 +167,7 @@ if ! echo "$gpu_info" | grep -q "NVIDIA";
 then
     if echo "$gpu_info" | grep -q "AMD" && [[ -z "${TORCH_COMMAND}" ]]
     then
-        export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7"
+        export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2"
     elif npu-smi info 2>/dev/null
     then
         export TORCH_COMMAND="pip install torch==2.1.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu; pip install torch_npu==2.1.0"