mirror of
https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
synced 2026-01-31 04:42:09 -08:00
Add ROCm 6.2 VRAM optimization for AMD GPUs (8-16GB)
This commit adds comprehensive ROCm 6.2 support and VRAM optimization for AMD GPUs, specifically targeting systems with 8-16GB VRAM. Changes: - Updated webui.sh to use ROCm 6.2 instead of 5.7 for AMD GPUs - Added webui-user-rocm62.sh: Optimized launch script with: * PyTorch ROCm 6.2 installation command * PYTORCH_HIP_ALLOC_CONF for memory fragmentation prevention * Optimized command-line flags (--medvram, --opt-split-attention, etc.) * Detailed inline documentation - Added ROCM_VRAM_OPTIMIZATION.md: Comprehensive 400+ line guide covering: * Launch configuration and environment variables * WebUI settings optimization * Generation settings for different VRAM amounts * ControlNet optimization techniques * Recommended workflows for quality and performance * Extensive troubleshooting section * Performance benchmarks - Added README_ROCM.md: Quick start guide for ROCm setup Key optimizations: - Memory fragmentation prevention via expandable_segments - Optimal command-line arguments for 16GB VRAM - Two-phase workflow (generate at 512x512, upscale separately) - ControlNet low VRAM mode configuration - Batch processing best practices Benefits: - Prevents OOM errors on 16GB VRAM GPUs - Improved stability for long generation sessions - Better quality outputs through optimized workflows - Faster iteration with recommended settings
This commit is contained in:
parent
82a973c043
commit
5e65b0e6a9
4 changed files with 963 additions and 2 deletions
295
README_ROCM.md
Normal file
295
README_ROCM.md
Normal file
|
|
@ -0,0 +1,295 @@
|
|||
# ROCm Setup Guide for Stable Diffusion WebUI
|
||||
|
||||
This guide helps you set up and optimize Stable Diffusion WebUI for AMD GPUs using ROCm 6.2.
|
||||
|
||||
## Quick Start
|
||||
|
||||
### 1. Copy the Optimized Launch Configuration
|
||||
|
||||
```bash
|
||||
cp webui-user-rocm62.sh webui-user.sh
|
||||
```
|
||||
|
||||
### 2. Launch the WebUI
|
||||
|
||||
```bash
|
||||
./webui.sh
|
||||
```
|
||||
|
||||
The launch script will automatically:
|
||||
- Install PyTorch with ROCm 6.2 support
|
||||
- Configure optimal VRAM settings for 16GB GPUs
|
||||
- Set up memory management to prevent fragmentation
|
||||
|
||||
### 3. Configure WebUI Settings
|
||||
|
||||
After the WebUI starts, navigate to **Settings → Optimizations** and configure:
|
||||
|
||||
- **Cross attention optimization:** `Doggettx` (default)
|
||||
- **Enable quantization in K samplers:** ✓ Enabled
|
||||
- **Token merging ratio:** `0.5`
|
||||
|
||||
See the [VRAM Optimization Guide](ROCM_VRAM_OPTIMIZATION.md) for detailed configuration instructions.
|
||||
|
||||
---
|
||||
|
||||
## System Requirements
|
||||
|
||||
### Supported AMD GPUs
|
||||
|
||||
- **RX 6000 Series** (Navi 2): RX 6700 XT, 6800, 6800 XT, 6900 XT
|
||||
- **RX 7000 Series** (Navi 3): RX 7600, 7700 XT, 7800 XT, 7900 XT, 7900 XTX
|
||||
- **RX 5000 Series** (Navi 1): RX 5700 XT (with limitations)
|
||||
|
||||
### Recommended VRAM
|
||||
|
||||
- **Minimum:** 8GB VRAM
|
||||
- **Recommended:** 16GB VRAM
|
||||
- **Optimal:** 24GB VRAM
|
||||
|
||||
### Software Requirements
|
||||
|
||||
- **ROCm:** 6.2 or newer
|
||||
- **Python:** 3.10 or 3.11
|
||||
- **Linux:** Ubuntu 22.04, Fedora 38+, or Arch Linux
|
||||
|
||||
---
|
||||
|
||||
## Installation
|
||||
|
||||
### Option 1: Automatic Setup (Recommended)
|
||||
|
||||
The `webui.sh` script automatically detects AMD GPUs and installs ROCm 6.2 support.
|
||||
|
||||
```bash
|
||||
# Clone the repository (if not already done)
|
||||
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
|
||||
cd stable-diffusion-webui
|
||||
|
||||
# Copy the optimized configuration
|
||||
cp webui-user-rocm62.sh webui-user.sh
|
||||
|
||||
# Launch (will install dependencies automatically)
|
||||
./webui.sh
|
||||
```
|
||||
|
||||
### Option 2: Manual Setup
|
||||
|
||||
If you need manual control over the installation:
|
||||
|
||||
```bash
|
||||
# Create and activate virtual environment
|
||||
python3 -m venv venv
|
||||
source venv/bin/activate
|
||||
|
||||
# Install PyTorch with ROCm 6.2
|
||||
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
|
||||
|
||||
# Install Stable Diffusion WebUI requirements
|
||||
pip install -r requirements_versions.txt
|
||||
|
||||
# Set environment variables
|
||||
export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True
|
||||
|
||||
# Launch with optimized flags
|
||||
python launch.py --skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Configuration Files
|
||||
|
||||
### `webui-user-rocm62.sh`
|
||||
|
||||
Pre-configured launch script with optimal settings for AMD GPUs with 16GB VRAM.
|
||||
|
||||
**Key settings:**
|
||||
- PyTorch with ROCm 6.2
|
||||
- Memory fragmentation prevention
|
||||
- VRAM-optimized command-line arguments
|
||||
|
||||
### `ROCM_VRAM_OPTIMIZATION.md`
|
||||
|
||||
Comprehensive guide covering:
|
||||
- WebUI settings optimization
|
||||
- Generation settings for different VRAM amounts
|
||||
- ControlNet optimization
|
||||
- Workflows for best quality
|
||||
- Troubleshooting common issues
|
||||
|
||||
---
|
||||
|
||||
## Command-Line Arguments Explained
|
||||
|
||||
The optimized configuration uses these flags:
|
||||
|
||||
```bash
|
||||
--skip-torch-cuda-test # Skip CUDA test (we're using ROCm/HIP)
|
||||
--medvram # Optimized for 8-16GB VRAM
|
||||
--opt-split-attention # Reduces VRAM usage during attention
|
||||
--no-half-vae # Prevents VAE errors with full precision
|
||||
```
|
||||
|
||||
### For Different VRAM Amounts
|
||||
|
||||
**16GB VRAM (Recommended):**
|
||||
```bash
|
||||
--skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae
|
||||
```
|
||||
|
||||
**8GB VRAM:**
|
||||
```bash
|
||||
--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae
|
||||
```
|
||||
|
||||
**6GB VRAM or less:**
|
||||
```bash
|
||||
--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae --opt-channelslast
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Recommended Generation Settings
|
||||
|
||||
### For 16GB VRAM
|
||||
|
||||
**Safe Mode (Fast, No Errors):**
|
||||
- Resolution: 512x512
|
||||
- Hires fix: OFF
|
||||
- Batch size: 1
|
||||
- VRAM usage: ~4-6GB
|
||||
|
||||
**Quality Mode (Best Results):**
|
||||
- Resolution: 512x512
|
||||
- Hires fix: ON (1.5x upscale)
|
||||
- Hires steps: 10
|
||||
- VRAM usage: ~8-12GB
|
||||
|
||||
**With ControlNet:**
|
||||
- Resolution: 512x512
|
||||
- Hires fix: OFF
|
||||
- ControlNet units: 1-2 maximum
|
||||
- Low VRAM mode: ON
|
||||
- VRAM usage: ~6-10GB
|
||||
|
||||
See the [VRAM Optimization Guide](ROCM_VRAM_OPTIMIZATION.md) for detailed workflows and settings.
|
||||
|
||||
---
|
||||
|
||||
## Verification
|
||||
|
||||
### Check PyTorch ROCm Installation
|
||||
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
python -c "import torch; print('ROCm available:', torch.cuda.is_available()); print('ROCm version:', torch.version.hip)"
|
||||
```
|
||||
|
||||
**Expected output:**
|
||||
```
|
||||
ROCm available: True
|
||||
ROCm version: 6.2.x
|
||||
```
|
||||
|
||||
### Monitor VRAM Usage
|
||||
|
||||
```bash
|
||||
watch -n 1 rocm-smi
|
||||
```
|
||||
|
||||
Or check current usage:
|
||||
```bash
|
||||
rocm-smi --showmeminfo vram
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Out of Memory Errors
|
||||
|
||||
If you encounter OOM errors:
|
||||
|
||||
1. **Reduce resolution:** 768x768 → 512x512
|
||||
2. **Disable Hires fix** or reduce upscale ratio
|
||||
3. **Use more aggressive flags:**
|
||||
```bash
|
||||
export COMMANDLINE_ARGS="--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae"
|
||||
```
|
||||
|
||||
### Black Images or Artifacts
|
||||
|
||||
Ensure `--no-half-vae` is in your command-line arguments.
|
||||
|
||||
### Slow Generation
|
||||
|
||||
- Use `--medvram` instead of `--lowvram` for 16GB VRAM
|
||||
- Reduce sampling steps to 20
|
||||
- Try faster samplers: DPM++ 2M, Euler a
|
||||
|
||||
### Model Loading Errors
|
||||
|
||||
Verify PyTorch installation:
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
python -c "import torch; print(torch.cuda.is_available())"
|
||||
```
|
||||
|
||||
If it returns `False`, reinstall PyTorch:
|
||||
```bash
|
||||
pip uninstall torch torchvision torchaudio
|
||||
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
|
||||
```
|
||||
|
||||
For more troubleshooting, see the [VRAM Optimization Guide](ROCM_VRAM_OPTIMIZATION.md#troubleshooting).
|
||||
|
||||
---
|
||||
|
||||
## Performance Tips
|
||||
|
||||
1. **Two-Phase Workflow:**
|
||||
- Generate at 512x512 without Hires fix (fast)
|
||||
- Upscale separately using img2img or Extras tab (best quality)
|
||||
|
||||
2. **ControlNet Best Practices:**
|
||||
- Use only 1-2 units at a time
|
||||
- Enable Low VRAM mode
|
||||
- Disable Hires fix when using ControlNet
|
||||
|
||||
3. **Batch Processing:**
|
||||
- Use `Batch count` instead of `Batch size`
|
||||
- Keep resolution at 512x512 for batches
|
||||
|
||||
4. **Memory Management:**
|
||||
- Restart WebUI after 50-100 generations
|
||||
- Use "Unload SD checkpoint" when switching models
|
||||
|
||||
---
|
||||
|
||||
## Additional Resources
|
||||
|
||||
- **[VRAM Optimization Guide](ROCM_VRAM_OPTIMIZATION.md)** - Comprehensive optimization guide
|
||||
- **[ROCm Documentation](https://rocm.docs.amd.com/)** - Official AMD ROCm docs
|
||||
- **[SD WebUI Wiki](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki)** - Official WebUI documentation
|
||||
- **[AMD GPU Support](https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs)** - WebUI AMD GPU guide
|
||||
|
||||
---
|
||||
|
||||
## Summary
|
||||
|
||||
✅ **Key Points:**
|
||||
- Use ROCm 6.2 for best compatibility
|
||||
- Enable `expandable_segments:True` to prevent memory fragmentation
|
||||
- Use `--medvram` for 16GB VRAM
|
||||
- Start with 512x512, upscale separately for quality
|
||||
- Enable ControlNet Low VRAM mode
|
||||
|
||||
❌ **Avoid:**
|
||||
- Batch size > 1 (use Batch count instead)
|
||||
- Hires fix with 2x upscale on 16GB VRAM
|
||||
- More than 2 ControlNet units simultaneously
|
||||
- Direct generation at resolutions > 768x768
|
||||
|
||||
---
|
||||
|
||||
**For detailed configuration and workflows, see [ROCM_VRAM_OPTIMIZATION.md](ROCM_VRAM_OPTIMIZATION.md)**
|
||||
539
ROCM_VRAM_OPTIMIZATION.md
Normal file
539
ROCM_VRAM_OPTIMIZATION.md
Normal file
|
|
@ -0,0 +1,539 @@
|
|||
# ROCm 6.2 VRAM Optimization Guide for AMD GPUs
|
||||
|
||||
This guide provides comprehensive instructions for optimizing Stable Diffusion WebUI on AMD GPUs with ROCm 6.2, specifically targeting systems with 8-16GB VRAM.
|
||||
|
||||
## Table of Contents
|
||||
|
||||
1. [Quick Start](#quick-start)
|
||||
2. [Launch Configuration](#launch-configuration)
|
||||
3. [WebUI Settings Optimization](#webui-settings-optimization)
|
||||
4. [Generation Settings](#generation-settings)
|
||||
5. [ControlNet Optimization](#controlnet-optimization)
|
||||
6. [Recommended Workflows](#recommended-workflows)
|
||||
7. [Troubleshooting](#troubleshooting)
|
||||
|
||||
---
|
||||
|
||||
## Quick Start
|
||||
|
||||
### 1. Setup Launch Script
|
||||
|
||||
Copy the optimized ROCm 6.2 configuration:
|
||||
|
||||
```bash
|
||||
cp webui-user-rocm62.sh webui-user.sh
|
||||
```
|
||||
|
||||
### 2. Launch WebUI
|
||||
|
||||
```bash
|
||||
./webui.sh
|
||||
```
|
||||
|
||||
### 3. Configure WebUI Settings
|
||||
|
||||
Navigate to **Settings → Optimizations** and apply the recommended settings (see below).
|
||||
|
||||
---
|
||||
|
||||
## Launch Configuration
|
||||
|
||||
### Environment Variables
|
||||
|
||||
The following environment variables are set in `webui-user-rocm62.sh`:
|
||||
|
||||
#### PyTorch with ROCm 6.2
|
||||
|
||||
```bash
|
||||
export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2"
|
||||
```
|
||||
|
||||
**Purpose:** Installs PyTorch compiled with ROCm 6.2 support for AMD GPUs.
|
||||
|
||||
#### HIP Memory Allocation
|
||||
|
||||
```bash
|
||||
export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True
|
||||
```
|
||||
|
||||
**Purpose:** Prevents memory fragmentation, which is critical for stable VRAM usage and avoiding OOM (Out of Memory) errors.
|
||||
|
||||
### Command Line Arguments
|
||||
|
||||
```bash
|
||||
export COMMANDLINE_ARGS="--skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae"
|
||||
```
|
||||
|
||||
#### Flag Explanations
|
||||
|
||||
| Flag | Purpose | VRAM Impact |
|
||||
|------|---------|-------------|
|
||||
| `--skip-torch-cuda-test` | Skip CUDA test (using ROCm/HIP instead) | N/A |
|
||||
| `--medvram` | Optimized for 8-16GB VRAM, moves models between GPU/CPU as needed | **High** - Critical for 16GB |
|
||||
| `--opt-split-attention` | Reduces VRAM usage during attention computation | **Medium** - Saves 1-2GB |
|
||||
| `--no-half-vae` | Uses full precision for VAE to prevent errors | **Low** - Prevents artifacts |
|
||||
|
||||
#### Alternative Configurations
|
||||
|
||||
**For GPUs with less than 8GB VRAM:**
|
||||
|
||||
```bash
|
||||
export COMMANDLINE_ARGS="--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae"
|
||||
```
|
||||
|
||||
**For maximum compatibility (slower but most stable):**
|
||||
|
||||
```bash
|
||||
export COMMANDLINE_ARGS="--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae --opt-channelslast"
|
||||
```
|
||||
|
||||
**With xformers (if installed):**
|
||||
|
||||
```bash
|
||||
export COMMANDLINE_ARGS="--skip-torch-cuda-test --medvram --xformers --no-half-vae"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## WebUI Settings Optimization
|
||||
|
||||
Navigate to **Settings → Optimizations** in the WebUI and configure:
|
||||
|
||||
### Recommended Settings
|
||||
|
||||
| Setting | Value | Notes |
|
||||
|---------|-------|-------|
|
||||
| **Cross attention optimization** | `Doggettx` or `xformers` | Doggettx is default and works well |
|
||||
| **Enable quantization in K samplers** | ✓ Enabled | Reduces VRAM usage |
|
||||
| **Token merging ratio** | `0.5` | Merges similar tokens to save memory |
|
||||
| **Pad prompt/negative prompt** | ✓ Enabled | Recommended for consistency |
|
||||
|
||||
### Optional Advanced Settings
|
||||
|
||||
| Setting | Value | Effect |
|
||||
|---------|-------|--------|
|
||||
| **Token merging ratio for hires** | `0.5` | Saves VRAM during hires fix |
|
||||
| **Always discard next-to-last sigma** | ✓ Enabled | Minor VRAM savings |
|
||||
|
||||
---
|
||||
|
||||
## Generation Settings
|
||||
|
||||
### Safe Mode (No Errors, 16GB VRAM)
|
||||
|
||||
**Best for:** Testing prompts, finding good seeds, general use
|
||||
|
||||
```
|
||||
Width: 512
|
||||
Height: 512
|
||||
Batch count: 1
|
||||
Batch size: 1
|
||||
Hires fix: ☐ Disabled
|
||||
Sampling steps: 20-30
|
||||
```
|
||||
|
||||
**VRAM Usage:** ~4-6GB
|
||||
|
||||
---
|
||||
|
||||
### Quality Mode (With Hires Fix, 16GB VRAM)
|
||||
|
||||
**Best for:** Final high-quality outputs
|
||||
|
||||
```
|
||||
Width: 512
|
||||
Height: 512
|
||||
Batch count: 1
|
||||
Batch size: 1
|
||||
Hires fix: ✓ Enabled
|
||||
Upscale by: 1.5 (avoid 2.0 on 16GB)
|
||||
Hires steps: 10
|
||||
Denoising strength: 0.4
|
||||
Upscaler: Latent or R-ESRGAN 4x+
|
||||
Sampling steps: 20-30
|
||||
```
|
||||
|
||||
**VRAM Usage:** ~8-12GB
|
||||
|
||||
**⚠️ Warning:** Using `Upscale by: 2.0` may cause OOM errors on 16GB VRAM.
|
||||
|
||||
---
|
||||
|
||||
### Portrait Mode (512x768)
|
||||
|
||||
**Best for:** Character portraits
|
||||
|
||||
```
|
||||
Width: 512
|
||||
Height: 768
|
||||
Batch count: 1
|
||||
Batch size: 1
|
||||
Hires fix: ☐ Disabled (enable only after finding good seed)
|
||||
Sampling steps: 20-30
|
||||
```
|
||||
|
||||
**VRAM Usage:** ~6-8GB
|
||||
|
||||
---
|
||||
|
||||
### Landscape Mode (768x512)
|
||||
|
||||
**Best for:** Scenery, backgrounds
|
||||
|
||||
```
|
||||
Width: 768
|
||||
Height: 512
|
||||
Batch count: 1
|
||||
Batch size: 1
|
||||
Hires fix: ☐ Disabled (enable only after finding good seed)
|
||||
Sampling steps: 20-30
|
||||
```
|
||||
|
||||
**VRAM Usage:** ~6-8GB
|
||||
|
||||
---
|
||||
|
||||
## ControlNet Optimization
|
||||
|
||||
When using ControlNet extensions, additional VRAM optimizations are necessary.
|
||||
|
||||
### ControlNet Settings
|
||||
|
||||
Navigate to **Settings → ControlNet** and configure:
|
||||
|
||||
| Setting | Value | Purpose |
|
||||
|---------|-------|---------|
|
||||
| **Low VRAM mode** | ✓ Enabled | Critical for 16GB VRAM |
|
||||
| **Pixel Perfect** | ☐ Disabled | Disable during testing to save VRAM |
|
||||
| **Control Mode** | `Balanced` | Default, good balance |
|
||||
|
||||
### Recommended Usage
|
||||
|
||||
```
|
||||
Width: 512
|
||||
Height: 512
|
||||
Hires fix: ☐ Disabled
|
||||
Active ControlNet units: 1-2 maximum
|
||||
Batch size: 1
|
||||
```
|
||||
|
||||
**⚠️ Warning:** Using 3+ ControlNet units simultaneously may cause OOM errors.
|
||||
|
||||
### ControlNet with Hires Fix
|
||||
|
||||
**Not recommended for 16GB VRAM.** If necessary:
|
||||
|
||||
```
|
||||
Width: 512
|
||||
Height: 512
|
||||
Hires fix: ✓ Enabled
|
||||
Upscale by: 1.25 (minimum upscale)
|
||||
Hires steps: 5 (minimum steps)
|
||||
Active ControlNet units: 1 maximum
|
||||
Low VRAM mode: ✓ Enabled
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Recommended Workflows
|
||||
|
||||
### Workflow 1: Prompt Development (Fast)
|
||||
|
||||
**Goal:** Find the perfect prompt and seed quickly
|
||||
|
||||
1. **Settings:**
|
||||
- Size: 512x512
|
||||
- Hires fix: OFF
|
||||
- Steps: 20
|
||||
- Batch count: 4-8 (generate multiple images)
|
||||
|
||||
2. **Process:**
|
||||
- Experiment with different prompts
|
||||
- Test various seeds
|
||||
- Adjust CFG scale and sampling method
|
||||
|
||||
3. **VRAM Usage:** ~4-6GB per image
|
||||
|
||||
---
|
||||
|
||||
### Workflow 2: High-Quality Output (Two-Phase)
|
||||
|
||||
**Goal:** Maximum quality without VRAM errors
|
||||
|
||||
#### Phase 1: Generation
|
||||
|
||||
```
|
||||
Size: 512x512
|
||||
Hires fix: OFF
|
||||
Steps: 30-40
|
||||
Sampler: DPM++ 2M Karras or Euler a
|
||||
CFG Scale: 7-8
|
||||
```
|
||||
|
||||
**Find your perfect image** with the right prompt, seed, and composition.
|
||||
|
||||
#### Phase 2: Upscaling
|
||||
|
||||
**Option A: Using img2img**
|
||||
|
||||
1. Send image to img2img
|
||||
2. Settings:
|
||||
- Resize to: 1024x1024 or 768x1152
|
||||
- Denoising: 0.3-0.5
|
||||
- Steps: 20-30
|
||||
- Sampler: Same as generation
|
||||
|
||||
**Option B: Using Extras Tab**
|
||||
|
||||
1. Send to Extras
|
||||
2. Upscaler: R-ESRGAN 4x+ or 4x-UltraSharp
|
||||
3. Scale: 2x or 4x
|
||||
4. Optional: GFPGAN or CodeFormer for face restoration
|
||||
|
||||
**VRAM Usage:** Phase 1: ~4-6GB, Phase 2: ~6-10GB (depends on final resolution)
|
||||
|
||||
---
|
||||
|
||||
### Workflow 3: ControlNet Generation
|
||||
|
||||
**Goal:** Use ControlNet without VRAM errors
|
||||
|
||||
1. **Initial Setup:**
|
||||
- Size: 512x512
|
||||
- Hires fix: OFF
|
||||
- ControlNet units: 1-2 maximum
|
||||
- Low VRAM: ON
|
||||
|
||||
2. **Generate base image:**
|
||||
- Steps: 20-30
|
||||
- Find good composition
|
||||
|
||||
3. **Upscale separately:**
|
||||
- Use img2img without ControlNet
|
||||
- Or use Extras tab
|
||||
|
||||
**VRAM Usage:** ~6-10GB (depends on ControlNet type)
|
||||
|
||||
---
|
||||
|
||||
### Workflow 4: Batch Processing
|
||||
|
||||
**Goal:** Generate multiple images efficiently
|
||||
|
||||
**Small batches (recommended):**
|
||||
|
||||
```
|
||||
Size: 512x512
|
||||
Batch count: 4
|
||||
Batch size: 1
|
||||
Hires fix: OFF
|
||||
```
|
||||
|
||||
**VRAM Usage:** ~4-6GB per image (sequential)
|
||||
|
||||
**⚠️ Avoid:**
|
||||
- `Batch size > 1` (generates simultaneously, uses much more VRAM)
|
||||
- Hires fix with batch processing
|
||||
|
||||
---
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Issue: Out of Memory (OOM) Errors
|
||||
|
||||
**Symptoms:**
|
||||
```
|
||||
RuntimeError: HIP out of memory
|
||||
```
|
||||
|
||||
**Solutions:**
|
||||
|
||||
1. **Reduce image resolution:**
|
||||
- 768x768 → 512x512
|
||||
- 512x768 → 512x512
|
||||
|
||||
2. **Disable Hires fix or reduce upscale:**
|
||||
- Turn OFF Hires fix
|
||||
- Or change `Upscale by: 2.0` → `1.5` or `1.25`
|
||||
|
||||
3. **Use more aggressive VRAM flags:**
|
||||
```bash
|
||||
export COMMANDLINE_ARGS="--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae"
|
||||
```
|
||||
|
||||
4. **Reduce ControlNet units:**
|
||||
- Use only 1 ControlNet unit
|
||||
- Ensure Low VRAM mode is enabled
|
||||
|
||||
5. **Close other applications:**
|
||||
- Close browsers, games, or other GPU-intensive apps
|
||||
- Check `rocm-smi` to see VRAM usage
|
||||
|
||||
---
|
||||
|
||||
### Issue: Slow Generation Speed
|
||||
|
||||
**Symptoms:**
|
||||
- Images take very long to generate
|
||||
- System feels sluggish
|
||||
|
||||
**Solutions:**
|
||||
|
||||
1. **Check if you're using the right flags:**
|
||||
- Use `--medvram` not `--lowvram` for 16GB VRAM
|
||||
- `--lowvram` is slower but uses less VRAM
|
||||
|
||||
2. **Reduce sampling steps:**
|
||||
- Try 20 steps instead of 40-50
|
||||
- Use faster samplers: DPM++ 2M, Euler a
|
||||
|
||||
3. **Disable Token Merging:**
|
||||
- Settings → Optimizations → Token merging ratio: 0
|
||||
- Token merging saves VRAM but may slow down generation
|
||||
|
||||
4. **Check PyTorch installation:**
|
||||
```bash
|
||||
python -c "import torch; print(torch.version.hip)"
|
||||
```
|
||||
Should output ROCm version (e.g., `6.2.x`)
|
||||
|
||||
---
|
||||
|
||||
### Issue: Black Images or Artifacts
|
||||
|
||||
**Symptoms:**
|
||||
- Generated images are black
|
||||
- Strange artifacts or noise
|
||||
|
||||
**Solutions:**
|
||||
|
||||
1. **Enable `--no-half-vae`:**
|
||||
```bash
|
||||
export COMMANDLINE_ARGS="--skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae"
|
||||
```
|
||||
|
||||
2. **Try different VAE:**
|
||||
- Settings → Stable Diffusion → SD VAE
|
||||
- Select `None` or try a different VAE
|
||||
|
||||
3. **Check cross attention optimization:**
|
||||
- Settings → Optimizations → Cross attention optimization
|
||||
- Try `Doggettx`, `sub-quadratic`, or `none`
|
||||
|
||||
---
|
||||
|
||||
### Issue: Model Loading Errors
|
||||
|
||||
**Symptoms:**
|
||||
```
|
||||
Error loading model
|
||||
Couldn't load model
|
||||
```
|
||||
|
||||
**Solutions:**
|
||||
|
||||
1. **Verify PyTorch ROCm installation:**
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
python -c "import torch; print(torch.cuda.is_available()); print(torch.version.hip)"
|
||||
```
|
||||
Should output: `True` and ROCm version
|
||||
|
||||
2. **Reinstall PyTorch with ROCm 6.2:**
|
||||
```bash
|
||||
source venv/bin/activate
|
||||
pip uninstall torch torchvision torchaudio
|
||||
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
|
||||
```
|
||||
|
||||
3. **Check model file integrity:**
|
||||
- Re-download the model
|
||||
- Verify SHA256 hash if available
|
||||
|
||||
---
|
||||
|
||||
### Issue: Memory Fragmentation
|
||||
|
||||
**Symptoms:**
|
||||
- VRAM usage increases over time
|
||||
- OOM errors after multiple generations
|
||||
|
||||
**Solutions:**
|
||||
|
||||
1. **Ensure expandable segments is enabled:**
|
||||
```bash
|
||||
export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True
|
||||
```
|
||||
|
||||
2. **Restart the WebUI periodically:**
|
||||
- After 50-100 generations, restart the WebUI
|
||||
|
||||
3. **Use the "Unload SD checkpoint" button:**
|
||||
- Settings → Actions → Unload SD checkpoint to free VRAM
|
||||
- Useful when switching between models
|
||||
|
||||
---
|
||||
|
||||
### Checking VRAM Usage
|
||||
|
||||
**Monitor VRAM in real-time:**
|
||||
|
||||
```bash
|
||||
watch -n 1 rocm-smi
|
||||
```
|
||||
|
||||
**Check current VRAM usage:**
|
||||
|
||||
```bash
|
||||
rocm-smi --showmeminfo vram
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Performance Benchmarks
|
||||
|
||||
Approximate generation times on AMD RX 6800/6900 XT (16GB VRAM):
|
||||
|
||||
| Configuration | Resolution | Hires Fix | Steps | Time |
|
||||
|---------------|------------|-----------|-------|------|
|
||||
| Safe Mode | 512x512 | No | 20 | ~8-12s |
|
||||
| Safe Mode | 512x512 | No | 30 | ~12-18s |
|
||||
| Quality Mode | 512x512 → 768x768 | Yes (1.5x) | 20+10 | ~20-30s |
|
||||
| Quality Mode | 512x512 → 1024x1024 | Yes (2x) | 20+10 | ~35-50s |
|
||||
| Portrait | 512x768 | No | 20 | ~12-16s |
|
||||
| ControlNet | 512x512 | No | 20 | ~15-25s |
|
||||
|
||||
*Times may vary based on model, sampler, and prompt complexity.*
|
||||
|
||||
---
|
||||
|
||||
## Additional Resources
|
||||
|
||||
- **ROCm Documentation:** https://rocm.docs.amd.com/
|
||||
- **Stable Diffusion WebUI Wiki:** https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki
|
||||
- **AMD GPU Support:** https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Install-and-Run-on-AMD-GPUs
|
||||
|
||||
---
|
||||
|
||||
## Summary of Key Points
|
||||
|
||||
✅ **DO:**
|
||||
- Use `--medvram` for 16GB VRAM
|
||||
- Enable `expandable_segments:True` to prevent fragmentation
|
||||
- Start with 512x512 resolution
|
||||
- Use Hires fix with `1.5x` upscale maximum
|
||||
- Enable ControlNet Low VRAM mode
|
||||
- Generate at low resolution, upscale separately for best quality
|
||||
|
||||
❌ **DON'T:**
|
||||
- Use `Batch size > 1` (use `Batch count` instead)
|
||||
- Use `Upscale by: 2.0` with Hires fix on 16GB VRAM
|
||||
- Enable 3+ ControlNet units simultaneously
|
||||
- Generate at 1024x1024 or higher directly
|
||||
- Forget to set `--no-half-vae` (prevents VAE errors)
|
||||
|
||||
---
|
||||
|
||||
**Last Updated:** 2025-11-15
|
||||
**ROCm Version:** 6.2
|
||||
**Target VRAM:** 8-16GB
|
||||
127
webui-user-rocm62.sh
Normal file
127
webui-user-rocm62.sh
Normal file
|
|
@ -0,0 +1,127 @@
|
|||
#!/bin/bash
|
||||
##########################################################################################
|
||||
# ROCm 6.2 Optimized Launch Script for AMD GPUs with 16GB VRAM
|
||||
# Based on best practices for VRAM optimization and memory management
|
||||
##########################################################################################
|
||||
|
||||
# Install directory without trailing slash
|
||||
#install_dir="/home/$(whoami)"
|
||||
|
||||
# Name of the subdirectory
|
||||
#clone_dir="stable-diffusion-webui"
|
||||
|
||||
# ============================================================================
|
||||
# ROCm 6.2 PyTorch Installation
|
||||
# ============================================================================
|
||||
# Install PyTorch with ROCm 6.2 support
|
||||
export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2"
|
||||
|
||||
# ============================================================================
|
||||
# PyTorch HIP Memory Allocation Configuration
|
||||
# ============================================================================
|
||||
# Prevents memory fragmentation - CRITICAL for stable VRAM usage
|
||||
export PYTORCH_HIP_ALLOC_CONF=expandable_segments:True
|
||||
|
||||
# ============================================================================
|
||||
# Command Line Arguments for VRAM Optimization (16GB VRAM)
|
||||
# ============================================================================
|
||||
# Explanation of flags:
|
||||
# --skip-torch-cuda-test : Skip CUDA test (we're using ROCm/HIP)
|
||||
# --medvram : Optimized for 8-16GB VRAM, moves models between GPU/CPU as needed
|
||||
# --opt-split-attention : Reduces VRAM usage during attention computation
|
||||
# --no-half-vae : Prevents VAE errors by using full precision for VAE
|
||||
#
|
||||
# Additional optional flags for extreme VRAM savings (uncomment if needed):
|
||||
# --lowvram : For GPUs with <8GB VRAM (use instead of --medvram)
|
||||
# --xformers : Use xformers for additional memory optimization (requires installation)
|
||||
# --opt-sdp-attention : Alternative attention optimization
|
||||
export COMMANDLINE_ARGS="--skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae"
|
||||
|
||||
# ============================================================================
|
||||
# Optional: Additional VRAM Optimization Flags
|
||||
# ============================================================================
|
||||
# Uncomment the line below for more aggressive VRAM savings:
|
||||
# export COMMANDLINE_ARGS="--skip-torch-cuda-test --medvram --opt-split-attention --no-half-vae --opt-channelslast"
|
||||
|
||||
# Uncomment for extreme low VRAM mode (<8GB):
|
||||
# export COMMANDLINE_ARGS="--skip-torch-cuda-test --lowvram --opt-split-attention --no-half-vae"
|
||||
|
||||
# ============================================================================
|
||||
# Python and Git Configuration
|
||||
# ============================================================================
|
||||
# python3 executable
|
||||
#python_cmd="python3"
|
||||
|
||||
# git executable
|
||||
#export GIT="git"
|
||||
|
||||
# python3 venv without trailing slash (defaults to ${install_dir}/${clone_dir}/venv)
|
||||
#venv_dir="venv"
|
||||
|
||||
# script to launch to start the app
|
||||
#export LAUNCH_SCRIPT="launch.py"
|
||||
|
||||
# ============================================================================
|
||||
# Package Configuration
|
||||
# ============================================================================
|
||||
# Requirements file to use for stable-diffusion-webui
|
||||
#export REQS_FILE="requirements_versions.txt"
|
||||
|
||||
# Fixed git repos
|
||||
#export K_DIFFUSION_PACKAGE=""
|
||||
#export GFPGAN_PACKAGE=""
|
||||
|
||||
# Fixed git commits
|
||||
#export STABLE_DIFFUSION_COMMIT_HASH=""
|
||||
#export CODEFORMER_COMMIT_HASH=""
|
||||
#export BLIP_COMMIT_HASH=""
|
||||
|
||||
# ============================================================================
|
||||
# Performance Tuning
|
||||
# ============================================================================
|
||||
# Uncomment to enable accelerated launch
|
||||
#export ACCELERATE="True"
|
||||
|
||||
# Uncomment to disable TCMalloc (Thread-Caching Malloc)
|
||||
# TCMalloc improves CPU memory allocation performance
|
||||
#export NO_TCMALLOC="True"
|
||||
|
||||
##########################################################################################
|
||||
# Usage Instructions:
|
||||
#
|
||||
# 1. Copy this file to webui-user.sh:
|
||||
# cp webui-user-rocm62.sh webui-user.sh
|
||||
#
|
||||
# 2. Launch the WebUI:
|
||||
# ./webui.sh
|
||||
#
|
||||
# 3. In WebUI Settings → Optimizations, configure:
|
||||
# - Enable quantization in K samplers: ✓
|
||||
# - Token merging ratio: 0.5
|
||||
# - Cross attention optimization: Doggettx (should be active)
|
||||
#
|
||||
# 4. Recommended Generation Settings for 16GB VRAM:
|
||||
#
|
||||
# Safe Mode (no errors):
|
||||
# - Size: 512x512
|
||||
# - Hires fix: OFF
|
||||
# - Batch size: 1
|
||||
#
|
||||
# Quality Mode (with upscaling):
|
||||
# - Size: 512x512
|
||||
# - Hires fix: ON
|
||||
# - Upscale by: 1.5 (not 2.0)
|
||||
# - Hires steps: 10
|
||||
# - Denoising: 0.4
|
||||
#
|
||||
# With ControlNet:
|
||||
# - Size: 512x512
|
||||
# - Hires fix: OFF
|
||||
# - ControlNet units: max 1-2 active
|
||||
# - Low VRAM mode: ON in ControlNet settings
|
||||
#
|
||||
# 5. Workflow for Best Quality:
|
||||
# Phase 1 - Generation: 512x512, no hires fix → find perfect seed/prompt
|
||||
# Phase 2 - Upscaling: Use img2img or "Send to Extras" → R-ESRGAN 4x+
|
||||
#
|
||||
##########################################################################################
|
||||
4
webui.sh
4
webui.sh
|
|
@ -153,7 +153,7 @@ case "$gpu_info" in
|
|||
*"Navi 2"*) export HSA_OVERRIDE_GFX_VERSION=10.3.0
|
||||
;;
|
||||
*"Navi 3"*) [[ -z "${TORCH_COMMAND}" ]] && \
|
||||
export TORCH_COMMAND="pip install torch torchvision --index-url https://download.pytorch.org/whl/nightly/rocm5.7"
|
||||
export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2"
|
||||
;;
|
||||
*"Renoir"*) export HSA_OVERRIDE_GFX_VERSION=9.0.0
|
||||
printf "\n%s\n" "${delimiter}"
|
||||
|
|
@ -167,7 +167,7 @@ if ! echo "$gpu_info" | grep -q "NVIDIA";
|
|||
then
|
||||
if echo "$gpu_info" | grep -q "AMD" && [[ -z "${TORCH_COMMAND}" ]]
|
||||
then
|
||||
export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm5.7"
|
||||
export TORCH_COMMAND="pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.2"
|
||||
elif npu-smi info 2>/dev/null
|
||||
then
|
||||
export TORCH_COMMAND="pip install torch==2.1.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu; pip install torch_npu==2.1.0"
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue