🌟 Project: Stable Diffusion 1.5 WebUI Forge on Android

  • Device: Xiaomi POCO X3 Pro (codename: vayu)
  • SoC: Snapdragon 860 (7 nm, Kryo 485, 8 cores)
  • GPU: Adreno 640 (Vulkan API supported)
  • Memory: 8GB physical RAM + 8GB virtual RAM (swap)
  • Storage: 256GB internal (UFS 3.1)
  • OS: Low-level debloated and hardened MIUI 14.0.3 on Android 13 (Magisk-rooted, FDE.ai-patched kernel, Zygisk-LSPosed)
  • Environment: Termux + PRoot (Ubuntu 22.04 LTS Jammy)


🛠 Device-Level Kernel Tweaks & Optimizations

Before installing Forge, I optimized the Android runtime environment to handle the heavy load of AI generation.

🔧 Kernel / System Optimizations

  • FDE.ai Kernel Tweaks:
    • Task Scheduling: Adjusted sched_min_granularity_ns & sched_wakeup_granularity_ns for smoother task scheduling under high CPU load.
    • Memory Management: Enabled zram swap to improve RAM availability during large SD model generation (~3–5 mins per 512×512 image).
    • Storage I/O: Adjusted the I/O scheduler to mq-deadline for faster storage access during model loads/downloads.
  • Magisk Root Access:
    • Ensures full filesystem access inside the Termux PRoot environment.
    • Allows modification of sysctl parameters for real-time memory & swap tuning.
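For reference, the same class of tuning can be applied by hand with sysctl. This is a minimal sketch of device-side configuration, not what FDE.ai actually runs — the values and the block-device path are illustrative examples only:

```shell
# Illustrative kernel tuning, run as root on the Android side (not inside PRoot).
# Values are examples; FDE.ai chooses its own per-device numbers.
sysctl -w kernel.sched_min_granularity_ns=2000000     # shorter minimum timeslice under load
sysctl -w kernel.sched_wakeup_granularity_ns=3000000  # less aggressive wakeup preemption
sysctl -w vm.swappiness=100                           # push cold pages into zram swap early

# I/O scheduler for the internal storage (block device name varies per device)
echo mq-deadline > /sys/block/sda/queue/scheduler
```

All of these require root, which is why Magisk access is listed as a prerequisite above.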

🎨 Vulkan GPU Readiness

  • Current Status: GPU acceleration is not used in this CPU-only Forge build, but the device is Vulkan-capable.
  • Driver: Adreno 640 (Vulkan API supported).
  • Future Plan: Enable FDE.ai memory swap + Vulkan for faster image rendering. Vulkan readiness ensures a smooth fallback if GPU-based experiments are attempted later.
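To confirm Vulkan is actually reachable before attempting GPU experiments, a quick probe from Termux might look like this. This assumes the `vulkan-tools` package is available and that the ROM exposes the Adreno driver to userspace, which varies by device and Android build:

```shell
# Probe Vulkan support from Termux (device-specific; may need a custom loader)
pkg install -y vulkan-tools
vulkaninfo --summary | grep -i -E 'deviceName|apiVersion'
```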

⚠ Note: These tweaks are optional but highly recommended to improve stability/performance under CPU-only Forge runs.


Goal: Run Stable Diffusion WebUI Forge locally on Android via Termux
Final Environment: Ubuntu 22.04 LTS (PRoot) + Python 3.10 (Conda) + CPU-only PyTorch


🔥 Phase 0: Initial Setup & Ambition

  • Objective: Install AUTOMATIC1111 (A1111 WebUI)
  • Environment: Termux on Android → proot-distro Ubuntu install
  • Challenge: Mobile architecture (aarch64), limited RAM, Android filesystem quirks
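The baseline environment was bootstrapped roughly like this (the proot-distro alias names can change between releases, so the exact `install` argument is an assumption):

```shell
# Inside Termux: install PRoot tooling and pull an Ubuntu rootfs
pkg update -y
pkg install -y proot-distro
proot-distro install ubuntu    # alias for the current default Ubuntu release
proot-distro login ubuntu      # drop into the container shell
```

This is what later surfaced the Ubuntu version problem: the default alias tracked a release too new for PyTorch's ARM64 wheels.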

🚫 Phase 1: A1111 Attempt & Initial Failure

  • Problem: Default Termux distros shipped Ubuntu 24.04/25.10 → Python 3.12/3.13
  • Impact: PyTorch ARM64 wheels incompatible → torch not found, dependency conflicts.
  • Decision: Abandon A1111 due to its bloated backend & unsolvable dependency locks.
  • Pivot: Switch to SD-WebUI Forge, which is better suited to CPU-only builds and a modular environment.

⚡ Phase 2: Dependency Hell (OS & Filesystem)

❌ Failure 1: Ubuntu 25.10 & Python 3.13

  • Context: PyTorch ARM64 wheels require Python 3.10 or 3.11.
  • Attempted: Compile Python 3.10 from source.
  • Result: FAILED (missing system libraries: libbz2-dev, libsqlite3-dev).
  • ✅ Solution: Downgrade the OS to Ubuntu 22.04 LTS, which natively supports Python 3.10.

❌ Failure 2: Ubuntu 22.04 Installation Hurdles

  1. Dead Links (404 Errors)

    • Issue: Standard Cloud Images were outdated or moved.
    • ✅ Solution: Use the persistent releases URL: releases.ubuntu.com/22.04.
  2. Base Image Too Minimal

    • Issue: ubuntu-base-22.04.tar.gz → crash (sed: can't read ./etc/locale.gen).
    • ✅ Solution: Use the Ubuntu Cloud Image (ubuntu-22.04-server-cloudimg-arm64-root.tar.xz).
  3. Android Filesystem Block (Hard Links)

    • Error: Extraction fails with Cannot hard link … Permission denied.
    • ✅ Fix: Use bsdtar or wrap in proot --link2symlink to convert hard links to symlinks.
  4. Symlink Recursion Loop (Infinite Loop)

    • Error: Repacking the tarball under proot --link2symlink made PRoot try to convert its own symlinks recursively.
    • ✅ Fix: Use raw bsdtar (without PRoot) for repacking; PRoot is used only for extraction.
  5. Dangling Symlink (/etc/resolv.conf)

    • Error: Broken symlink → DNS writes fail (no Internet).
    • ✅ Fix: rm -f /etc/resolv.conf, then add a nameserver manually (8.8.8.8).
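Taken together, hurdles 3–5 amount to a sequence like the one below. Paths and filenames are illustrative; the key point is that --link2symlink wraps extraction only, never repacking:

```shell
# 1. Extract the cloud image, converting hard links to symlinks
#    (Android's filesystem refuses hard links from unprivileged apps)
proot --link2symlink bsdtar -xpf ubuntu-22.04-server-cloudimg-arm64-root.tar.xz -C ./ubuntu-rootfs

# 2. Repack with raw bsdtar. Running this step under proot again would
#    re-convert the symlinks it just created and loop forever.
bsdtar -cJf ubuntu-22.04-fixed.tar.xz -C ./ubuntu-rootfs .

# 3. Inside the container: replace the dangling resolv.conf symlink
rm -f /etc/resolv.conf
echo "nameserver 8.8.8.8" > /etc/resolv.conf
```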

🐍 Phase 3: Python & PIP Conflicts

❌ Failure 3: pip inside PRoot

  • Problem: Android env variables (ANDROID_DATA) leaked into the container → pip crashes.
  • ✅ Fix 1: unset ANDROID_DATA (partial workaround).
  • ✅ Fix 2: Install Miniforge (Conda) → created an isolated Python 3.10 environment.
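The Miniforge route looks roughly like this. The installer URL follows conda-forge's release naming for aarch64, and the environment name "forge" is just an example:

```shell
# Download and install Miniforge for ARM64, then create an isolated Python 3.10 env
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-aarch64.sh
bash Miniforge3-Linux-aarch64.sh -b -p "$HOME/miniforge3"
source "$HOME/miniforge3/etc/profile.d/conda.sh"
conda create -y -n forge python=3.10
conda activate forge
```

Because conda ships its own interpreter and libraries, this sidesteps both the leaked Android variables and the system Python version entirely.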

❌ Failure 4: Bad Interpreter Shebangs

  • Error: Conda executables use absolute paths → bad interpreter: No such file or directory.
  • ✅ Fix: Run modules directly: use python -m pip install … instead of pip install ….
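The module-invocation trick works because the interpreter found on PATH is used instead of the absolute path baked into the pip wrapper's shebang line:

```shell
# Run pip through the interpreter rather than the broken wrapper script
python3 -m pip --version
# Installs work the same way, e.g.:
#   python3 -m pip install -r requirements_versions.txt
```

(Inside the conda env, `python` and `python3` both resolve to the env's own interpreter.)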

βš™οΈ Phase 4: Forge Launch Errors & CPU Patch

❌ Failure 5: Wrong Repo Near-Miss

  • Mistake: Initially cloned Stability-AI/StableStudio-Forge (This is an IDE, not the WebUI).
  • βœ… Fix: Pivoted to correct repository: lllyasviel/stable-diffusion-webui-forge.

❌ Failure 6: CUDA AssertionError

  • Crash: Forge launch crashed with: Torch not compiled with CUDA enabled.
  • Cause: The backend file memory_management.py checks for an NVIDIA GPU and panicked because it’s on a CPU-only build.

❌ Failure 7: UnboundLocalError in Patch

  • Error: First patch attempt failed with UnboundLocalError: local variable 'torch' referenced before assignment.
  • Reason: Python scope issue when patching a function.
  • βœ… Fix: Inject import torch inside the patched function scope.

✅ Solution: The "Lobotomy Patch"

I modified memory_management.py to trick Forge into thinking the Snapdragon CPU was a valid render device.

# The fix for the UnboundLocalError
def get_torch_device():
    import torch  # <--- crucial: import inside the patched function's scope
    return torch.device("cpu")

def should_use_fp16():
    return False

# Monkey-patch the CUDA check so the launch never probes for an NVIDIA GPU
import torch
torch.cuda.is_available = lambda: False

💾 Phase 5: Model Download & Setup

  • Target Model: SD 1.5 pruned (v1-5-pruned-emaonly.safetensors, 3.97GB).
  • Disk Check: >150GB available → safe to download.
  • Download Command:

cd ~/AI/forge_webui_cpu/models/Stable-diffusion/
wget -O v1-5-pruned-emaonly.safetensors \
  https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors

  • Download Result: 3.97GB saved in ~15m 54s at ~4.26MB/s.

🧪 Phase 6: Testing & Validation

Launch Command for Forge:

export FORCE_CPU=1
export CUDA_VISIBLE_DEVICES=""
python launch.py --skip-torch-cuda-test --listen --lowram --no-half --precision full

  • UI: http://127.0.0.1:7860 accessible
  • CPU-only generation: ~3–5 minutes per 512×512 image (POCO X3 Pro, 8GB RAM)
  • Optional: Later test FDE.ai memory swap & Vulkan GPU for acceleration
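For a headless sanity check, generation can also be driven over the WebUI's built-in HTTP API, but only if `--api` is added to the launch flags above. The endpoint and payload below follow the standard A1111/Forge API shape; prompt and step count are arbitrary examples:

```shell
# Smoke-test generation over the HTTP API (requires launching with --api)
curl -s -X POST http://127.0.0.1:7860/sdapi/v1/txt2img \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "a photo of a cat", "steps": 10, "width": 512, "height": 512}' \
  | head -c 200   # response JSON carries the image as base64
```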

📊 Phase 7: Summary of Hurdles & Solutions

| Hurdle | Error / Failure | Solution |
| --- | --- | --- |
| A1111 on Python 3.13 | Torch not found | Switched to Forge, Python 3.10 |
| Ubuntu 25.10 | PyTorch incompatible | Downgrade to 22.04 LTS |
| Symlink Recursion | Infinite Loop | Use raw bsdtar for repacking |
| Android Hard Links | Extraction failed | --link2symlink |
| Dangling DNS | DNS config fail | Remove symlink, add nameserver manually |
| pip inside PRoot | OSError / Android path | Unset ANDROID_DATA / use Miniforge Conda |
| Conda shebang | bad interpreter | Use python -m pip |
| Wrong Repo | Clone error | Switch to lllyasviel repo |
| Forge CUDA check | AssertionError | Lobotomy Patch (CPU-only patch) |
| Patch Scope Error | UnboundLocalError | Import torch inside function |

🔥 Phase 8: Final Success

  • OS: Ubuntu 22.04 Cloud Image (Patched)
  • Python: Miniforge 3.10 isolated environment
  • Forge: Running CPU-only, low RAM, patched for no CUDA
  • Model: SD 1.5 pruned (3.97GB) loaded successfully
  • UI: Accessible at http://127.0.0.1:7860
  • Performance: ~3–5 minutes per image, memory-safe

Note: This document tracks all 22 days of errors, fixes, downloads, and patches.