This post documents the workflow for training a Flux LoRA with sd-scripts on an RTX 4090.
Tool setup
Training uses the well-known sd-scripts repository. Flux training currently requires the sd3 branch, so set it up with the following commands:
git clone https://github.com/kohya-ss/sd-scripts.git
cd sd-scripts
git checkout sd3
python3.10 -m venv venv
source venv/bin/activate
pip install -i https://mirrors.aliyun.com/pypi/simple --extra-index-url https://download.pytorch.org/whl/cu124 --default-timeout=100 -r requirements.txt
# newer torch versions are not yet compatible, so pin torch 2.4.0 / torchvision 0.19.0
pip install -i https://mirrors.aliyun.com/pypi/simple --extra-index-url https://download.pytorch.org/whl/cu124 --default-timeout=100 triton torch==2.4.0 torchvision==0.19.0
Configure accelerate
accelerate config
Enable NUMA efficiency and choose bf16. The generated ~/.cache/huggingface/accelerate/default_config.yaml looks like this:
compute_environment: LOCAL_MACHINE
debug: false
distributed_type: 'NO'
downcast_bf16: 'no'
enable_cpu_affinity: true
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: bf16
num_machines: 1
num_processes: 1
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false
Dataset preparation
Prepare a set of jpg/png images along with caption files that share the image filename but use a .txt extension and describe the image content. Generating the captions with a script through the ChatGPT-4o API is recommended (a sketch follows the config below). Then configure dataset.toml:
[[datasets]]
enable_bucket = true
resolution = [350, 350]
bucket_reso_steps = 64
max_bucket_reso = 2048
min_bucket_reso = 128
bucket_no_upscale = false
batch_size = 1
random_crop = false
shuffle_caption = false
[[datasets.subsets]]
image_dir = "/home/yplam/AI/sd-scripts/data/dataset"
num_repeats = 1
caption_extension = ".txt"
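As one possible way to generate the captions, here is a minimal Python sketch using the openai SDK. Assumptions not taken from the original post: openai>=1.0 is installed, OPENAI_API_KEY is set in the environment, and the gpt-4o model name and prompt wording are illustrative choices.
import base64
from pathlib import Path

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
dataset_dir = Path("/home/yplam/AI/sd-scripts/data/dataset")

for image_path in sorted(dataset_dir.iterdir()):
    if image_path.suffix.lower() not in {".jpg", ".png"}:
        continue
    caption_path = image_path.with_suffix(".txt")
    if caption_path.exists():
        continue  # skip images that already have a caption
    mime = "image/png" if image_path.suffix.lower() == ".png" else "image/jpeg"
    b64 = base64.b64encode(image_path.read_bytes()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one detailed sentence."},
                {"type": "image_url", "image_url": {"url": f"data:{mime};base64,{b64}"}},
            ],
        }],
    )
    caption_path.write_text(response.choices[0].message.content.strip())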
Training
Note that the corresponding model weights, CLIP, and AE files must be downloaded before training. If you already use a tool such as ComfyUI, they can usually be found under its models directory.
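Before launching, an optional sanity-check sketch (not part of the original workflow) can confirm that the weight files referenced by the command below are in place:
from pathlib import Path

model_files = [
    "/home/yplam/AI/ComfyUI-Docker/models/unet/flux1-dev-fp8.safetensors",
    "/home/yplam/AI/ComfyUI-Docker/models/clip/clip_l.safetensors",
    "/home/yplam/AI/ComfyUI-Docker/models/clip/t5xxl_fp16.safetensors",
    "/home/yplam/AI/ComfyUI-Docker/models/vae/ae.safetensors",
]
missing = [p for p in model_files if not Path(p).exists()]
if missing:
    raise SystemExit("missing model files:\n" + "\n".join(missing))
print("all model files found")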
accelerate launch --mixed_precision bf16 --num_cpu_threads_per_process 1 flux_train_network.py \
--pretrained_model_name_or_path /home/yplam/AI/ComfyUI-Docker/models/unet/flux1-dev-fp8.safetensors \
--clip_l /home/yplam/AI/ComfyUI-Docker/models/clip/clip_l.safetensors \
--ae /home/yplam/AI/ComfyUI-Docker/models/vae/ae.safetensors \
--cache_latents_to_disk \
--save_model_as safetensors \
--sdpa \
--persistent_data_loader_workers \
--max_data_loader_n_workers 2 \
--seed 42 \
--gradient_checkpointing \
--mixed_precision bf16 \
--save_precision bf16 \
--network_module networks.lora_flux \
--network_dim 16 \
--network_alpha 8 \
--optimizer_type adamw8bit \
--learning_rate 2e-4 \
--lr_scheduler constant_with_warmup \
--lr_warmup_steps 20 \
--cache_text_encoder_outputs \
--cache_text_encoder_outputs_to_disk \
--fp8_base \
--max_train_steps 1200 \
--save_every_n_epochs 10 \
--dataset_config /home/yplam/AI/sd-scripts/data/dataset.toml \
--output_dir /home/yplam/AI/sd-scripts/data/output \
--output_name avatarbeta \
--timestep_sampling shift \
--discrete_flow_shift 3.1582 \
--model_prediction_type raw \
--guidance_scale 1.0 \
--t5xxl /home/yplam/AI/ComfyUI-Docker/models/clip/t5xxl_fp16.safetensors \
--split_mode \
--network_args "train_blocks=single" \
--max_grad_norm 1.0 \
--gradient_accumulation_steps 4 \
--clip_skip 2 \
--min_snr_gamma 5 \
--noise_offset 0.1