Stable Diffusion Version 2

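Text-to-image example

This repository contains Stable Diffusion models trained from scratch and will be continuously updated with new checkpoints. The following is an overview of the models currently available. More coming soon.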

Requirements

You can update an existing latent diffusion environment by running the following commands:

  [plaintext]
conda install pytorch==1.12.1 torchvision==0.13.1 -c pytorch
pip install transformers==4.19.2 diffusers invisible-watermark
pip install -e .
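
After installing, it can help to confirm that the pinned versions are the ones actually active in the environment. The following is a minimal sketch, not part of the repository: `check_pins` and `PINS` are hypothetical names, and the sketch assumes that the conda package `pytorch` is visible to Python package metadata under the name `torch`.

```python
from importlib import metadata

# Pins copied from the install commands above.
PINS = {"torch": "1.12.1", "torchvision": "0.13.1", "transformers": "4.19.2"}


def check_pins(pins, get_version=metadata.version):
    """Return {package: (expected, found)} for every package that misses its pin."""
    mismatches = {}
    for name, expected in pins.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None  # package is not installed at all
        # Builds may carry a local suffix such as "+cu113"; compare the public part only.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches
```

Calling `check_pins(PINS)` returns an empty dictionary when every pin is satisfied; any entry it does return names a package to reinstall.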

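xformers efficient attention

For more efficiency and speed on GPUs, we highly recommend installing the xformers library.

Tested on an A100 with CUDA 11.4. Installation needs reasonably recent versions of nvcc and gcc/g++, which can be obtained with the following commands: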

export CUDA_HOME=/usr/local/cuda-11.4
conda install -c nvidia/label/cuda-11.4.0 cuda-nvcc
conda install -c conda-forge gcc
conda install -c conda-forge gxx_linux-64==9.5.0

Then, run the following commands (compiling may take up to 30 minutes).

cd ..
git clone https://github.com/facebookresearch/xformers.git
cd xformers
git submodule update --init --recursive
pip install -r requirements.txt
pip install -e .
cd ../stablediffusion

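Upon successful installation, the code automatically uses memory-efficient attention for the self- and cross-attention layers in the U-Net and the autoencoder.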
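
Because the switch to memory-efficient attention happens automatically, a silent build failure is easy to miss. A minimal sketch for checking whether xformers is visible to the current interpreter follows; `xformers_available` is a hypothetical helper, and the repository's own detection logic may differ.

```python
import importlib.util


def xformers_available() -> bool:
    """True if the xformers package can be found on the current Python path."""
    # find_spec locates the top-level package without importing it.
    return importlib.util.find_spec("xformers") is not None


print("xformers found:", xformers_available())
```

If this prints `False` after the build steps above, the editable install most likely did not complete in the active environment.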

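General Disclaimer

Stable Diffusion models are general text-to-image diffusion models and therefore mirror the biases and (mis-)conceptions present in their training data. Although efforts were made to reduce explicit pornographic content, we do not recommend using the provided weights for services or products without additional safety mechanisms and considerations. The weights are research artifacts and should be treated as such. For details on the training procedure and data, as well as the intended use of the models, see the corresponding model card. The weights are available via the StabilityAI organization on Hugging Face under the CreativeML Open RAIL++-M License.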

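Stable Diffusion v2

Stable Diffusion v2 refers to a specific model configuration that uses an autoencoder with a downsampling factor of 8, an 865M UNet, and OpenCLIP ViT-H/14 as the text encoder. The _SD 2-v_ model produces 768x768 px outputs.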
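
To make the downsampling factor concrete: the UNet operates on a latent grid that is 8 times smaller per side than the output image. The sketch below works through that arithmetic. `latent_shape` is a hypothetical helper, and the default of 4 latent channels is an assumption that the text above does not state.

```python
def latent_shape(height, width, factor=8, channels=4):
    """Return (channels, latent_height, latent_width) for a given image size.

    `channels=4` is an assumed value; only the factor of 8 comes from the text.
    """
    if height % factor or width % factor:
        raise ValueError(f"height and width must be multiples of {factor}")
    return (channels, height // factor, width // factor)


print(latent_shape(768, 768))  # (4, 96, 96)
```

Under these assumptions, a 768x768 image corresponds to a 96x96 latent grid, which is why the denoising loop is far cheaper than working in pixel space.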

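Evaluations with different classifier-free guidance scales (1.5, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0) and 50 DDIM sampling steps show the relative improvements of the checkpoints:

Stable Diffusion evaluation results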
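
For readers unfamiliar with the guidance scale, the usual classifier-free guidance rule combines an unconditional and a text-conditioned noise prediction at each sampling step. The sketch below shows that rule on plain Python lists. `guided_noise` is a hypothetical name, and this is the standard formulation rather than code taken from the repository.

```python
def guided_noise(eps_uncond, eps_cond, scale):
    """Classifier-free guidance: move from the unconditional prediction
    toward (and, for scale > 1, past) the conditional one."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


print(guided_noise([0.0, 1.0], [1.0, 3.0], scale=2.0))  # [2.0, 5.0]
```

A scale of 1.0 reproduces the conditional prediction unchanged. Larger values, such as those in the evaluation above, push the sample harder toward the prompt.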

Text-to-Image

![Text-to-image example](assets/stable-samples/txt2img/merged-0023.png)

Usage

The following shows how to sample from the _SD 2.0-v_ model using the diffusers library:

  [python]
import torch
from diffusers import StableDiffusionPipeline

model_id = "stabilityai/stable-diffusion-2"
device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to(device)

prompt = "the dog is walking"
image = pipe(prompt).images[0]

image.save("dog.png")

Alternatively, you can use the sampling script we provide directly:

python scripts/txt2img.py --prompt "a beautiful cat" --H 768 --W 768 --seed 27 --n_samples 1 --n_iter 1

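Depth-Conditional Stable Diffusion

Depth-conditioned image example

The Stable Diffusion 2 depth-conditioned model is sensitive to shape and depth information. It is used like the text-to-image model, except that it additionally needs a monocular depth map. We provide a code snippet for generating such a depth map, which can be integrated into the inference pipeline.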

  [python]
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

model_id = "stabilityai/stable-diffusion-2-depth"
device = "cuda"
pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to(device)

prompt = "a luxurious mansion"
# Load the conditioning image. When no depth_map is passed, the pipeline
# estimates a monocular depth map from this image itself.
init_image = Image.open("input_image.jpg").convert("RGB")
image = pipe(prompt=prompt, image=init_image).images[0]

image.save("mansion.png")
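
If you supply your own depth map rather than relying on an estimate, it typically has to be rescaled to a fixed range first. The sketch below shows min-max scaling to [-1, 1] on a nested list. `normalize_depth` is a hypothetical helper, and whether the depth pipeline expects exactly this range is an assumption to verify against the diffusers documentation.

```python
def normalize_depth(depth_rows):
    """Min-max scale a 2D depth map (list of rows) to the range [-1, 1].

    The target range is an assumption, not something stated in the text.
    """
    flat = [v for row in depth_rows for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        raise ValueError("depth map is constant; cannot normalize")
    return [[2.0 * (v - lo) / (hi - lo) - 1.0 for v in row] for row in depth_rows]


print(normalize_depth([[0, 5], [10, 10]]))  # [[-1.0, 0.0], [1.0, 1.0]]
```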

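Image Inpainting

Inpainting example

The Stable Diffusion 2.0 inpainting model is dedicated to image inpainting. It is used like the other modes but additionally needs a mask. We provide a code snippet for generating such a mask, which can be integrated into the inference pipeline.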

  [python]
import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

model_id = "stabilityai/stable-diffusion-2-inpainting"
device = "cuda"
pipe = StableDiffusionInpaintPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to(device)

prompt = "a beautiful beach"
# The pipeline expects images, not file paths.
# White mask pixels are repainted; black pixels are kept.
init_image = Image.open("path_to_image.png").convert("RGB")  # initial image
mask_image = Image.open("path_to_mask.png").convert("L")     # mask, same size as init_image
image = pipe(prompt=prompt, image=init_image, mask_image=mask_image).images[0]

image.save("beach.png")
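
When no mask file exists yet, a simple rectangular mask is often enough to try inpainting. The following sketch builds one in plain Python and converts it to a PIL image. `box_mask` and `to_pil` are hypothetical helpers, and the white-means-repaint convention is the one the diffusers inpainting pipeline documents.

```python
def box_mask(width, height, box):
    """Return a 2D mask (list of rows): 255 inside `box`, 0 elsewhere.

    `box` is (left, top, right, bottom); right and bottom are exclusive.
    """
    left, top, right, bottom = box
    return [
        [255 if left <= x < right and top <= y < bottom else 0 for x in range(width)]
        for y in range(height)
    ]


def to_pil(mask):
    """Convert the nested-list mask to a single-channel PIL image."""
    from PIL import Image  # imported lazily so box_mask works without Pillow

    height, width = len(mask), len(mask[0])
    img = Image.new("L", (width, height))
    img.putdata([v for row in mask for v in row])
    return img
```

For example, `to_pil(box_mask(512, 512, (128, 128, 384, 384)))` produces a mask that asks the model to repaint the central square of a 512x512 image.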

Image Upscaling

Image upscaling example

Usage

  [python]
import torch
from PIL import Image
from diffusers import StableDiffusionUpscalePipeline

model_id = "stabilityai/stable-diffusion-x4-upscaler"
device = "cuda"
pipe = StableDiffusionUpscalePipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to(device)

prompt = "an ultra high resolution photo of a cat"
# Load the low-resolution input image; the x4 upscaler enlarges each side fourfold.
low_res_image = Image.open("path_to_image.png").convert("RGB")
image = pipe(prompt=prompt, image=low_res_image).images[0]

image.save("high_res_cat.png")
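
Because the upscaler multiplies each side by four, the input size determines how large, and how memory-hungry, the result is. The sketch below works out output sizes and shrinks an input so the result stays under a chosen limit. Both helpers are hypothetical, and the factor of 4 is taken from the model name.

```python
def upscaled_size(size, factor=4):
    """Output (width, height) after upscaling by `factor`."""
    width, height = size
    return (width * factor, height * factor)


def max_input_size(size, max_output_side, factor=4):
    """Shrink (width, height), keeping aspect ratio, so that the upscaled
    result has no side longer than `max_output_side`."""
    width, height = size
    limit = max_output_side // factor
    scale = min(1.0, limit / max(width, height))
    return (max(1, int(width * scale)), max(1, int(height * scale)))


print(upscaled_size((128, 128)))          # (512, 512)
print(max_input_size((1024, 512), 2048))  # (512, 256)
```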

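References

Related paper: High-Resolution Image Synthesis with Latent Diffusion Models
Code repository: https://github.com/CompVis/stable-diffusion
Model weights: the StabilityAI organization on Hugging Face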

Acknowledgments

We thank the following people and organizations:

For more information and updates, visit our GitHub page and Hugging Face page.

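Quick Start

Stable Diffusion is a powerful tool for generating high-quality images. Here is a quick start guide:

Install the required dependencies.
Download and load a model.
Enter a prompt to generate an image.

For more details, see the usage instructions in each section above. Enjoy!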

Source

https://github.com/Stability-AI/stablediffusion/blob/main/README.md
