Diving Deeper, homework

Implementing negative prompts

Negative prompts are an extension of classifier-free guidance. Recall that classifier-free guidance is implemented in the pred_noise method of StableDiffusion:

StableDiffusion.pred_noise?

Signature: StableDiffusion.pred_noise(self, prompt_embedding, l, t, guidance_scale)
Docstring: <no docstring>
File:      ~/Desktop/SlowAI/nbs/slowai/overview.py
Type:      function
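As a reminder of what pred_noise has to compute, here is a minimal sketch of the standard classifier-free guidance combination. The function name and argument names are hypothetical and are not taken from slowai.overview; the formula is the usual one, which may differ in detail from the notebook's implementation.

```python
def classifier_free_guidance(eps_uncond, eps_cond, guidance_scale):
    """Combine unconditional and conditional noise predictions.

    Hypothetical helper: works on anything that supports +, - and *,
    e.g. floats, NumPy arrays or torch tensors.
    """
    # Start from the unconditional prediction and move along the
    # direction that points toward the conditional prediction.
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)


# Scalars stand in for tensors here, purely for illustration.
print(classifier_free_guidance(1.0, 3.0, 0.0))  # 1.0 (ignores the prompt)
print(classifier_free_guidance(1.0, 3.0, 1.0))  # 3.0 (plain conditional prediction)
print(classifier_free_guidance(1.0, 3.0, 7.5))  # 16.0 (extrapolates past it)
```

A guidance scale above 1 extrapolates beyond the conditional prediction, which is why values such as 7.5 push the image strongly toward the prompt.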

Let’s define a helper function to load StableDiffusion, as in the “Overview” notebook.

get_stable_diffusion

 get_stable_diffusion (cls=<class 'slowai.overview.StableDiffusion'>)

get_simple_pipe

 get_simple_pipe ()

sd = get_stable_diffusion()

sd(
    prompt="a photo of a giraffe in Paris",
    guidance_scale=7.5,
    as_pil=True,
)

100%|██████████| 30/30 [00:04<00:00,  7.38it/s]

prompt_embedding is a rank-3 tensor of shape batch_size x seq_len x channels. The batch size is 2 because it is the concatenation of the unconditional prompt embedding and the conditional prompt embedding.

sd.embed_prompt("a photo of a giraffe in paris").shape

torch.Size([2, 77, 768])

We want to embed the negative prompt as well and run all three embeddings through the denoising U-Net in a single forward pass. This brings the batch size to 3.

StableDiffusionWithNegativePromptA

 StableDiffusionWithNegativePromptA (tokenizer:transformers.models.clip.tokenization_clip.CLIPTokenizer,
                                     text_encoder:transformers.models.clip.modeling_clip.CLIPTextModel,
                                     scheduler:Any,
                                     unet:diffusers.models.unets.unet_2d_condition.UNet2DConditionModel,
                                     vae:diffusers.models.autoencoders.autoencoder_kl.AutoencoderKL)

sd = get_stable_diffusion(StableDiffusionWithNegativePromptA)
embedding = sd.embed_prompt("a photo of a giraffe in paris", "blurry")
embedding.shape

torch.Size([3, 77, 768])
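The batch-size change can be illustrated with a short sketch. It uses NumPy arrays as stand-ins for the torch tensors returned by the text encoder, and the helper name stack_embeddings and the ordering (unconditional, conditional, negative) are assumptions made for illustration, not the notebook's actual code.

```python
import numpy as np

SEQ_LEN, CHANNELS = 77, 768  # shapes reported by embed_prompt above


def stack_embeddings(uncond, cond, neg=None):
    """Concatenate per-prompt embeddings along the batch axis.

    Hypothetical helper; the order [uncond, cond, neg] is an assumption.
    """
    parts = [uncond, cond] if neg is None else [uncond, cond, neg]
    return np.concatenate(parts, axis=0)


# Dummy embeddings, one batch row per prompt.
uncond = np.zeros((1, SEQ_LEN, CHANNELS))
cond = np.ones((1, SEQ_LEN, CHANNELS))
neg = np.full((1, SEQ_LEN, CHANNELS), 2.0)

two_way = stack_embeddings(uncond, cond)
three_way = stack_embeddings(uncond, cond, neg)
print(two_way.shape)    # (2, 77, 768)
print(three_way.shape)  # (3, 77, 768)
```

Whatever order is chosen here must match the order in which the noise prediction is later split apart.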

Now we need to rewrite most of the denoising method to incorporate the negative guidance.

StableDiffusionWithNegativePromptB

 StableDiffusionWithNegativePromptB (tokenizer:transformers.models.clip.tokenization_clip.CLIPTokenizer,
                                     text_encoder:transformers.models.clip.modeling_clip.CLIPTextModel,
                                     scheduler:Any,
                                     unet:diffusers.models.unets.unet_2d_condition.UNet2DConditionModel,
                                     vae:diffusers.models.autoencoders.autoencoder_kl.AutoencoderKL)

sd = get_stable_diffusion(StableDiffusionWithNegativePromptB)
embedding = sd.embed_prompt("a photo of a giraffe in paris", "blurry")
l = sd.init_latents()
epsilon = sd.pred_noise(embedding, l, t=0, guidance_scale_pos=7.5, guidance_scale_neg=2)
epsilon.shape

torch.Size([1, 4, 64, 64])
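One plausible way to combine the three noise predictions is sketched below. This is an assumption, not necessarily the formula used in StableDiffusionWithNegativePromptB: it adds the usual guidance term toward the prompt and subtracts an analogous term for the negative prompt. NumPy stands in for torch, the function name guide_with_negative is hypothetical, and the batch order (unconditional, conditional, negative) is assumed.

```python
import numpy as np


def guide_with_negative(noise_pred, guidance_scale_pos, guidance_scale_neg):
    """Collapse a 3-row noise prediction into a single guided prediction.

    Hypothetical sketch; assumes batch order [uncond, cond, neg].
    """
    # Split the batch of 3 into three arrays of batch size 1
    # (the NumPy analogue of tensor.chunk(3) in torch).
    eps_uncond, eps_cond, eps_neg = np.split(noise_pred, 3, axis=0)
    toward_prompt = guidance_scale_pos * (eps_cond - eps_uncond)
    toward_negative = guidance_scale_neg * (eps_neg - eps_uncond)
    # Move toward the prompt and away from the negative prompt.
    return eps_uncond + toward_prompt - toward_negative


rng = np.random.default_rng(0)
noise_pred = rng.normal(size=(3, 4, 64, 64))  # dummy U-Net output
epsilon = guide_with_negative(noise_pred, guidance_scale_pos=7.5, guidance_scale_neg=2.0)
print(epsilon.shape)  # (1, 4, 64, 64)
```

With guidance_scale_neg set to 0 this reduces to ordinary classifier-free guidance, which makes for a convenient sanity check.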

Finally, we incorporate the negative prompt into the class API.

StableDiffusionWithNegativePromptC

 StableDiffusionWithNegativePromptC (tokenizer:transformers.models.clip.tokenization_clip.CLIPTokenizer,
                                     text_encoder:transformers.models.clip.modeling_clip.CLIPTextModel,
                                     scheduler:Any,
                                     unet:diffusers.models.unets.unet_2d_condition.UNet2DConditionModel,
                                     vae:diffusers.models.autoencoders.autoencoder_kl.AutoencoderKL)

sd = get_stable_diffusion(StableDiffusionWithNegativePromptC)
sd(
    prompt="a photo of a labrador dog",
    negative_prompt="park, greenery, plants, flowers",
    guidance_scale=7.5,
    neg_guidance_scale=5,
    as_pil=True,
)

100%|██████████| 30/30 [00:06<00:00,  4.98it/s]

sd = get_stable_diffusion(StableDiffusionWithNegativePromptC)
sd(
    prompt="a photo of a labrador dog in a park",
    negative_prompt="greenery, plants, flowers",
    guidance_scale=7.5,
    neg_guidance_scale=5,
    as_pil=True,
)

100%|██████████| 30/30 [00:06<00:00,  4.97it/s]