CS 280A Project 5 : The Power of Diffusion Models! by Phudish Prateepamornkul

Part 0: Setup

We start with some setup. We use the DeepFloyd IF diffusion model to generate images for 3 text prompts, and we display each output together with its caption. The first set of images uses 20 inference steps; the later sets use 40 and 10.

First example: "an oil painting of a snowy mountain village", 20 inference steps, 64x64
Second example: "a man wearing a hat", 20 inference steps, 64x64
Third example: "a rocket ship", 20 inference steps, 64x64
First example: "an oil painting of a snowy mountain village", 20 inference steps, 256x256
Second example: "a man wearing a hat", 20 inference steps, 256x256
Third example: "a rocket ship", 20 inference steps, 256x256
First example: "an oil painting of a snowy mountain village", 40 inference steps, 64x64
Second example: "a man wearing a hat", 40 inference steps, 64x64
Third example: "a rocket ship", 40 inference steps, 64x64
First example: "an oil painting of a snowy mountain village", 40 inference steps, 256x256
Second example: "a man wearing a hat", 40 inference steps, 256x256
Third example: "a rocket ship", 40 inference steps, 256x256
First example: "an oil painting of a snowy mountain village", 10 inference steps, 64x64
Second example: "a man wearing a hat", 10 inference steps, 64x64
Third example: "a rocket ship", 10 inference steps, 64x64
First example: "an oil painting of a snowy mountain village", 10 inference steps, 256x256
Second example: "a man wearing a hat", 10 inference steps, 256x256
Third example: "a rocket ship", 10 inference steps, 256x256
From these results, the images appear sharper as the number of inference steps increases. The random seed is set to 180.

Part 1: Sampling Loops

1.1 Implementing the Forward Process

The first step is to implement the forward process. Specifically, I use the following equation that was given to us:

\[ x_t = \sqrt{\bar{\alpha}_t} x_0 + \sqrt{1 - \bar{\alpha}_t} \epsilon \quad \text{where} \quad \epsilon \sim \mathcal{N}(0, I) \]

That is, given a clean image \( x_0 \), we get a noisy image \( x_t \) at timestep \( t \) by sampling from a Gaussian with mean \( \sqrt{\bar{\alpha}_t} x_0 \) and variance \( 1 - \bar{\alpha}_t \). We implement the forward(im, t) function, and the results below show the test image at noise levels [0, 250, 500, 750].
Original Berkeley Campanile
Noisy Campanile image at t = 250
Noisy Campanile image at t = 500
Noisy Campanile image at t = 750
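
The forward process above can be sketched in a few lines. This is a minimal NumPy sketch, not the implementation used for the figures: the linear beta schedule in make_alphas_cumprod and the extra alphas_cumprod and rng arguments are assumptions made so the example runs without the pretrained model, whose scheduler would normally supply \( \bar{\alpha}_t \).

```python
import numpy as np

def make_alphas_cumprod(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    # Assumed linear beta schedule; the real values come from the model's scheduler.
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def forward(im, t, alphas_cumprod, rng):
    # x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, I)
    abar = alphas_cumprod[t]
    eps = rng.standard_normal(im.shape)
    return np.sqrt(abar) * im + np.sqrt(1.0 - abar) * eps

rng = np.random.default_rng(0)
alphas_cumprod = make_alphas_cumprod()
im = np.zeros((3, 64, 64))  # placeholder standing in for the test image
noisy = {t: forward(im, t, alphas_cumprod, rng) for t in (250, 500, 750)}
```

Because \( \bar{\alpha}_t \) decreases with \( t \), the signal term shrinks and the noise term grows as the timestep increases.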

1.2 Classical Denoising

Now that we have the noisy images, we want to denoise them. The first thing we can try is Gaussian blur filtering (with a kernel size of 13 and a sigma of 2). The results below show the noisy images and the Gaussian-blurred outputs at timesteps [250, 500, 750].

Noisy Campanile image at t = 250
Noisy Campanile image at t = 500
Noisy Campanile image at t = 750
Gaussian Blur Denoising at t = 250
Gaussian Blur Denoising at t = 500
Gaussian Blur Denoising at t = 750
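
For reference, a Gaussian blur with the kernel size and sigma quoted above can be sketched as a separable filter. This is a NumPy sketch under assumptions: the report does not say which library routine was used, and the reflect padding and the single-channel (H, W) input are choices made here for illustration.

```python
import numpy as np

def gaussian_kernel1d(kernel_size=13, sigma=2.0):
    # Sample a 1-D Gaussian on integer taps and normalize it to sum to 1.
    half = kernel_size // 2
    taps = np.arange(-half, half + 1)
    k = np.exp(-(taps ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(im, kernel_size=13, sigma=2.0):
    # im: (H, W) array. Reflect-pad, then filter along rows and along columns.
    k = gaussian_kernel1d(kernel_size, sigma)
    half = kernel_size // 2
    out = np.pad(im, half, mode="reflect")
    for axis in (1, 0):
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="valid"), axis, out
        )
    return out

rng = np.random.default_rng(0)
noise_only = rng.standard_normal((32, 32))
blurred = gaussian_blur(noise_only)
```

Blurring averages neighboring pixels, so it lowers the noise level but removes fine image detail along with it, which matches the poor results seen above.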

As we can see, the results do not look good at all, so we will try another method instead.

1.3 One-Step Denoising

We now use the pretrained diffusion model to denoise the images. The model estimates the Gaussian noise in a noisy image, and removing that estimated noise gives an estimate of the clean image. The images below show the 3 noisy images at timesteps [250, 500, 750] and the one-step denoised result for each of them.

Noisy Campanile image at t = 250
Noisy Campanile image at t = 500
Noisy Campanile image at t = 750
One-Step Denoised Campanile at t = 250
One-Step Denoised Campanile at t = 500
One-Step Denoised Campanile at t = 750
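
One-step denoising amounts to solving the forward equation for \( x_0 \) once a noise estimate is available. The sketch below assumes the noise estimate is passed in as eps_hat (a hypothetical name); in the project it would come from the pretrained UNet, while here the true noise is used as a stand-in so the example is self-contained.

```python
import numpy as np

def estimate_x0(x_t, eps_hat, abar_t):
    # Invert x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps for x_0.
    return (x_t - np.sqrt(1.0 - abar_t) * eps_hat) / np.sqrt(abar_t)

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(3, 8, 8))
eps = rng.standard_normal(x0.shape)
abar_t = 0.5  # illustrative value of the cumulative alpha at some timestep
x_t = np.sqrt(abar_t) * x0 + np.sqrt(1.0 - abar_t) * eps

# With a perfect noise estimate, the clean image is recovered exactly.
x0_hat = estimate_x0(x_t, eps, abar_t)
```

In practice the noise estimate is imperfect, and the error is amplified by the division by \( \sqrt{\bar{\alpha}_t} \), which is small at large \( t \).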

Again, the results are not great, although they are still better than the Gaussian blur from earlier. In the next section we use iterative denoising.

1.4 Iterative Denoising

Instead of one-step denoising, we can do a much better job by denoising iteratively and getting a clean image at the end. To do this, we first create a list of timesteps called strided_timesteps, where the first item corresponds to the noisiest image and the last item corresponds to a clean image. We use strided_timesteps from T = 990 down to T = 0 in steps of 30. Then, for each timestep, we apply the following formula, which we implement in the function iterative_denoise(image, i_start):

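\[ x_{t'} = \frac{\sqrt{\bar{\alpha}_{t'}} \beta_t}{1 - \bar{\alpha}_t} x_0 + \frac{\sqrt{\alpha_t} (1 - \bar{\alpha}_{t'})}{1 - \bar{\alpha}_t} x_t + v_\sigma \]

Where:

\( x_t \) is the image at timestep \( t \)
\( x_{t'} \) is the noisy image at timestep \( t' \), where \( t' < t \) (less noisy)
\( \bar{\alpha}_t \) is the cumulative product of the alphas up to timestep \( t \)
\( \alpha_t = \bar{\alpha}_t / \bar{\alpha}_{t'} \)
\( \beta_t = 1 - \alpha_t \)
\( x_0 \) is the current estimate of the clean image
\( v_\sigma \) is random noise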

We start with i_start = 10. The images below show the noisy image at every 5th loop of denoising, followed by the original image, the final clean image predicted by iterative denoising, the clean image predicted by a single denoising step, and the result of Gaussian blurring.

Noisy Campanile image at t = 136
Noisy Campanile image at t = 238
Noisy Campanile image at t = 307
Noisy Campanile image at t = 477
Noisy Campanile image at t = 648
Original Image
Iterative Denoised Campanile
One Step Denoised Campanile
Gaussian Blurred Campanile
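
The strided schedule and a single update of the iterative denoiser can be sketched as follows. This is a minimal NumPy sketch under assumptions: the linear beta schedule stands in for the pretrained model's alphas_cumprod, the clean-image estimate x0_hat is passed in rather than computed from a UNet, and the noise term \( v_\sigma \) is optional. The helper name denoise_step is hypothetical.

```python
import numpy as np

# Timesteps from 990 down to 0 in steps of 30: [990, 960, ..., 30, 0]
strided_timesteps = list(range(990, -1, -30))

# Assumed schedule; the real alphas_cumprod comes from the model's scheduler.
alphas_cumprod = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))

def denoise_step(x_t, x0_hat, t, t_prev, alphas_cumprod, noise=None):
    # Move from timestep t to the less noisy timestep t_prev (t_prev < t).
    abar_t = alphas_cumprod[t]
    abar_prev = alphas_cumprod[t_prev]
    alpha = abar_t / abar_prev
    beta = 1.0 - alpha
    x_prev = (np.sqrt(abar_prev) * beta / (1.0 - abar_t)) * x0_hat \
        + (np.sqrt(alpha) * (1.0 - abar_prev) / (1.0 - abar_t)) * x_t
    if noise is not None:
        x_prev = x_prev + noise  # the v_sigma term
    return x_prev

i_start = 10
t, t_prev = strided_timesteps[i_start], strided_timesteps[i_start + 1]
x0 = np.ones((2, 2))
x_t = np.sqrt(alphas_cumprod[t]) * x0  # noiseless case, for illustration only
x_prev = denoise_step(x_t, x0, t, t_prev, alphas_cumprod)
```

In the noiseless case with a perfect clean-image estimate, the update lands exactly on \( \sqrt{\bar{\alpha}_{t'}} x_0 \), which is a quick sanity check on the two coefficients.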

1.5 Diffusion Model Sampling

With everything working, we can now sample images completely from noise. That is, we set i_start to 0 and start from pure random noise. The images below show the resulting samples.

Sample 1
Sample 2
Sample 3
Sample 4
Sample 5
Sample 6

1.6 Classifier-Free Guidance

The images generated above are not very good. We can do better by applying classifier-free guidance (CFG): we compute both a conditional and an unconditional noise estimate and combine them into the final noise estimate as follows, where \( \epsilon_{c} \) denotes the conditional noise estimate, \( \epsilon_{u} \) denotes the unconditional noise estimate, and \( \gamma \) is the guidance scale:

\[ \epsilon = \epsilon_{u} + \gamma (\epsilon_{c} - \epsilon_{u}) \]

We implement the iterative_denoise_cfg function and use a scale factor of \( \gamma = 7 \). The sampled images are shown below.

Sample 1 with CFG
Sample 2 with CFG
Sample 3 with CFG
Sample 4 with CFG
Sample 5 with CFG
Sample 6 with CFG
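
The CFG combination itself is a one-line formula. The sketch below uses small hand-picked arrays in place of the two UNet outputs; the function name cfg_noise is hypothetical.

```python
import numpy as np

def cfg_noise(eps_cond, eps_uncond, gamma=7.0):
    # eps = eps_u + gamma * (eps_c - eps_u)
    return eps_uncond + gamma * (eps_cond - eps_uncond)

eps_c = np.array([1.0, 2.0])  # stand-in for the conditional estimate
eps_u = np.array([0.0, 1.0])  # stand-in for the unconditional estimate
eps = cfg_noise(eps_c, eps_u, gamma=7.0)
```

With \( \gamma = 0 \) the result is the unconditional estimate, with \( \gamma = 1 \) it is the conditional estimate, and with \( \gamma > 1 \) it extrapolates past the conditional estimate, away from the unconditional one.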

1.7 Image-to-Image Translation

In the previous parts, we took a real image, added noise to it, and then denoised it. In this part, we take the original image, add a little noise, and force it back onto the image manifold without any conditioning. We use the iterative_denoise_cfg function with starting indices i_start in [1, 3, 5, 7, 10, 20] and show the results below.

First example of SDEdit with i_start = 1
First example of SDEdit with i_start = 3
First example of SDEdit with i_start = 5
First example of SDEdit with i_start = 7
First example of SDEdit with i_start = 10
First example of SDEdit with i_start = 20
Campanile
Second example of SDEdit with i_start = 1
Second example of SDEdit with i_start = 3
Second example of SDEdit with i_start = 5
Second example of SDEdit with i_start = 7
Second example of SDEdit with i_start = 10
Second example of SDEdit with i_start = 20
Capybara
Third example of SDEdit with i_start = 1
Third example of SDEdit with i_start = 3
Third example of SDEdit with i_start = 5
Third example of SDEdit with i_start = 7
Third example of SDEdit with i_start = 10
Third example of SDEdit with i_start = 20
Golden Gate

1.7.1 Editing Hand-Drawn and Web Images

Now we apply the same procedure to web images and hand-drawn images. The images below show the results.
Moodeng at i_start = 1
Moodeng at i_start = 3
Moodeng at i_start = 5
Moodeng at i_start = 7
Moodeng at i_start = 10
Moodeng at i_start = 20
Moodeng
House at i_start = 1
House at i_start = 3
House at i_start = 5
House at i_start = 7
House at i_start = 10
House at i_start = 20
House
Crab at i_start = 1
Crab at i_start = 3
Crab at i_start = 5
Crab at i_start = 7
Crab at i_start = 10
Crab at i_start = 20
Crab

1.7.2 Inpainting

We then apply the above process to inpainting. Given an original image \( x_{orig} \) and a binary mask \( m \), we can create a new image that keeps the original content wherever the mask is 0 and has new content wherever the mask is 1. This is done with the usual diffusion denoising loop, except that after every step we also apply \[ x_t \leftarrow m x_{t} + (1 - m) \, \text{forward}(x_{orig}, t) \] The images below show the inpainting results for 3 images.

Campanile
First Mask
First Hole to Fill
Campanile Inpainted
Beach Image
Second Mask
Second Hole to Fill
Beach Inpainted
Big Ben Image
Third Mask
Third Hole to Fill
Big Ben Inpainted
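
The masking step applied after each denoising update can be sketched as below. This is a NumPy sketch under assumptions: the linear beta schedule, the explicit rng argument, and the helper name inpaint_step are illustrative and not taken from the project code.

```python
import numpy as np

def forward(im, t, alphas_cumprod, rng):
    # Noise a clean image to timestep t.
    abar = alphas_cumprod[t]
    eps = rng.standard_normal(im.shape)
    return np.sqrt(abar) * im + np.sqrt(1.0 - abar) * eps

def inpaint_step(x_t, x_orig, mask, t, alphas_cumprod, rng):
    # Keep the generated content where mask == 1; elsewhere re-impose
    # the original image, noised to the current timestep t.
    return mask * x_t + (1.0 - mask) * forward(x_orig, t, alphas_cumprod, rng)

alphas_cumprod = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 1000))  # assumed
x_orig = np.ones((8, 8))
x_t = np.zeros((8, 8))        # stand-in for the current denoising state
mask = np.zeros((8, 8))
mask[2:6, 2:6] = 1.0          # region to fill with new content
out = inpaint_step(x_t, x_orig, mask, 500, alphas_cumprod,
                   np.random.default_rng(0))
```

Re-noising the original to the current timestep keeps the pixels outside the mask at the same noise level as the pixels being generated, so the two regions stay consistent throughout the loop.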

1.7.3 Text-Conditional Image-to-Image Translation

Next, instead of using the text prompt "a high quality photo", we can change the text prompt so that the generated image follows the new prompt, while looking increasingly similar to the original image as the noise level index (i_start) increases. The images below show the results: for the first image we use the text prompt "a rocket ship", for the second image "robot head", and for the third image "a strawberry".

Rocket Ship at noise level 1
Rocket Ship at noise level 3
Rocket Ship at noise level 5
Rocket Ship at noise level 7
Rocket Ship at noise level 10
Rocket Ship at noise level 20
Campanile
Robot Face at noise level 1
Robot Face at noise level 3
Robot Face at noise level 5
Robot Face at noise level 7
Robot Face at noise level 10
Robot Face at noise level 20
Human Head
Strawberry at noise level 1
Strawberry at noise level 3
Strawberry at noise level 5
Strawberry at noise level 7
Strawberry at noise level 10
Strawberry at noise level 20
Apple

1.8 Visual Anagrams

Now we can do something very cool: we can use our diffusion model to make visual anagrams, images that show one thing when viewed normally and another thing when flipped. To do this, at step \( t \) we denoise the image \( x_{t} \) with the first prompt to obtain the noise estimate \( \epsilon_{1} \). We then flip \( x_{t} \), denoise it with the second prompt, and flip the result back to obtain \( \epsilon_{2} \). Finally, we average the two noise estimates: \[ \begin{aligned} \epsilon_{1} &= \text{UNet}(x_{t}, t, \text{prompt}_1) \\ \epsilon_{2} &= \text{flip}(\text{UNet}(\text{flip}(x_{t}), t, \text{prompt}_2)) \\ \epsilon &= \frac{\epsilon_{1} + \epsilon_{2}}{2} \end{aligned} \]

An Oil Painting of an Old Man
An Oil Painting of People around a Campfire
A sleeping cat
A mountain landscape
An oil painting of a snowy mountain village
An oil painting of penguin
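
The noise combination for visual anagrams can be sketched as follows. The sketch makes two assumptions: the flip is taken to be a vertical (upside-down) flip of an array whose last two axes are height and width, and toy_unet is a hypothetical stand-in for the pretrained UNet so that the example runs on its own.

```python
import numpy as np

def flip(x):
    # Assumed vertical flip: reverse the height axis of a (..., H, W) array.
    return np.flip(x, axis=-2)

def anagram_noise(unet, x_t, t, prompt1, prompt2):
    eps1 = unet(x_t, t, prompt1)
    eps2 = flip(unet(flip(x_t), t, prompt2))  # estimate on the flipped view
    return (eps1 + eps2) / 2.0

def toy_unet(x, t, prompt):
    # Hypothetical stand-in: scales the input differently per prompt.
    return x * (1.0 if prompt == "p1" else 3.0)

x_t = np.arange(12.0).reshape(1, 3, 4)
eps = anagram_noise(toy_unet, x_t, 0, "p1", "p2")
```

Flipping the second estimate back before averaging puts both estimates in the same orientation as \( x_t \), so each denoising step pushes the image toward the first prompt upright and the second prompt flipped.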

1.9 Hybrid Images

The last thing we can do is create hybrid images, just like in project 2. The approach is very similar to the previous part: \[ \begin{aligned} \epsilon_{1} &= \text{UNet}(x_{t}, t, \text{prompt}_1) \\ \epsilon_{2} &= \text{UNet}(x_{t}, t, \text{prompt}_2) \\ \epsilon &= f_{\text{lowpass}}(\epsilon_{1}) + f_{\text{highpass}}(\epsilon_{2}) \end{aligned} \] where the low-pass and high-pass filters are built from a Gaussian filter with a kernel size of 33 and a sigma of 2. The images below show some of the resulting hybrid images. We took inspiration for the prompts from here. For the first set of images we use the prompts "a lithograph of a skull" and "a lithograph of waterfalls", for the second set "a lithograph of a pig" and "a lithograph of waterfalls", and for the third set "a lithograph of a panda" and "a lithograph of flowers".

Hybrid Image of a skull and waterfall
Hybrid Image of a skull and waterfall
Hybrid Image of a pig and waterfall
Hybrid Image of a panda and flower
Hybrid Image of a panda and flower
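
The hybrid noise combination can be sketched with a Gaussian low-pass filter and its complement. This NumPy sketch rests on assumptions: the high-pass filter is taken to be the input minus its low-pass version (a common construction that the report does not spell out), reflect padding is used at the borders, and random arrays stand in for the two UNet noise estimates.

```python
import numpy as np

def lowpass(x, kernel_size=33, sigma=2.0):
    # Separable Gaussian blur over the last two axes of a (..., H, W) array.
    half = kernel_size // 2
    taps = np.arange(-half, half + 1)
    k = np.exp(-(taps ** 2) / (2.0 * sigma ** 2))
    k = k / k.sum()
    pad = [(0, 0)] * (x.ndim - 2) + [(half, half), (half, half)]
    out = np.pad(x, pad, mode="reflect")
    for axis in (-1, -2):
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="valid"), axis, out
        )
    return out

def highpass(x, kernel_size=33, sigma=2.0):
    # Assumed construction: whatever the low-pass filter removes.
    return x - lowpass(x, kernel_size, sigma)

def hybrid_noise(eps1, eps2):
    return lowpass(eps1) + highpass(eps2)

rng = np.random.default_rng(0)
eps1 = rng.standard_normal((3, 64, 64))  # stand-in for UNet(x_t, t, prompt_1)
eps2 = rng.standard_normal((3, 64, 64))  # stand-in for UNet(x_t, t, prompt_2)
eps = hybrid_noise(eps1, eps2)
```

With this construction the two filters sum to the identity, so passing the same estimate for both prompts returns it unchanged.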

Part B: Diffusion Models from Scratch!

1.2 Using the UNet to Train a Denoiser

Now that we have all the operations needed to implement the UNet, we can use it to train a denoiser \( D_{\theta} \) that maps a noisy image \( z \) to a clean image \( x \). We train it with an L2 loss, where the noisy image is generated as \[ z = x + \sigma \epsilon, \quad \text{where} \quad \epsilon \sim \mathcal{N}(0, I) \] The images below show the noising process for \( \sigma = 0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0 \).

First example at sigma = 0.0
First example at sigma = 0.2
First example at sigma = 0.4
First example at sigma = 0.5
First example at sigma = 0.6
First example at sigma = 0.8
First example at sigma = 1.0
Second example at sigma = 0.0
Second example at sigma = 0.2
Second example at sigma = 0.4
Second example at sigma = 0.5
Second example at sigma = 0.6
Second example at sigma = 0.8
Second example at sigma = 1.0
Third example at sigma = 0.0
Third example at sigma = 0.2
Third example at sigma = 0.4
Third example at sigma = 0.5
Third example at sigma = 0.6
Third example at sigma = 0.8
Third example at sigma = 1.0
Fourth example at sigma = 0.0
Fourth example at sigma = 0.2
Fourth example at sigma = 0.4
Fourth example at sigma = 0.5
Fourth example at sigma = 0.6
Fourth example at sigma = 0.8
Fourth example at sigma = 1.0
Fifth example at sigma = 0.0
Fifth example at sigma = 0.2
Fifth example at sigma = 0.4
Fifth example at sigma = 0.5
Fifth example at sigma = 0.6
Fifth example at sigma = 0.8
Fifth example at sigma = 1.0
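
The noising process and the L2 objective described above can be sketched as follows. This is a NumPy sketch with assumed names (add_noise, l2_loss) and a random array standing in for a batch of 28x28 training images; the loss is written as a mean over all pixels, which may differ from the project code by a constant factor.

```python
import numpy as np

def add_noise(x, sigma, rng):
    # z = x + sigma * eps, with eps ~ N(0, I)
    return x + sigma * rng.standard_normal(x.shape)

def l2_loss(pred, target):
    # Mean squared error between the denoiser output and the clean image.
    return np.mean((pred - target) ** 2)

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=(64, 28, 28))  # placeholder batch of images
z = add_noise(x, 0.5, rng)

# Loss of a "denoiser" that returns its input unchanged: about sigma^2 = 0.25.
baseline_loss = l2_loss(z, x)
```

The identity baseline gives a useful reference point: a trained denoiser at \( \sigma = 0.5 \) should reach a loss well below \( \sigma^2 \).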

1.2.1 Training

We then train our UNet on the dataset for 5 epochs with a batch size of 256, a hidden size of 128, and a learning rate of 1e-4. We use sigma = 0.5 for the training noise, and we record the training loss for every mini-batch (that is, every batch of 256 images) in every epoch. The images below show the training loss curve, along with some denoising samples after epoch 1 and epoch 5.

Training Loss Curve
Epoch 1 test set first example at sigma = 0
Epoch 1 test set first example at sigma = 0.5
Epoch 1 test set first example output
Epoch 1 test set second example at sigma = 0
Epoch 1 test set second example at sigma = 0.5
Epoch 1 test set second example output
Epoch 1 test set third example at sigma = 0
Epoch 1 test set third example at sigma = 0.5
Epoch 1 test set third example output
Epoch 5 test set first example at sigma = 0
Epoch 5 test set first example at sigma = 0.5
Epoch 5 test set first example output
Epoch 5 test set second example at sigma = 0
Epoch 5 test set second example at sigma = 0.5
Epoch 5 test set second example output
Epoch 5 test set third example at sigma = 0
Epoch 5 test set third example at sigma = 0.5
Epoch 5 test set third example output

1.2.2 Out-of-Distribution Testing

Once training is done, we can use our model for out-of-distribution testing. That is, we use the model trained with noise level 0.5 to denoise images with noise levels 0.0, 0.2, 0.4, 0.5, 0.6, 0.8, 1.0.

Noisy image with sigma = 0.0
Noisy image with sigma = 0.2
Noisy image with sigma = 0.4
Noisy image with sigma = 0.5
Noisy image with sigma = 0.6
Noisy image with sigma = 0.8
Noisy image with sigma = 1.0
Denoised output image with sigma = 0.0
Denoised output image with sigma = 0.2
Denoised output image with sigma = 0.4
Denoised output image with sigma = 0.5
Denoised output image with sigma = 0.6
Denoised output image with sigma = 0.8
Denoised output image with sigma = 1.0

Part 2

2.2 Training the UNet and 2.3 Sampling from the UNet

As we saw, one-step denoising might not be the best approach, so we will implement iterative denoising. After defining the new FCBlock, the first thing we do is add time conditioning to the UNet and use it to train our model. Since I finished this before the spec was changed, my hyperparameters are the same as in the previous part (that is, a batch size of 256, a hidden size of 128, and a learning rate of 1e-4). The loss shown here is the training loss for every mini-batch (every 256 images) in every epoch.

Time-Conditioned UNet training loss curve
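
To illustrate what a time-conditioned training example could look like, here is a minimal NumPy sketch. Everything specific in it is an assumption rather than a detail from this report: the number of steps T = 300, the linear beta schedule, the normalization of the timestep to t / T, and the choice of the noise as the regression target. The helper name make_training_batch is hypothetical.

```python
import numpy as np

T = 300  # assumed number of diffusion steps
alphas_cumprod = np.cumprod(1.0 - np.linspace(1e-4, 0.02, T))  # assumed schedule

def make_training_batch(x0, rng):
    # x0: (B, H, W) clean images. Returns the noisy images, the normalized
    # timesteps fed to the network, and the noise used as regression target.
    batch = x0.shape[0]
    t = rng.integers(0, T, size=batch)
    eps = rng.standard_normal(x0.shape)
    abar = alphas_cumprod[t][:, None, None]
    x_t = np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps
    return x_t, t / T, eps

rng = np.random.default_rng(0)
x0 = rng.uniform(0.0, 1.0, size=(256, 28, 28))  # placeholder batch of 256
x_t, t_norm, eps = make_training_batch(x0, rng)
```

Under these assumptions, the training loss for one mini-batch would be the mean squared error between the network output for (x_t, t_norm) and eps.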

Epoch 1 samples with time conditioning (40 images)

Epoch 5 samples with time conditioning (40 images)

Epoch 10 samples with time conditioning (40 images)

Epoch 15 samples with time conditioning (40 images)

Epoch 20 samples with time conditioning (40 images)

2.4 Adding Class-Conditioning to UNet and 2.5 Sampling from the Class-Conditioned UNet

Class-Conditioned UNet training loss curve

Epoch 1 samples with class conditioning (40 images)

Epoch 5 samples with class conditioning (40 images)

Epoch 10 samples with class conditioning (40 images)

Epoch 15 samples with class conditioning (40 images)

Epoch 20 samples with class conditioning (40 images)