4/19/2023

Open source nsfw image cleaner

To reduce the sample to a lower-dimensional latent space, the authors used the VAE architecture, which consists of two parts, an encoder and a decoder. The encoder is used during training to convert the sample into a lower-dimensional latent representation, which is passed as input to the next block. At inference, the denoised, generated samples undergo reverse diffusion and are transformed by the decoder back to their original dimensions.

VAE Architecture

2) U-Net: The U-Net block, composed of ResNet blocks, receives the noisy sample in the lower-dimensional latent space, compresses it, and then decodes it back with less noise. The estimated noise residual from the U-Net output is used to construct the expected denoised sample representation.

3) Text Encoder: The text encoder is responsible for the text processing, transforming the prompt into an embedding space. Similar to Google's Imagen, Stable Diffusion uses a frozen CLIP ViT-L/14 text encoder.

Stable Diffusion v1 was pre-trained on 256x256 images and then fine-tuned on 512x512 images, all from a subset of the LAION-5B database. It uses a downsampling-factor-8 autoencoder with an 860M-parameter U-Net and a CLIP ViT-L/14 text encoder for the diffusion model. Stable Diffusion is relatively lightweight and runs on a GPU with 10GB of VRAM, and even less when using float16 precision instead of the default float32.

The team has currently published the following checkpoints:

- sd-v1-1.ckpt: 237k steps at resolution 256x256 on laion2B-en, followed by 194k steps at resolution 512x512 on laion-high-resolution (170M examples from LAION-5B with resolution >= 1024x1024).
- sd-v1-2.ckpt: Resumed from sd-v1-1.ckpt. 515k steps at resolution 512x512 on laion-aesthetics v2 5+ (a subset of laion2B-en with an estimated aesthetics score > 5.0, additionally filtered to images with an original size >= 512x512 and an estimated watermark probability < 0.5; the watermark estimate is from the LAION-5B metadata, and the aesthetics score is estimated using the LAION-Aesthetics Predictor V2).
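The data flow through the autoencoder, U-Net, and text encoder can be illustrated with a toy numpy sketch. The `encode`, `decode`, and `unet` functions below are hypothetical stand-ins (average pooling, nearest-neighbor upsampling, and a zero residual), not the real learned networks; only the tensor shapes and the downsampling factor of 8 mirror the actual model.

```python
import numpy as np

rng = np.random.default_rng(0)
F = 8  # autoencoder downsampling factor used by Stable Diffusion

def encode(image):
    """Toy stand-in for the VAE encoder: maps (C, H, W) pixels to a
    (C, H/F, W/F) latent by average pooling. The real encoder is learned."""
    c, h, w = image.shape
    return image.reshape(c, h // F, F, w // F, F).mean(axis=(2, 4))

def decode(latent):
    """Toy stand-in for the VAE decoder: upsamples the latent back to pixel size."""
    return latent.repeat(F, axis=1).repeat(F, axis=2)

def unet(noisy_latent, text_embedding):
    """Toy stand-in for the 860M-parameter U-Net: returns a predicted noise
    residual of the same shape as its input (here simply zeros)."""
    return np.zeros_like(noisy_latent)

image = rng.standard_normal((3, 512, 512))
latent = encode(image)                     # (3, 64, 64): 8 x 8 = 64x fewer elements
text_embedding = rng.standard_normal(768)  # CLIP ViT-L/14 token embeddings are 768-d
denoised = latent - unet(latent, text_embedding)
restored = decode(denoised)

print(latent.shape, image.size // latent.size)  # (3, 64, 64) 64
```

The shapes are the point here: the U-Net only ever sees the small latent tensor, which is why the whole pipeline fits in roughly 10GB of VRAM.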
Diffusion models have already been applied to a variety of generation tasks, such as image, speech, 3D shape, and graph synthesis. They combine two processes:

Forward Diffusion - Maps data to noise by gradually perturbing the input data. This process is used only during training, not at inference. It is formally achieved by a simple stochastic process that starts from a data sample and iteratively generates noisier samples using a simple Gaussian diffusion kernel.

Parametrized Reverse - Undoes the forward diffusion and performs iterative denoising. This process represents data synthesis and is trained to generate data by converting random noise into realistic data.

The forward and reverse processes require sequential repetition of thousands of steps, injecting and removing noise, which makes the whole procedure slow and heavy on computational resources. To enable training on limited resources while retaining quality and flexibility, the creators of Stable Diffusion adopted the method suggested in the Latent Diffusion paper: instead of using the actual pixel space, they applied the diffusion process over a lower-dimensional latent space. For example, the autoencoder used in Stable Diffusion has a reduction factor of 8, which means that an image of shape (3, 512, 512) becomes (3, 64, 64) in latent space and requires 8 × 8 = 64 times less memory.

Official “Stable Diffusion” release notes

Stable Diffusion Architecture

The Stable Diffusion architecture has three main components: two for reducing the sample to a lower-dimensional latent space and then denoising random Gaussian noise, and one for text processing.

1) The Autoencoder: The input of the model is random noise of the size of the desired output. It is first reduced to a lower-dimensional latent space.
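The forward process described above can be sketched in a few lines of numpy. The linear beta schedule below is an assumption for illustration (the exact schedule is a training detail); the point is that repeatedly applying the Gaussian kernel q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I) destroys the signal and leaves approximately standard Gaussian noise.

```python
import numpy as np

rng = np.random.default_rng(42)

def forward_diffusion(x0, betas, rng):
    """Gradually perturb x0 with the Gaussian diffusion kernel
    q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    x = x0
    for beta in betas:
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)
    return x

x0 = np.ones((3, 64, 64))              # a latent-sized "sample", all ones
betas = np.linspace(1e-4, 0.02, 1000)  # assumed linear noise schedule, 1000 steps

xT = forward_diffusion(x0, betas, rng)

# After enough steps the original signal is gone: xT has mean ~0 and std ~1,
# i.e. it is statistically indistinguishable from standard Gaussian noise.
print(xT.mean(), xT.std())
```

Note that the loop runs on a (3, 64, 64) latent rather than a (3, 512, 512) image, which is exactly the 64x memory saving described above.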
Introducing Stable Diffusion

Created by the researchers and engineers from Stability AI, CompVis, and LAION, “Stable Diffusion” claims the crown from Craiyon, formerly known as DALL·E Mini, as the new state-of-the-art, text-to-image, open-source model. Although generating images from text already feels like ancient technology, Stable Diffusion manages to bring innovation to the table, which is even more surprising given that it's an open-source project. Let's dive into the details and check what Stable Diffusion has in store for the data science community!

Stable Diffusion is an open-source implementation of the Latent Diffusion architecture, trained to denoise random Gaussian noise in a lower-dimensional latent space to get a sample of interest. Diffusion models are trained to predict a way to slightly denoise a sample in each step, and after a number of iterations, a result is obtained.
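That "slightly denoise, then repeat" loop can be sketched as follows. The `predict_noise` function here is a hypothetical stand-in for the trained model (the real predictor is a large U-Net, and real samplers use carefully derived update rules rather than this fixed step size); only the overall iteration structure is the point.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(x, step):
    """Hypothetical stand-in for the trained noise predictor: pretends that
    everything far from a smooth signal is noise, so the loop has work to do."""
    return x - np.tanh(x)

# Start from pure Gaussian noise in a latent-sized tensor.
x = rng.standard_normal((3, 64, 64))

# Iteratively remove a fraction of the predicted noise, a few dozen times,
# as diffusion samplers do.
for step in range(50):
    x = x - 0.5 * predict_noise(x, step)

print(x.shape)  # (3, 64, 64)
```

Each pass shrinks the "noise" component a little; after the loop the sample is much closer to the (toy) data manifold than the Gaussian noise it started from.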