
JingyunLiang / SwinIR

SwinIR: Image Restoration Using Swin Transformer (official repository)


Top Related Projects

Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.


KAIR: Image Restoration Toolbox (PyTorch). Training and testing code for DPIR, USRNet, DnCNN, FFDNet, SRMD, DPSR, BSRGAN, SwinIR.


BSRGAN: Designing a Practical Degradation Model for Deep Blind Image Super-Resolution (ICCV 2021, PyTorch). The training code has been released.

Quick Overview

SwinIR is a deep learning-based image restoration model that utilizes the Swin Transformer architecture. It is designed for various image restoration tasks, including super-resolution, denoising, and JPEG compression artifact reduction. SwinIR demonstrates state-of-the-art performance on these tasks while maintaining efficiency.

Pros

  • Achieves superior performance on multiple image restoration tasks
  • Utilizes the powerful Swin Transformer architecture for efficient processing
  • Provides pre-trained models for easy implementation
  • Supports both color and grayscale image processing

Cons

  • Requires significant computational resources for training and inference
  • May have a steeper learning curve for users unfamiliar with transformer architectures
  • Limited documentation for customization and fine-tuning
  • Dependency on specific versions of PyTorch and other libraries

Code Examples

  1. Loading a pre-trained SwinIR model (using the classical ×4 SR configuration from main_test_swinir.py that matches the checkpoint below):

import torch
from models.network_swinir import SwinIR

# Configuration for the classical SR (SwinIR-M, x4) checkpoint, as in main_test_swinir.py
model = SwinIR(upscale=4, in_chans=3, img_size=48, window_size=8,
               img_range=1., depths=[6, 6, 6, 6, 6, 6], embed_dim=180,
               num_heads=[6, 6, 6, 6, 6, 6], mlp_ratio=2,
               upsampler='pixelshuffle', resi_connection='1conv')
model.load_state_dict(torch.load('model_zoo/swinir/001_classicalSR_DIV2K_s48w8_SwinIR-M_x4.pth')['params'])
model.eval()
  2. Performing super-resolution on an image (a minimal version of the preprocessing in main_test_swinir.py, using cv2 directly since the metrics module in step 3 does not provide image I/O helpers; see also the padding note after these examples):

import os
import cv2
import numpy as np
import torch

# Read the low-quality image and convert it to a 1x3xHxW float tensor in [0, 1]
img_lq = cv2.imread(path_lq, cv2.IMREAD_COLOR).astype(np.float32) / 255.
img_lq = np.transpose(img_lq[:, :, [2, 1, 0]], (2, 0, 1))  # BGR -> RGB, HWC -> CHW
img_lq = torch.from_numpy(img_lq).unsqueeze(0).to(device)

with torch.no_grad():
    output = model(img_lq)

# Convert the output back to a uint8 BGR image and save it
output = output.squeeze(0).clamp_(0, 1).cpu().numpy()
output = np.transpose(output[[2, 1, 0], :, :], (1, 2, 0))  # RGB -> BGR, CHW -> HWC
cv2.imwrite(os.path.join(folder_results, img_name), (output * 255.0).round().astype(np.uint8))
  3. Calculating PSNR and SSIM metrics with the repository's utils/util_calculate_psnr_ssim.py:

import os
import cv2
from utils import util_calculate_psnr_ssim as util

img_gt = cv2.imread(path_gt, cv2.IMREAD_COLOR)
img_restored = cv2.imread(os.path.join(folder_results, img_name), cv2.IMREAD_COLOR)

# crop_border pixels are cropped from each border before computing the metrics
psnr = util.calculate_psnr(img_restored, img_gt, crop_border=crop_border)
ssim = util.calculate_ssim(img_restored, img_gt, crop_border=crop_border)
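
A caveat for example 2: SwinIR's window attention expects the input height and width to be multiples of the window size, so main_test_swinir.py mirror-pads the input before inference and crops the output afterwards. A trimmed-down sketch of that step (window_size=8 for the SR models):

# Pad H and W up to multiples of the window size by mirror-reflection
_, _, h_old, w_old = img_lq.size()
window_size = 8
h_pad = (h_old // window_size + 1) * window_size - h_old
w_pad = (w_old // window_size + 1) * window_size - w_old
img_lq = torch.cat([img_lq, torch.flip(img_lq, [2])], 2)[:, :, :h_old + h_pad, :]
img_lq = torch.cat([img_lq, torch.flip(img_lq, [3])], 3)[:, :, :w_old + w_pad, :]
# ... run the model, then crop: output = output[..., :h_old * scale, :w_old * scale]

And for reference on example 3: PSNR on 8-bit images is 10 * log10(255^2 / MSE). A tiny self-contained sanity check (not part of the repository):

import numpy as np

def psnr_uint8(img1, img2):
    # Peak signal-to-noise ratio between two uint8 images of equal shape
    mse = np.mean((img1.astype(np.float64) - img2.astype(np.float64)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)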

Getting Started

  1. Clone the repository:

    git clone https://github.com/JingyunLiang/SwinIR.git
    cd SwinIR
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download pre-trained models from the provided links in the repository.

  4. Run inference:

    python main_test_swinir.py --task classical_sr --scale 4 --model_path model_zoo/swinir/001_classicalSR_DIV2K_s48w8_SwinIR-M_x4.pth --folder_lq testsets/Set5/LR_bicubic/X4 --folder_gt testsets/Set5/HR
    

Competitor Comparisons

Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.

Pros of Real-ESRGAN

  • Focuses on practical applications, particularly enhancing real-world images
  • Uses a U-Net discriminator with spectral normalization for more stable GAN training
  • Utilizes a large-scale dataset with diverse degradations for better generalization

Cons of Real-ESRGAN

  • May introduce artifacts in some cases, especially with extreme upscaling factors
  • Requires more computational resources due to its complex architecture
  • Can sometimes over-smooth textures, leading to loss of fine details

Code Comparison

Real-ESRGAN:

from basicsr.archs.rrdbnet_arch import RRDBNet

model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)

SwinIR:

from models.network_swinir import SwinIR

model = SwinIR(upscale=4, in_chans=3, img_size=64, window_size=8,
               img_range=1., depths=[6, 6, 6, 6], embed_dim=60, num_heads=[6, 6, 6, 6],
               mlp_ratio=2, upsampler='pixelshuffledirect')

Both repositories focus on image super-resolution, but Real-ESRGAN emphasizes real-world applications, while SwinIR leverages the Swin Transformer architecture for improved performance. Real-ESRGAN may be more suitable for general-purpose upscaling, while SwinIR might offer better results in specific scenarios where preserving fine details is crucial.
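
In practice, Real-ESRGAN inference usually goes through its RealESRGANer wrapper rather than calling the network directly. A hedged sketch based on the inference script in the Real-ESRGAN repository (the weight path and argument names are assumptions and may differ between versions):

import cv2
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
upsampler = RealESRGANer(scale=4, model_path='weights/RealESRGAN_x4plus.pth',
                         model=model, tile=0, half=False)  # tile>0 to limit memory use

img = cv2.imread('input.png', cv2.IMREAD_COLOR)
output, _ = upsampler.enhance(img, outscale=4)  # returns (image, image mode)
cv2.imwrite('output.png', output)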


KAIR: Image Restoration Toolbox (PyTorch). Training and testing code for DPIR, USRNet, DnCNN, FFDNet, SRMD, DPSR, BSRGAN, SwinIR.

Pros of KAIR

  • Comprehensive collection of image restoration models and techniques
  • Extensive documentation and tutorials for various tasks
  • Flexible architecture allowing easy integration of new models

Cons of KAIR

  • Larger codebase, potentially more complex to navigate
  • May require more computational resources due to its comprehensive nature

Code Comparison

KAIR:

from models.network_swinir import SwinIR as net
model = net(upscale=2, in_chans=3, img_size=64, window_size=8,
            img_range=1., depths=[6, 6, 6, 6], embed_dim=60, num_heads=[6, 6, 6, 6],
            mlp_ratio=2, upsampler='pixelshuffledirect', resi_connection='1conv')

SwinIR:

from models.network_swinir import SwinIR
model = SwinIR(upscale=2, img_size=64, window_size=8, img_range=1., depths=[6, 6, 6, 6],
               embed_dim=60, num_heads=[6, 6, 6, 6], mlp_ratio=2, upsampler='pixelshuffledirect')

The comparison shows that both repositories ship the same SwinIR network definition; the KAIR snippet simply spells out constructor arguments (in_chans, resi_connection) that the SwinIR snippet leaves at identical default values.


BSRGAN: Designing a Practical Degradation Model for Deep Blind Image Super-Resolution (ICCV 2021, PyTorch). The training code has been released.

Pros of BSRGAN

  • Pairs an ESRGAN-style network with a novel, practical degradation model for blind super-resolution
  • Achieves high-quality image super-resolution with realistic textures
  • Provides pre-trained models for various upscaling factors

Cons of BSRGAN

  • May require more computational resources due to GAN architecture
  • Limited flexibility compared to SwinIR's transformer-based approach
  • Potentially slower inference time for real-time applications

Code Comparison

BSRGAN:

from models.network_rrdbnet import RRDBNet as net
model = net(in_nc=3, out_nc=3, nf=64, nb=23, gc=32, sf=4)

SwinIR:

from models.network_swinir import SwinIR
model = SwinIR(upscale=4, in_chans=3, img_size=64, window_size=8,
               img_range=1., depths=[6, 6, 6, 6], embed_dim=60, num_heads=[6, 6, 6, 6],
               mlp_ratio=2, upsampler='pixelshuffledirect')

Both repositories focus on image super-resolution, but BSRGAN employs a GAN-based approach built around a practical degradation model, while SwinIR uses a transformer-based architecture. BSRGAN may produce more realistic textures, while SwinIR typically achieves higher fidelity on standard benchmarks. The code snippets show the different model initializations: BSRGAN's RRDB network takes a handful of arguments, while SwinIR's transformer requires a more detailed configuration.


README

SwinIR: Image Restoration Using Swin Transformer

Jingyun Liang, Jiezhang Cao, Guolei Sun, Kai Zhang, Luc Van Gool, Radu Timofte

Computer Vision Lab, ETH Zurich



This repository is the official PyTorch implementation of SwinIR: Image Restoration Using Swin Transformer (arXiv, supp, pretrained models, visual results). SwinIR achieves state-of-the-art performance in

  • bicubic/lightweight/real-world image SR
  • grayscale/color image denoising
  • grayscale/color JPEG compression artifact reduction

:rocket: :rocket: :rocket: News:

  • Aug. 16, 2022: Added a PlayTorch demo for running the real-world image SR model on mobile devices.
  • Aug. 01, 2022: Added pretrained models and results on JPEG compression artifact reduction for color images.
  • Jun. 10, 2022: See our work on video restoration :fire::fire::fire: VRT: A Video Restoration Transformer and RVRT: Recurrent Video Restoration Transformer for video SR, video deblurring, video denoising, video frame interpolation and space-time video SR.
  • Sep. 07, 2021: We provide an interactive online Colab demo for real-world image SR :fire: for comparison with the first practical degradation model BSRGAN (ICCV 2021) and the recent model Real-ESRGAN. Try to super-resolve your own images on Colab!
[Visual comparison on a real-world image (×4): BSRGAN (ICCV 2021) | Real-ESRGAN | SwinIR (ours) | SwinIR-Large (ours)]

Image restoration is a long-standing low-level vision problem that aims to restore high-quality images from low-quality images (e.g., downscaled, noisy and compressed images). While state-of-the-art image restoration methods are based on convolutional neural networks, few attempts have been made with Transformers which show impressive performance on high-level vision tasks. In this paper, we propose a strong baseline model SwinIR for image restoration based on the Swin Transformer. SwinIR consists of three parts: shallow feature extraction, deep feature extraction and high-quality image reconstruction. In particular, the deep feature extraction module is composed of several residual Swin Transformer blocks (RSTB), each of which has several Swin Transformer layers together with a residual connection. We conduct experiments on three representative tasks: image super-resolution (including classical, lightweight and real-world image super-resolution), image denoising (including grayscale and color image denoising) and JPEG compression artifact reduction. Experimental results demonstrate that SwinIR outperforms state-of-the-art methods on different tasks by up to 0.14~0.45dB, while the total number of parameters can be reduced by up to 67%.
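
To make this three-part design concrete, below is a minimal, untrained sketch of the residual structure described above. It is illustrative only: plain transformer encoder layers stand in for the window/shifted-window attention of the real model (see models/network_swinir.py), and all class and layer names here are invented.

import torch
import torch.nn as nn

class RSTB(nn.Module):
    # Residual Swin Transformer block: several transformer layers plus a
    # trailing conv, wrapped in a residual connection.
    def __init__(self, dim, depth, nhead=6):
        super().__init__()
        self.layers = nn.Sequential(*[
            nn.TransformerEncoderLayer(d_model=dim, nhead=nhead,
                                       dim_feedforward=2 * dim, batch_first=True)
            for _ in range(depth)])
        self.conv = nn.Conv2d(dim, dim, 3, padding=1)

    def forward(self, x):                                   # x: (B, C, H, W)
        b, c, h, w = x.shape
        tokens = self.layers(x.flatten(2).transpose(1, 2))  # (B, H*W, C)
        out = tokens.transpose(1, 2).view(b, c, h, w)
        return x + self.conv(out)                           # residual connection

class MiniSwinIR(nn.Module):
    # Shallow feature extraction -> deep feature extraction -> HQ reconstruction
    def __init__(self, dim=60, num_blocks=4, depth=6, upscale=4):
        super().__init__()
        self.shallow = nn.Conv2d(3, dim, 3, padding=1)
        self.deep = nn.Sequential(*[RSTB(dim, depth) for _ in range(num_blocks)])
        self.reconstruct = nn.Sequential(
            nn.Conv2d(dim, 3 * upscale ** 2, 3, padding=1),
            nn.PixelShuffle(upscale))    # pixel-shuffle upsampler

    def forward(self, x):
        feat = self.shallow(x)
        feat = feat + self.deep(feat)    # global residual over the deep features
        return self.reconstruct(feat)

For example, MiniSwinIR()(torch.rand(1, 3, 48, 48)) returns a (1, 3, 192, 192) tensor, i.e. a ×4 upscaled image.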

Contents

  1. Training
  2. Testing
  3. Results
  4. Citation
  5. License and Acknowledgement

Training

The training and testing sets used can be downloaded as follows:

| Task | Training Set | Testing Set | Visual Results |
| --- | --- | --- | --- |
| classical/lightweight image SR | DIV2K (800 training images) or DIV2K + Flickr2K (2650 images) | Set5 + Set14 + BSD100 + Urban100 + Manga109 | here |
| real-world image SR | SwinIR-M (middle size): DIV2K (800 training images) + Flickr2K (2650 images) + OST (10324 images of sky, water, grass, mountain, building, plant, animal)<br>SwinIR-L (large size): DIV2K + Flickr2K + OST + WED (4744 images) + FFHQ (first 2000 images, face) + Manga109 (manga) + SCUT-CTW1500 (first 100 training images, texts)<br>*We use the pioneering practical degradation model from BSRGAN (ICCV 2021) | RealSRSet+5images | here |
| color/grayscale image denoising | DIV2K (800 training images) + Flickr2K (2650 images) + BSD500 (400 training & testing images) + WED (4744 images)<br>*BSD68/BSD100 images are not used in training | grayscale: Set12 + BSD68 + Urban100<br>color: CBSD68 + Kodak24 + McMaster + Urban100 | here |
| grayscale/color JPEG compression artifact reduction | DIV2K (800 training images) + Flickr2K (2650 images) + BSD500 (400 training & testing images) + WED (4744 images) | grayscale: Classic5 + LIVE1 | here |

The training code is at KAIR.
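
Training in KAIR is driven by JSON option files; a typical classical-SR invocation, per the KAIR repository (file paths may differ between versions), looks like:

python main_train_psnr.py --opt options/swinir/train_swinir_sr_classical.json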

Testing (without preparing datasets)

For your convenience, we provide some example datasets (~20MB) in /testsets. If you just want the code, downloading models/network_swinir.py, utils/util_calculate_psnr_ssim.py and main_test_swinir.py is enough. The following commands will download the pretrained models automatically and put them in model_zoo/swinir. All visual results of SwinIR can be downloaded here.
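
The auto-download amounts to fetching the checkpoint from the repository's GitHub release assets when it is missing locally. A rough sketch of that behaviour (the release URL pattern is inferred from the repository's releases, so treat it as an assumption):

import os
import requests

def ensure_pretrained(model_path):
    # Download the checkpoint into model_zoo/swinir if it is not already there
    if not os.path.exists(model_path):
        url = ('https://github.com/JingyunLiang/SwinIR/releases/download/v0.0/'
               + os.path.basename(model_path))
        os.makedirs(os.path.dirname(model_path), exist_ok=True)
        with open(model_path, 'wb') as f:
            f.write(requests.get(url, allow_redirects=True).content)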

We also provide an online Colab demo for real-world image SR for comparison with the first practical degradation model BSRGAN (ICCV 2021) and the recent model Real-ESRGAN. Try to test your own images on Colab!

We also provide a PlayTorch demo for real-world image SR, showcasing how to run the SwinIR model in a mobile application built with React Native.

# 001 Classical Image Super-Resolution (middle size)
# Note that --training_patch_size is just used to differentiate two different settings in Table 2 of the paper. Images are NOT tested patch by patch.
# (setting1: when model is trained on DIV2K and with training_patch_size=48)
python main_test_swinir.py --task classical_sr --scale 2 --training_patch_size 48 --model_path model_zoo/swinir/001_classicalSR_DIV2K_s48w8_SwinIR-M_x2.pth --folder_lq testsets/Set5/LR_bicubic/X2 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 3 --training_patch_size 48 --model_path model_zoo/swinir/001_classicalSR_DIV2K_s48w8_SwinIR-M_x3.pth --folder_lq testsets/Set5/LR_bicubic/X3 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 4 --training_patch_size 48 --model_path model_zoo/swinir/001_classicalSR_DIV2K_s48w8_SwinIR-M_x4.pth --folder_lq testsets/Set5/LR_bicubic/X4 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 8 --training_patch_size 48 --model_path model_zoo/swinir/001_classicalSR_DIV2K_s48w8_SwinIR-M_x8.pth --folder_lq testsets/Set5/LR_bicubic/X8 --folder_gt testsets/Set5/HR

# (setting2: when model is trained on DIV2K+Flickr2K and with training_patch_size=64)
python main_test_swinir.py --task classical_sr --scale 2 --training_patch_size 64 --model_path model_zoo/swinir/001_classicalSR_DF2K_s64w8_SwinIR-M_x2.pth --folder_lq testsets/Set5/LR_bicubic/X2 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 3 --training_patch_size 64 --model_path model_zoo/swinir/001_classicalSR_DF2K_s64w8_SwinIR-M_x3.pth --folder_lq testsets/Set5/LR_bicubic/X3 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 4 --training_patch_size 64 --model_path model_zoo/swinir/001_classicalSR_DF2K_s64w8_SwinIR-M_x4.pth --folder_lq testsets/Set5/LR_bicubic/X4 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task classical_sr --scale 8 --training_patch_size 64 --model_path model_zoo/swinir/001_classicalSR_DF2K_s64w8_SwinIR-M_x8.pth --folder_lq testsets/Set5/LR_bicubic/X8 --folder_gt testsets/Set5/HR


# 002 Lightweight Image Super-Resolution (small size)
python main_test_swinir.py --task lightweight_sr --scale 2 --model_path model_zoo/swinir/002_lightweightSR_DIV2K_s64w8_SwinIR-S_x2.pth --folder_lq testsets/Set5/LR_bicubic/X2 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task lightweight_sr --scale 3 --model_path model_zoo/swinir/002_lightweightSR_DIV2K_s64w8_SwinIR-S_x3.pth --folder_lq testsets/Set5/LR_bicubic/X3 --folder_gt testsets/Set5/HR
python main_test_swinir.py --task lightweight_sr --scale 4 --model_path model_zoo/swinir/002_lightweightSR_DIV2K_s64w8_SwinIR-S_x4.pth --folder_lq testsets/Set5/LR_bicubic/X4 --folder_gt testsets/Set5/HR


# 003 Real-World Image Super-Resolution (use --tile 400 if you run out-of-memory)
# (middle size)
python main_test_swinir.py --task real_sr --scale 4 --model_path model_zoo/swinir/003_realSR_BSRGAN_DFO_s64w8_SwinIR-M_x4_GAN.pth --folder_lq testsets/RealSRSet+5images

# (larger size + trained on more datasets)
python main_test_swinir.py --task real_sr --scale 4 --large_model --model_path model_zoo/swinir/003_realSR_BSRGAN_DFOWMFC_s64w8_SwinIR-L_x4_GAN.pth --folder_lq testsets/RealSRSet+5images


# 004 Grayscale Image Denoising (middle size)
python main_test_swinir.py --task gray_dn --noise 15 --model_path model_zoo/swinir/004_grayDN_DFWB_s128w8_SwinIR-M_noise15.pth --folder_gt testsets/Set12
python main_test_swinir.py --task gray_dn --noise 25 --model_path model_zoo/swinir/004_grayDN_DFWB_s128w8_SwinIR-M_noise25.pth --folder_gt testsets/Set12
python main_test_swinir.py --task gray_dn --noise 50 --model_path model_zoo/swinir/004_grayDN_DFWB_s128w8_SwinIR-M_noise50.pth --folder_gt testsets/Set12


# 005 Color Image Denoising (middle size)
python main_test_swinir.py --task color_dn --noise 15 --model_path model_zoo/swinir/005_colorDN_DFWB_s128w8_SwinIR-M_noise15.pth --folder_gt testsets/McMaster
python main_test_swinir.py --task color_dn --noise 25 --model_path model_zoo/swinir/005_colorDN_DFWB_s128w8_SwinIR-M_noise25.pth --folder_gt testsets/McMaster
python main_test_swinir.py --task color_dn --noise 50 --model_path model_zoo/swinir/005_colorDN_DFWB_s128w8_SwinIR-M_noise50.pth --folder_gt testsets/McMaster


# 006 JPEG Compression Artifact Reduction (middle size, using window_size=7 because JPEG encoding uses 8x8 blocks)
# grayscale
python main_test_swinir.py --task jpeg_car --jpeg 10 --model_path model_zoo/swinir/006_CAR_DFWB_s126w7_SwinIR-M_jpeg10.pth --folder_gt testsets/classic5
python main_test_swinir.py --task jpeg_car --jpeg 20 --model_path model_zoo/swinir/006_CAR_DFWB_s126w7_SwinIR-M_jpeg20.pth --folder_gt testsets/classic5
python main_test_swinir.py --task jpeg_car --jpeg 30 --model_path model_zoo/swinir/006_CAR_DFWB_s126w7_SwinIR-M_jpeg30.pth --folder_gt testsets/classic5
python main_test_swinir.py --task jpeg_car --jpeg 40 --model_path model_zoo/swinir/006_CAR_DFWB_s126w7_SwinIR-M_jpeg40.pth --folder_gt testsets/classic5

# color
python main_test_swinir.py --task color_jpeg_car --jpeg 10 --model_path model_zoo/swinir/006_colorCAR_DFWB_s126w7_SwinIR-M_jpeg10.pth --folder_gt testsets/LIVE1
python main_test_swinir.py --task color_jpeg_car --jpeg 20 --model_path model_zoo/swinir/006_colorCAR_DFWB_s126w7_SwinIR-M_jpeg20.pth --folder_gt testsets/LIVE1
python main_test_swinir.py --task color_jpeg_car --jpeg 30 --model_path model_zoo/swinir/006_colorCAR_DFWB_s126w7_SwinIR-M_jpeg30.pth --folder_gt testsets/LIVE1
python main_test_swinir.py --task color_jpeg_car --jpeg 40 --model_path model_zoo/swinir/006_colorCAR_DFWB_s126w7_SwinIR-M_jpeg40.pth --folder_gt testsets/LIVE1
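
When --tile is set, the test script does not run the network on the full image at once; it processes overlapping tiles and averages the overlaps. A simplified sketch of that logic, trimmed down from the test() function in main_test_swinir.py:

import torch

def tiled_forward(model, img_lq, scale, tile=400, tile_overlap=32):
    # Run the model tile by tile and blend overlapping regions by averaging.
    # The tile size should be a multiple of the model's window size.
    b, c, h, w = img_lq.shape
    tile = min(tile, h, w)
    stride = tile - tile_overlap
    h_idx = list(range(0, h - tile, stride)) + [h - tile]
    w_idx = list(range(0, w - tile, stride)) + [w - tile]
    E = torch.zeros(b, c, h * scale, w * scale, device=img_lq.device)  # accumulator
    W = torch.zeros_like(E)                                            # per-pixel weights
    for hi in h_idx:
        for wi in w_idx:
            out = model(img_lq[..., hi:hi + tile, wi:wi + tile])
            E[..., hi * scale:(hi + tile) * scale, wi * scale:(wi + tile) * scale] += out
            W[..., hi * scale:(hi + tile) * scale, wi * scale:(wi + tile) * scale] += 1
    return E.div_(W)  # average where tiles overlap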


Results

We achieved state-of-the-art performance on classical/lightweight/real-world image SR, grayscale/color image denoising and JPEG compression artifact reduction. Detailed results can be found in the paper. All visual results of SwinIR can be downloaded here.

Classical Image Super-Resolution

  • More detailed comparison between SwinIR and a representative CNN-based model RCAN (classical image SR, ×4):

| Method | Training Set | Training time (8× GeForce RTX 2080 Ti, batch=32, iter=500k) | Y-PSNR/Y-SSIM on Manga109 | Runtime (1× GeForce RTX 2080 Ti, on a 256×256 LR image)* | #Params | #FLOPs | Testing memory |
| --- | --- | --- | --- | --- | --- | --- | --- |
| RCAN | DIV2K | 1.6 days | 31.22/0.9173 | 0.180s | 15.6M | 850.6G | 593.1M |
| SwinIR | DIV2K | 1.8 days | 31.67/0.9226 | 0.539s | 11.9M | 788.6G | 986.8M |

* We re-test the runtime when the GPU is idle. We refer to the evaluation code here.

  • Results on DIV2K-validation (100 images):

| Training Set | Scale Factor | PSNR (RGB) | PSNR (Y) | SSIM (RGB) | SSIM (Y) |
| --- | --- | --- | --- | --- | --- |
| DIV2K (800 images) | ×2 | 35.25 | 36.77 | 0.9423 | 0.9500 |
| DIV2K+Flickr2K (2650 images) | ×2 | 35.34 | 36.86 | 0.9430 | 0.9507 |
| DIV2K (800 images) | ×3 | 31.50 | 32.97 | 0.8832 | 0.8965 |
| DIV2K+Flickr2K (2650 images) | ×3 | 31.63 | 33.10 | 0.8854 | 0.8985 |
| DIV2K (800 images) | ×4 | 29.48 | 30.94 | 0.8311 | 0.8492 |
| DIV2K+Flickr2K (2650 images) | ×4 | 29.63 | 31.08 | 0.8347 | 0.8523 |
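
The PSNR (Y)/SSIM (Y) columns above are computed on the luma channel only. For reference, a sketch of the standard BT.601 luma conversion used for such Y-channel evaluation (the repository applies an equivalent bgr2ycbcr helper in utils/util_calculate_psnr_ssim.py):

import numpy as np

def rgb_to_y(img):
    # BT.601 luma from an RGB image scaled to [0, 1]; returns Y in [16, 235]
    return 16.0 + 65.481 * img[..., 0] + 128.553 * img[..., 1] + 24.966 * img[..., 2]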
Lightweight Image Super-Resolution

Real-World Image Super-Resolution

Grayscale Image Denoising

Color Image Denoising

JPEG Compression Artifact Reduction

on grayscale images

on color images

| Testing Set | Quality Factor | PSNR (RGB) | PSNR-B (RGB) | SSIM (RGB) |
| --- | --- | --- | --- | --- |
| LIVE1 | 10 | 28.06 | 27.76 | 0.8089 |
| LIVE1 | 20 | 30.45 | 29.97 | 0.8741 |
| LIVE1 | 30 | 31.82 | 31.24 | 0.9018 |
| LIVE1 | 40 | 32.75 | 32.12 | 0.9174 |

Citation

@article{liang2021swinir,
  title={SwinIR: Image Restoration Using Swin Transformer},
  author={Liang, Jingyun and Cao, Jiezhang and Sun, Guolei and Zhang, Kai and Van Gool, Luc and Timofte, Radu},
  journal={arXiv preprint arXiv:2108.10257},
  year={2021}
}

License and Acknowledgement

This project is released under the Apache 2.0 license. The codes are based on Swin Transformer and KAIR. Please also follow their licenses. Thanks for their awesome works.