KAIR

Image Restoration Toolbox (PyTorch). Training and testing codes for DPIR, USRNet, DnCNN, FFDNet, SRMD, DPSR, BSRGAN, SwinIR

3,206

663

Top Related Projects

Real-ESRGAN

31,984

Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.

SwinIR

4,946

SwinIR: Image Restoration Using Swin Transformer (official repository)

ESRGAN

6,397

ECCV18 Workshops - Enhanced SRGAN. Champion PIRM Challenge on Perceptual Super-Resolution. The training codes are in BasicSR.

super-resolution

1,512

Tensorflow 2.x based implementation of EDSR, WDSR and SRGAN for single image super-resolution

Quick Overview

KAIR is a PyTorch toolbox for image restoration and enhancement with deep learning. It provides implementations of models for tasks such as denoising, super-resolution, and deblurring, along with training and testing scripts.

Pros

Extensive collection of pre-trained models for various image restoration tasks
Well-organized codebase with modular architecture for easy customization
Implemented in PyTorch, with training scripts for both DataParallel and DistributedDataParallel
Includes data preparation scripts and utility functions for dataset handling

Cons

Limited documentation for some advanced features and customizations
Requires significant computational resources for training large models
Some older models may not be actively maintained or updated
Steep learning curve for users new to deep learning in image processing

Code Examples

Loading a pre-trained denoising model:

import torch

from models.network_dncnn import DnCNN
from utils import utils_image as util

# dncnn_25.pth is a grayscale DnCNN checkpoint, so the network uses one channel.
# nb=17 and act_mode='R' are assumed to match the converted weights; check main_test_dncnn.py.
model = DnCNN(in_nc=1, out_nc=1, nc=64, nb=17, act_mode='R')
model.load_state_dict(torch.load('model_zoo/dncnn_25.pth', map_location='cpu'), strict=True)
model.eval()

Performing image denoising:

# Read as grayscale (H x W x 1, uint8) and convert to a 1 x 1 x H x W float tensor in [0, 1]
noisy_img = util.imread_uint('noisy_image.png', n_channels=1)
noisy_img = util.uint2tensor4(noisy_img)

with torch.no_grad():
    denoised_img = model(noisy_img)

denoised_img = util.tensor2uint(denoised_img)
util.imsave(denoised_img, 'denoised_image.png')

Training a super-resolution model:

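import torch
from torch.utils.data import DataLoader

from data.dataset_sr import DatasetSR
from models.network_msrresnet import MSRResNet0

# DatasetSR is configured through an options dictionary rather than positional
# arguments. The keys below are assumed to mirror the "datasets" entries of the
# JSON files in options/; verify them against data/dataset_sr.py.
opt_train = {'phase': 'train', 'n_channels': 3, 'scale': 4, 'H_size': 96,
             'dataroot_H': 'trainsets/trainH', 'dataroot_L': None}

model = MSRResNet0(in_nc=3, out_nc=3, nc=64, nb=16, upscale=4)
train_set = DatasetSR(opt_train)
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.L1Loss()

for epoch in range(100):
    for data in train_loader:
        img_L, img_H = data['L'], data['H']  # low- and high-resolution batches
        optimizer.zero_grad()
        img_E = model(img_L)
        loss = criterion(img_E, img_H)
        loss.backward()
        optimizer.step()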

Getting Started

Clone the repository:

git clone https://github.com/cszn/KAIR.git
cd KAIR

Install dependencies:
```
pip install -r requirement.txt
```

Download pre-trained models:

python main_download_pretrained_models.py

Run a demo:
```
python main_test_dncnn.py
```

Competitor Comparisons

Real-ESRGAN

31,984

Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.

Pros of Real-ESRGAN

Focuses specifically on real-world image super-resolution
Implements a more advanced degradation model for training
Provides pre-trained models for immediate use

Cons of Real-ESRGAN

Limited to super-resolution tasks
Less comprehensive in terms of image restoration techniques
Fewer options for customization and experimentation

Code Comparison

Real-ESRGAN:

from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer

model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32)
upsampler = RealESRGANer(model_path='weights/RealESRGAN_x4plus.pth', model=model, scale=4)

KAIR:

import torch

from models.network_unet import UNetRes
from utils import utils_image as util

model = UNetRes(in_nc=1, out_nc=1, nc=[64, 128, 256, 512], nb=4, act_mode='R', downsample_mode="strideconv", upsample_mode="convtranspose")
model.eval()
img_L = util.imread_uint('input.png', n_channels=1)
img_L = util.uint2tensor4(img_L)  # the network expects a 4-D float tensor, not a uint8 array
with torch.no_grad():
    img_E = model(img_L)

Summary

Real-ESRGAN excels in real-world super-resolution tasks with ready-to-use models, while KAIR offers a broader range of image restoration techniques and greater flexibility for researchers. Real-ESRGAN is more user-friendly for specific super-resolution applications, whereas KAIR provides a comprehensive toolkit for various image processing tasks and experimentation.

SwinIR

4,946

SwinIR: Image Restoration Using Swin Transformer (official repository)

Pros of SwinIR

Utilizes the Swin Transformer architecture, which can capture long-range dependencies more effectively
Achieves state-of-the-art performance on various image restoration tasks
Provides pre-trained models for different applications (e.g., image denoising, super-resolution)

Cons of SwinIR

More complex architecture, potentially requiring more computational resources
Limited to specific image restoration tasks compared to KAIR's broader scope
May have a steeper learning curve for implementation and customization

Code Comparison

SwinIR:

from models.network_swinir import SwinIR
model = SwinIR(upscale=4, in_chans=3, img_size=64, window_size=8,
               img_range=1., depths=[6, 6, 6, 6], embed_dim=60, num_heads=[6, 6, 6, 6],
               mlp_ratio=2, upsampler='pixelshuffledirect', resi_connection='1conv')

KAIR:

from models.network_unet import UNetRes
model = UNetRes(in_nc=3, out_nc=3, nc=[64, 128, 256, 512], nb=4, act_mode='R',
                downsample_mode='strideconv', upsample_mode='convtranspose')

Both repositories offer powerful image restoration solutions, with SwinIR focusing on transformer-based architectures and KAIR providing a more diverse set of traditional and deep learning models.

ESRGAN

6,397

ECCV18 Workshops - Enhanced SRGAN. Champion PIRM Challenge on Perceptual Super-Resolution. The training codes are in BasicSR.

Pros of ESRGAN

Focused specifically on super-resolution tasks, providing a more specialized solution
Includes pre-trained models for quick implementation and testing
Offers a perceptual loss function for improved visual quality

Cons of ESRGAN

Limited to super-resolution tasks, while KAIR supports multiple image restoration tasks
Less active development and updates compared to KAIR
Fewer options for customization and experimentation

Code Comparison

ESRGAN:

import torch
import RRDBNet_arch as arch  # architecture file at the root of the ESRGAN repository

model = arch.RRDBNet(3, 3, 64, 23, gc=32)
model.load_state_dict(torch.load('models/RRDB_ESRGAN_x4.pth'), strict=True)

KAIR:

from models.network_unet import UNetRes as net
from models.select_network import init_weights  # init_weights is assumed to live here, not in utils_model

model = net(in_nc=3, out_nc=3, nc=[64, 128, 256, 512], nb=4, act_mode='R', downsample_mode='strideconv', upsample_mode='convtranspose')
init_weights(model, init_type='orthogonal')

Both repositories offer implementations of deep learning models for image processing tasks. ESRGAN is more specialized for super-resolution, while KAIR provides a broader range of image restoration capabilities. ESRGAN may be easier to use for specific super-resolution tasks, but KAIR offers more flexibility and ongoing development for various image processing applications.

super-resolution

1,512

Tensorflow 2.x based implementation of EDSR, WDSR and SRGAN for single image super-resolution

Pros of super-resolution

Focuses specifically on super-resolution tasks, making it more specialized and potentially easier to use for this specific application
Implements multiple state-of-the-art super-resolution models, providing a variety of options for users
Includes pre-trained models, allowing for quick implementation and testing

Cons of super-resolution

Less comprehensive than KAIR, which covers a broader range of image restoration tasks
May have fewer active contributors and updates compared to KAIR
Documentation might be less extensive, potentially making it harder for new users to get started

Code Comparison

KAIR example:

from models.network_unet import UNetRes as net
model = net(in_nc=3, out_nc=3, nc=[64, 128, 256, 512], nb=4, act_mode='R', downsample_mode='strideconv', upsample_mode='convtranspose')

super-resolution example:

from model import resolve_single
from model.srgan import generator
model = generator()
sr_image = resolve_single(model, lr_image)

Both repositories provide implementations of image enhancement models, but KAIR offers a more comprehensive toolkit for various image restoration tasks, while super-resolution focuses specifically on super-resolution techniques.

README

Training and testing codes for USRNet, DnCNN, FFDNet, SRMD, DPSR, MSRResNet, ESRGAN, BSRGAN, SwinIR, VRT, RVRT

Kai Zhang

Computer Vision Lab, ETH Zurich, Switzerland

News (2023-06-02): Code for "Denoising Diffusion Models for Plug-and-Play Image Restoration" is released at yuanzhi-zhu/DiffPIR.
News (2022-10-04): We release the training codes of RVRT (NeurIPS 2022) for video SR, deblurring and denoising.
News (2022-05-05): Try the online demo of SCUNet for blind real image denoising.
News (2022-03-23): We release the testing codes of SCUNet for blind real image denoising.

The following results are obtained by our SCUNet with purely synthetic training data! We did not use the paired noisy/clean data by DND and SIDD during training!

News (2022-02-15): We release the training codes of VRT for video SR, deblurring and denoising.
News (2021-12-23): Our techniques are adopted in https://www.amemori.ai/.
News (2021-12-23): Our new work for practical image denoising.
News (2021-09-09): Add main_download_pretrained_models.py to download pre-trained models.
News (2021-09-08): Add matlab code to zoom local part of an image for the purpose of comparison between different results.
News (2021-09-07): We upload the training code of SwinIR and provide an interactive online Colab demo for real-world image SR. Try to super-resolve your own images on Colab!

Real-World Image (x4)	BSRGAN, ICCV2021	Real-ESRGAN	SwinIR (ours)

News (2021-08-31): We upload the training code of BSRGAN.
News (2021-08-24): We upload the BSRGAN degradation model.
News (2021-08-22): Support multi-feature-layer VGG perceptual loss and UNet discriminator.
News (2021-08-18): We upload the extended BSRGAN degradation model. It is slightly different from our published version.
News (2021-06-03): Add testing codes of GPEN (CVPR21) for face image enhancement: main_test_face_enhancement.py

News (2021-05-13): Add PatchGAN discriminator.
News (2021-05-12): Support distributed training, see also https://github.com/xinntao/BasicSR/blob/master/docs/TrainTest.md.
News (2021-01): BSRGAN for blind real image super-resolution will be added.
Pull requests are welcome!
Correction (2020-10): If you use multiple GPUs for GAN training, remove or comment Line 105 to enable DataParallel for fast training
News (2020-10): Add utils_receptivefield.py to calculate receptive field.
News (2020-8): A deep plug-and-play image restoration toolbox is released at cszn/DPIR.
Tips (2020-8): Use this to avoid out of memory issue.
News (2020-7): Add main_challenge_sr.py to get FLOPs, #Params, Runtime, #Activations, #Conv, and Max Memory Allocated.

from utils.utils_modelsummary import get_model_activation, get_model_flops
input_dim = (3, 256, 256)  # set the input dimension
activations, num_conv2d = get_model_activation(model, input_dim)
logger.info('{:>16s} : {:<.4f} [M]'.format('#Activations', activations/10**6))
logger.info('{:>16s} : {:<d}'.format('#Conv2d', num_conv2d))
flops = get_model_flops(model, input_dim, False)
logger.info('{:>16s} : {:<.4f} [G]'.format('FLOPs', flops/10**9))
num_parameters = sum(map(lambda x: x.numel(), model.parameters()))
logger.info('{:>16s} : {:<.4f} [M]'.format('#Params', num_parameters/10**6))

News (2020-6): Add USRNet (CVPR 2020) for training and testing.
- Network Architecture
- Dataset

Clone repo

git clone https://github.com/cszn/KAIR.git

pip install -r requirement.txt

Training

You should modify the json file from options first, for example, setting "gpu_ids": [0,1,2,3] if 4 GPUs are used, setting "dataroot_H": "trainsets/trainH" if path of the high quality dataset is trainsets/trainH.
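
As a rough illustration of the edit described above, the following sketch sets the two values on an options dictionary and prints the result. The nested `datasets` / `train` layout and the helper name `configure` are assumptions made for this example; check the key paths against the actual file in `options/`. The shipped option files may contain comments that plain `json.load` rejects, so the sketch starts from an in-memory dictionary instead of parsing a file.

```python
import json

# Hypothetical, trimmed-down options; the real files in options/ hold many more keys.
opt = {
    "gpu_ids": [0],
    "datasets": {"train": {"dataroot_H": None}},
}


def configure(opt, gpu_ids, dataroot_H):
    """Return a copy of the options with the GPU ids and the high-quality data root set."""
    new_opt = json.loads(json.dumps(opt))  # cheap deep copy for JSON-like data
    new_opt["gpu_ids"] = list(gpu_ids)
    new_opt["datasets"]["train"]["dataroot_H"] = dataroot_H
    return new_opt


new_opt = configure(opt, gpu_ids=[0, 1, 2, 3], dataroot_H="trainsets/trainH")
print(json.dumps(new_opt, indent=2))
```

Working on a copy keeps the original dictionary available for comparison when several training configurations are derived from one base file.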

Training with DataParallel - PSNR

python main_train_psnr.py --opt options/train_msrresnet_psnr.json

Training with DataParallel - GAN

python main_train_gan.py --opt options/train_msrresnet_gan.json

Training with DistributedDataParallel - PSNR - 4 GPUs

python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 main_train_psnr.py --opt options/train_msrresnet_psnr.json  --dist True

Training with DistributedDataParallel - PSNR - 8 GPUs

python -m torch.distributed.launch --nproc_per_node=8 --master_port=1234 main_train_psnr.py --opt options/train_msrresnet_psnr.json  --dist True

Training with DistributedDataParallel - GAN - 4 GPUs

python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 main_train_gan.py --opt options/train_msrresnet_gan.json  --dist True

Training with DistributedDataParallel - GAN - 8 GPUs

python -m torch.distributed.launch --nproc_per_node=8 --master_port=1234 main_train_gan.py --opt options/train_msrresnet_gan.json  --dist True
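
The distributed launch commands above differ only in the training script, the options file, and the number of GPUs. A small helper can assemble them; `build_launch_cmd` is a hypothetical name introduced for this sketch and is not part of KAIR. It only builds the argument list and does not start any process.

```python
import shlex


def build_launch_cmd(script, opt_path, n_gpus, master_port=1234):
    """Assemble the torch.distributed.launch command line as a list of arguments."""
    if n_gpus < 1:
        raise ValueError("n_gpus must be at least 1")
    return [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={n_gpus}",
        f"--master_port={master_port}",
        script, "--opt", opt_path, "--dist", "True",
    ]


cmd = build_launch_cmd("main_train_psnr.py", "options/train_msrresnet_psnr.json", n_gpus=4)
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` unchanged, which avoids shell quoting problems when paths contain spaces.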

Kill distributed training processes of main_train_gan.py

kill $(ps aux | grep main_train_gan.py | grep -v grep | awk '{print $2}')

Method	Original Link
DnCNN	https://github.com/cszn/DnCNN
FDnCNN	https://github.com/cszn/DnCNN
FFDNet	https://github.com/cszn/FFDNet
SRMD	https://github.com/cszn/SRMD
DPSR-SRResNet	https://github.com/cszn/DPSR
SRResNet	https://github.com/xinntao/BasicSR
ESRGAN	https://github.com/xinntao/ESRGAN
RRDB	https://github.com/xinntao/ESRGAN
IMDB	https://github.com/Zheng222/IMDN
USRNet	https://github.com/cszn/USRNet
DRUNet	https://github.com/cszn/DPIR
DPIR	https://github.com/cszn/DPIR
BSRGAN	https://github.com/cszn/BSRGAN
SwinIR	https://github.com/JingyunLiang/SwinIR
VRT	https://github.com/JingyunLiang/VRT
DiffPIR	https://github.com/yuanzhi-zhu/DiffPIR

Network architectures

USRNet
DnCNN
IRCNN denoiser

FFDNet
SRMD
SRResNet, SRGAN, RRDB, ESRGAN
IMDN

-----

Testing

Method	model_zoo
main_test_dncnn.py	`dncnn_15.pth, dncnn_25.pth, dncnn_50.pth, dncnn_gray_blind.pth, dncnn_color_blind.pth, dncnn3.pth`
main_test_ircnn_denoiser.py	`ircnn_gray.pth, ircnn_color.pth`
main_test_fdncnn.py	`fdncnn_gray.pth, fdncnn_color.pth, fdncnn_gray_clip.pth, fdncnn_color_clip.pth`
main_test_ffdnet.py	`ffdnet_gray.pth, ffdnet_color.pth, ffdnet_gray_clip.pth, ffdnet_color_clip.pth`
main_test_srmd.py	`srmdnf_x2.pth, srmdnf_x3.pth, srmdnf_x4.pth, srmd_x2.pth, srmd_x3.pth, srmd_x4.pth`
	The above models are converted from MatConvNet.
main_test_dpsr.py	`dpsr_x2.pth, dpsr_x3.pth, dpsr_x4.pth, dpsr_x4_gan.pth`
main_test_msrresnet.py	`msrresnet_x4_psnr.pth, msrresnet_x4_gan.pth`
main_test_rrdb.py	`rrdb_x4_psnr.pth, rrdb_x4_esrgan.pth`
main_test_imdn.py	`imdn_x4.pth`
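
Before running one of the test scripts it can help to confirm that its weights are already in `model_zoo`. The sketch below encodes two rows of the table above and reports which files are absent; the dictionary `WEIGHTS` and the function `missing_weights` are hypothetical names for this example, not KAIR utilities.

```python
from pathlib import Path

# Subset of the table above: test script -> weight files it can load.
WEIGHTS = {
    "main_test_dncnn.py": ["dncnn_15.pth", "dncnn_25.pth", "dncnn_50.pth"],
    "main_test_imdn.py": ["imdn_x4.pth"],
}


def missing_weights(script, model_zoo="model_zoo"):
    """List the weight files for `script` that are not present in `model_zoo`."""
    zoo = Path(model_zoo)
    return [name for name in WEIGHTS[script] if not (zoo / name).is_file()]


for script in WEIGHTS:
    print(script, "missing:", missing_weights(script))
```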

model_zoo

download link https://drive.google.com/drive/folders/13kfr3qny7S2xwG9h7v95F5mkWs0OmU0D

trainsets

https://github.com/xinntao/BasicSR/blob/master/docs/DatasetPreparation.md
train400
DIV2K
Flickr2K
optional: use split_imageset(original_dataroot, taget_dataroot, n_channels=3, p_size=512, p_overlap=96, p_max=800) to get trainsets/trainH with small images for fast data loading
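
To give a feel for what the `p_size`, `p_overlap`, and `p_max` arguments control, the sketch below computes where overlapping crops would start in a large image. It is an illustration under assumptions, not the code behind `split_imageset`: the function names are hypothetical, and the rule that an image is left whole unless both sides exceed `p_max` is a guess at the intended behavior.

```python
def patch_origins(length, p_size=512, p_overlap=96):
    """Start offsets of overlapping windows that cover one image dimension."""
    if length <= p_size:
        return [0]
    stride = p_size - p_overlap
    origins = list(range(0, length - p_size, stride))
    origins.append(length - p_size)  # last window sits flush with the border
    return origins


def patch_boxes(height, width, p_size=512, p_overlap=96, p_max=800):
    """(top, left) corners of the crops; a single (0, 0) means the image is kept whole."""
    if height <= p_max or width <= p_max:
        return [(0, 0)]
    return [(top, left)
            for top in patch_origins(height, p_size, p_overlap)
            for left in patch_origins(width, p_size, p_overlap)]


print(patch_origins(1200))
print(len(patch_boxes(1200, 2000)))
```

Smaller crops mean each training sample decodes less image data from disk, which is the "fast data loading" benefit the list item refers to.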

testsets

https://github.com/xinntao/BasicSR/blob/master/docs/DatasetPreparation.md
set12
bsd68
cbsd68
kodak24
srbsd68
set5
set14
cbsd100
urban100
manga109

References

@inproceedings{zhu2023denoising, % DiffPIR
title={Denoising Diffusion Models for Plug-and-Play Image Restoration},
author={Yuanzhi Zhu and Kai Zhang and Jingyun Liang and Jiezhang Cao and Bihan Wen and Radu Timofte and Luc Van Gool},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition Workshops},
year={2023}
}
@article{liang2022vrt,
title={VRT: A Video Restoration Transformer},
author={Liang, Jingyun and Cao, Jiezhang and Fan, Yuchen and Zhang, Kai and Ranjan, Rakesh and Li, Yawei and Timofte, Radu and Van Gool, Luc},
journal={arXiv preprint arXiv:2201.12288},
year={2022}
}
@inproceedings{liang2021swinir,
title={SwinIR: Image Restoration Using Swin Transformer},
author={Liang, Jingyun and Cao, Jiezhang and Sun, Guolei and Zhang, Kai and Van Gool, Luc and Timofte, Radu},
booktitle={IEEE International Conference on Computer Vision Workshops},
pages={1833--1844},
year={2021}
}
@inproceedings{zhang2021designing,
title={Designing a Practical Degradation Model for Deep Blind Image Super-Resolution},
author={Zhang, Kai and Liang, Jingyun and Van Gool, Luc and Timofte, Radu},
booktitle={IEEE International Conference on Computer Vision},
pages={4791--4800},
year={2021}
}
@article{zhang2021plug, % DPIR & DRUNet & IRCNN
  title={Plug-and-Play Image Restoration with Deep Denoiser Prior},
  author={Zhang, Kai and Li, Yawei and Zuo, Wangmeng and Zhang, Lei and Van Gool, Luc and Timofte, Radu},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year={2021}
}
@inproceedings{zhang2020aim, % efficientSR_challenge
  title={AIM 2020 Challenge on Efficient Super-Resolution: Methods and Results},
  author={Kai Zhang and Martin Danelljan and Yawei Li and Radu Timofte and others},
  booktitle={European Conference on Computer Vision Workshops},
  year={2020}
}
@inproceedings{zhang2020deep, % USRNet
  title={Deep unfolding network for image super-resolution},
  author={Zhang, Kai and Van Gool, Luc and Timofte, Radu},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  pages={3217--3226},
  year={2020}
}
@article{zhang2017beyond, % DnCNN
  title={Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising},
  author={Zhang, Kai and Zuo, Wangmeng and Chen, Yunjin and Meng, Deyu and Zhang, Lei},
  journal={IEEE Transactions on Image Processing},
  volume={26},
  number={7},
  pages={3142--3155},
  year={2017}
}
@inproceedings{zhang2017learning, % IRCNN
title={Learning deep CNN denoiser prior for image restoration},
author={Zhang, Kai and Zuo, Wangmeng and Gu, Shuhang and Zhang, Lei},
booktitle={IEEE conference on computer vision and pattern recognition},
pages={3929--3938},
year={2017}
}
@article{zhang2018ffdnet, % FFDNet, FDnCNN
  title={FFDNet: Toward a fast and flexible solution for CNN-based image denoising},
  author={Zhang, Kai and Zuo, Wangmeng and Zhang, Lei},
  journal={IEEE Transactions on Image Processing},
  volume={27},
  number={9},
  pages={4608--4622},
  year={2018}
}
@inproceedings{zhang2018learning, % SRMD
  title={Learning a single convolutional super-resolution network for multiple degradations},
  author={Zhang, Kai and Zuo, Wangmeng and Zhang, Lei},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  pages={3262--3271},
  year={2018}
}
@inproceedings{zhang2019deep, % DPSR
  title={Deep Plug-and-Play Super-Resolution for Arbitrary Blur Kernels},
  author={Zhang, Kai and Zuo, Wangmeng and Zhang, Lei},
  booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
  pages={1671--1681},
  year={2019}
}
@InProceedings{wang2018esrgan, % ESRGAN, MSRResNet
    author = {Wang, Xintao and Yu, Ke and Wu, Shixiang and Gu, Jinjin and Liu, Yihao and Dong, Chao and Qiao, Yu and Loy, Chen Change},
    title = {ESRGAN: Enhanced super-resolution generative adversarial networks},
    booktitle = {The European Conference on Computer Vision Workshops (ECCVW)},
    month = {September},
    year = {2018}
}
@inproceedings{hui2019lightweight, % IMDN
  title={Lightweight Image Super-Resolution with Information Multi-distillation Network},
  author={Hui, Zheng and Gao, Xinbo and Yang, Yunchu and Wang, Xiumei},
  booktitle={Proceedings of the 27th ACM International Conference on Multimedia (ACM MM)},
  pages={2024--2032},
  year={2019}
}
@inproceedings{zhang2019aim, % IMDN
  title={AIM 2019 Challenge on Constrained Super-Resolution: Methods and Results},
  author={Kai Zhang and Shuhang Gu and Radu Timofte and others},
  booktitle={IEEE International Conference on Computer Vision Workshops},
  year={2019}
}
@inproceedings{yang2021gan,
    title={GAN Prior Embedded Network for Blind Face Restoration in the Wild},
    author={Yang, Tao and Ren, Peiran and Xie, Xuansong and Zhang, Lei},
    booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
    year={2021}
}
