KAIR
Image Restoration Toolbox (PyTorch). Training and testing codes for DPIR, USRNet, DnCNN, FFDNet, SRMD, DPSR, BSRGAN, SwinIR
Top Related Projects
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
SwinIR: Image Restoration Using Swin Transformer (official repository)
ECCV18 Workshops - Enhanced SRGAN. Champion PIRM Challenge on Perceptual Super-Resolution. The training codes are in BasicSR.
Tensorflow 2.x based implementation of EDSR, WDSR and SRGAN for single image super-resolution
Quick Overview
KAIR is a PyTorch toolbox for image restoration with deep learning. It provides implementations of models for tasks such as denoising, super-resolution, and deblurring, along with training and testing code.
Pros
- Extensive collection of pre-trained models for various image restoration tasks
- Well-organized codebase with modular architecture for easy customization
- Implemented in PyTorch, with both DataParallel and DistributedDataParallel training supported
- Includes data preparation scripts and utility functions for dataset handling
Cons
- Limited documentation for some advanced features and customizations
- Requires significant computational resources for training large models
- Some older models may not be actively maintained or updated
- Steep learning curve for users new to deep learning in image processing
Code Examples
- Loading a pre-trained denoising model:
import torch
from models.network_dncnn import DnCNN
from utils import utils_image as util
# The architecture must match the checkpoint; nb=20 is assumed here for the blind color DnCNN weights
model = DnCNN(in_nc=3, out_nc=3, nc=64, nb=20, act_mode='R')
model.load_state_dict(torch.load('model_zoo/dncnn_color_blind.pth'), strict=True)
model.eval()
- Performing image denoising:
import torch
noisy_img = util.imread_uint('noisy_image.png', n_channels=3)
noisy_img = util.uint2tensor4(noisy_img)
with torch.no_grad():
    denoised_img = model(noisy_img)
denoised_img = util.tensor2uint(denoised_img)
util.imsave(denoised_img, 'denoised_image.png')
- Training a super-resolution model:
import torch
from models.network_srresnet import SRResNet
from data.dataset_sr import DatasetSR
from torch.utils.data import DataLoader
# Illustrative only: check the module names and constructor signatures against the repository.
# The documented training entry point is main_train_psnr.py with a JSON options file.
model = SRResNet(in_nc=3, out_nc=3, nc=64, nb=16, upscale=4)
train_set = DatasetSR('path/to/training/data', patch_size=96, scale=4)
train_loader = DataLoader(train_set, batch_size=16, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = torch.nn.L1Loss()
for epoch in range(100):
    for data in train_loader:
        lq, hq = data['L'], data['H']  # low-quality input and high-quality target
        optimizer.zero_grad()
        sr = model(lq)
        loss = criterion(sr, hq)
        loss.backward()
        optimizer.step()
Getting Started
- Clone the repository:
git clone https://github.com/cszn/KAIR.git
cd KAIR
- Install dependencies:
pip install -r requirement.txt
- Download pre-trained models:
python main_download_pretrained_models.py
- Run a demo:
python main_test_dncnn.py
Competitor Comparisons
Real-ESRGAN aims at developing Practical Algorithms for General Image/Video Restoration.
Pros of Real-ESRGAN
- Focuses specifically on real-world image super-resolution
- Implements a more advanced degradation model for training
- Provides pre-trained models for immediate use
Cons of Real-ESRGAN
- Limited to super-resolution tasks
- Less comprehensive in terms of image restoration techniques
- Fewer options for customization and experimentation
Code Comparison
Real-ESRGAN:
from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer
model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32)
upsampler = RealESRGANer(model_path='weights/RealESRGAN_x4plus.pth', model=model, scale=4)
KAIR:
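import torch
from models.network_unet import UNetRes
from utils import utils_image as util
model = UNetRes(in_nc=1, out_nc=1, nc=[64, 128, 256, 512], nb=4, act_mode='R', downsample_mode="strideconv", upsample_mode="convtranspose")
model.eval()
img_L = util.imread_uint('input.png', n_channels=1)
img_L = util.uint2tensor4(img_L)  # convert the uint8 image to a 4-D float tensor before inference
with torch.no_grad():
    img_E = model(img_L)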
Summary
Real-ESRGAN excels in real-world super-resolution tasks with ready-to-use models, while KAIR offers a broader range of image restoration techniques and greater flexibility for researchers. Real-ESRGAN is more user-friendly for specific super-resolution applications, whereas KAIR provides a comprehensive toolkit for various image processing tasks and experimentation.
SwinIR: Image Restoration Using Swin Transformer (official repository)
Pros of SwinIR
- Utilizes the Swin Transformer architecture, which can capture long-range dependencies more effectively
- Achieves state-of-the-art performance on various image restoration tasks
- Provides pre-trained models for different applications (e.g., image denoising, super-resolution)
Cons of SwinIR
- More complex architecture, potentially requiring more computational resources
- Limited to specific image restoration tasks compared to KAIR's broader scope
- May have a steeper learning curve for implementation and customization
Code Comparison
SwinIR:
from models.network_swinir import SwinIR
model = SwinIR(upscale=4, in_chans=3, img_size=64, window_size=8,
               img_range=1., depths=[6, 6, 6, 6], embed_dim=60, num_heads=[6, 6, 6, 6],
               mlp_ratio=2, upsampler='pixelshuffledirect', resi_connection='1conv')
KAIR:
from models.network_unet import UNetRes
model = UNetRes(in_nc=3, out_nc=3, nc=[64, 128, 256, 512], nb=4, act_mode='R',
                downsample_mode='strideconv', upsample_mode='convtranspose')
Both repositories offer image restoration models. SwinIR focuses on a single transformer-based architecture, while KAIR covers a wider set of networks and also includes the SwinIR training code.
ECCV18 Workshops - Enhanced SRGAN. Champion PIRM Challenge on Perceptual Super-Resolution. The training codes are in BasicSR.
Pros of ESRGAN
- Focused specifically on super-resolution tasks, providing a more specialized solution
- Includes pre-trained models for quick implementation and testing
- Offers a perceptual loss function for improved visual quality
Cons of ESRGAN
- Limited to super-resolution tasks, while KAIR supports multiple image restoration tasks
- Less active development and updates compared to KAIR
- Fewer options for customization and experimentation
Code Comparison
ESRGAN:
from models.archs.arch_util import initialize_weights
from models.archs.rrdb_net import RRDBNet
model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32)
initialize_weights(model, scale=0.1)
KAIR:
from models.network_unet import UNetRes as net
# init_weights is assumed to live in models/select_network.py; verify against the repository
from models.select_network import init_weights
model = net(in_nc=3, out_nc=3, nc=[64, 128, 256, 512], nb=4, act_mode='R', downsample_mode='strideconv', upsample_mode='convtranspose')
init_weights(model, init_type='orthogonal')
Both repositories offer implementations of deep learning models for image processing tasks. ESRGAN is more specialized for super-resolution, while KAIR provides a broader range of image restoration capabilities. ESRGAN may be easier to use for specific super-resolution tasks, but KAIR offers more flexibility and ongoing development for various image processing applications.
Tensorflow 2.x based implementation of EDSR, WDSR and SRGAN for single image super-resolution
Pros of super-resolution
- Focuses specifically on super-resolution tasks, making it more specialized and potentially easier to use for this specific application
- Implements multiple state-of-the-art super-resolution models, providing a variety of options for users
- Includes pre-trained models, allowing for quick implementation and testing
Cons of super-resolution
- Less comprehensive than KAIR, which covers a broader range of image restoration tasks
- May have fewer active contributors and updates compared to KAIR
- Documentation might be less extensive, potentially making it harder for new users to get started
Code Comparison
KAIR example:
from models.network_unet import UNetRes as net
model = net(in_nc=3, out_nc=3, nc=[64, 128, 256, 512], nb=4, act_mode='R', downsample_mode='strideconv', upsample_mode='convtranspose')
super-resolution example:
from model import resolve_single
from model.srgan import generator
model = generator()
sr_image = resolve_single(model, lr_image)
Both repositories provide implementations of image enhancement models, but KAIR offers a more comprehensive toolkit for various image restoration tasks, while super-resolution focuses specifically on super-resolution techniques.
README
Training and testing codes for USRNet, DnCNN, FFDNet, SRMD, DPSR, MSRResNet, ESRGAN, BSRGAN, SwinIR, VRT, RVRT
Computer Vision Lab, ETH Zurich, Switzerland
- News (2023-06-02): Code for "Denoising Diffusion Models for Plug-and-Play Image Restoration" is released at yuanzhi-zhu/DiffPIR.
- News (2022-10-04): We release the training codes of RVRT (NeurIPS 2022) for video SR, deblurring and denoising.
- News (2022-05-05): Try the online demo of SCUNet for blind real image denoising.
- News (2022-03-23): We release the testing codes of SCUNet for blind real image denoising. The SCUNet results are obtained with purely synthetic training data; the paired noisy/clean data from DND and SIDD were not used during training.
- News (2022-02-15): We release the training codes of VRT for video SR, deblurring and denoising.
- News (2021-12-23): Our techniques are adopted in https://www.amemori.ai/.
- News (2021-12-23): Our new work for practical image denoising.
- News (2021-09-09): Add main_download_pretrained_models.py to download pre-trained models.
- News (2021-09-08): Add MATLAB code to zoom in on a local part of an image for comparing different results.
- News (2021-09-07): We upload the training code of SwinIR and provide an interactive online Colab demo for real-world image SR. Try to super-resolve your own images on Colab!
Figure: visual comparison on a real-world image (x4): BSRGAN (ICCV2021), Real-ESRGAN, SwinIR (ours).
- News (2021-08-31): We upload the training code of BSRGAN.
- News (2021-08-24): We upload the BSRGAN degradation model.
- News (2021-08-22): Support multi-feature-layer VGG perceptual loss and UNet discriminator.
- News (2021-08-18): We upload the extended BSRGAN degradation model. It is slightly different from our published version.
- News (2021-06-03): Add testing codes of GPEN (CVPR21) for face image enhancement: main_test_face_enhancement.py
- News (2021-05-13): Add PatchGAN discriminator.
- News (2021-05-12): Support distributed training, see also https://github.com/xinntao/BasicSR/blob/master/docs/TrainTest.md.
- News (2021-01): BSRGAN for blind real image super-resolution will be added.
- Pull requests are welcome!
- Correction (2020-10): If you use multiple GPUs for GAN training, remove or comment Line 105 to enable DataParallel for fast training.
- News (2020-10): Add utils_receptivefield.py to calculate receptive field.
- News (2020-8): A deep plug-and-play image restoration toolbox is released at cszn/DPIR.
- Tips (2020-8): Use this to avoid out of memory issue.
- News (2020-7): Add main_challenge_sr.py to get FLOPs, #Params, Runtime, #Activations, #Conv, and Max Memory Allocated.
from utils.utils_modelsummary import get_model_activation, get_model_flops
input_dim = (3, 256, 256) # set the input dimension
activations, num_conv2d = get_model_activation(model, input_dim)
logger.info('{:>16s} : {:<.4f} [M]'.format('#Activations', activations/10**6))
logger.info('{:>16s} : {:<d}'.format('#Conv2d', num_conv2d))
flops = get_model_flops(model, input_dim, False)
logger.info('{:>16s} : {:<.4f} [G]'.format('FLOPs', flops/10**9))
num_parameters = sum(map(lambda x: x.numel(), model.parameters()))
logger.info('{:>16s} : {:<.4f} [M]'.format('#Params', num_parameters/10**6))
- News (2020-6): Add USRNet (CVPR 2020) for training and testing.
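The model-summary snippet above assumes that `model` and `logger` already exist. As a minimal sketch of the reporting side only, the following sets up a standard-library logger and shows how those format strings render. The logger name, the helper `summary_line`, and the numbers are placeholders for illustration, not measurements of any KAIR model.

```python
import logging

logging.basicConfig(level=logging.INFO, format='%(message)s')
logger = logging.getLogger('model_summary')  # hypothetical logger name


def summary_line(name, value, unit=None):
    """Render one summary row in the same layout as the snippet above."""
    if unit is None:
        # integer counts such as the number of Conv2d layers
        return '{:>16s} : {:<d}'.format(name, value)
    # scaled quantities such as parameters in millions
    return '{:>16s} : {:<.4f} [{}]'.format(name, value, unit)


# Placeholder numbers, chosen only to show the layout.
num_parameters = 1517571
num_conv2d = 37

params_line = summary_line('#Params', num_parameters / 10**6, 'M')
conv_line = summary_line('#Conv2d', num_conv2d)
logger.info(params_line)
logger.info(conv_line)
```

The label is right-aligned in a 16-character column, so rows line up regardless of the metric name.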
Clone repo
git clone https://github.com/cszn/KAIR.git
pip install -r requirement.txt
Training
Modify the JSON file in options first, for example by
setting "gpu_ids": [0,1,2,3] if 4 GPUs are used, or
setting "dataroot_H": "trainsets/trainH" if the path of the high-quality dataset is trainsets/trainH.
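The option changes described above can also be scripted. The sketch below is a hedged illustration, not part of KAIR: `strip_line_comments` and `set_key` are hypothetical helpers, the example options text is made up, and it assumes the options file may carry `//` line comments and that the two keys may sit at any nesting depth.

```python
import json


def strip_line_comments(text):
    """Drop everything after '//' on each line (naive: assumes no '//' inside values)."""
    return '\n'.join(line.split('//')[0] for line in text.splitlines())


def set_key(node, key, value):
    """Set `key` to `value` wherever it already occurs in nested dicts/lists; return the hit count."""
    hits = 0
    if isinstance(node, dict):
        for k in node:
            if k == key:
                node[k] = value
                hits += 1
            else:
                hits += set_key(node[k], key, value)
    elif isinstance(node, list):
        for item in node:
            hits += set_key(item, key, value)
    return hits


# Made-up options text; a real file under options/ has many more fields.
example_text = '''
{
  "gpu_ids": [0],            // GPUs to use
  "datasets": {
    "train": {"dataroot_H": "path/to/trainH"}
  }
}
'''

opt = json.loads(strip_line_comments(example_text))
set_key(opt, 'gpu_ids', [0, 1, 2, 3])           # 4 GPUs
set_key(opt, 'dataroot_H', 'trainsets/trainH')  # high-quality training images
print(json.dumps(opt, indent=2))
```

Editing by key name rather than by fixed path keeps the helper independent of how a particular options file nests its dataset section.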
- Training with DataParallel - PSNR
python main_train_psnr.py --opt options/train_msrresnet_psnr.json
- Training with DataParallel - GAN
python main_train_gan.py --opt options/train_msrresnet_gan.json
- Training with DistributedDataParallel - PSNR - 4 GPUs
python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 main_train_psnr.py --opt options/train_msrresnet_psnr.json --dist True
- Training with DistributedDataParallel - PSNR - 8 GPUs
python -m torch.distributed.launch --nproc_per_node=8 --master_port=1234 main_train_psnr.py --opt options/train_msrresnet_psnr.json --dist True
- Training with DistributedDataParallel - GAN - 4 GPUs
python -m torch.distributed.launch --nproc_per_node=4 --master_port=1234 main_train_gan.py --opt options/train_msrresnet_gan.json --dist True
- Training with DistributedDataParallel - GAN - 8 GPUs
python -m torch.distributed.launch --nproc_per_node=8 --master_port=1234 main_train_gan.py --opt options/train_msrresnet_gan.json --dist True
- Kill distributed training processes of main_train_gan.py
kill $(ps aux | grep main_train_gan.py | grep -v grep | awk '{print $2}')
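The DistributedDataParallel commands above differ only in the script, the options file, and the GPU count. As a small sketch, a hypothetical helper (`build_launch_cmd`, not part of KAIR) can assemble the same argument list so the variants do not have to be typed by hand.

```python
import shlex


def build_launch_cmd(script, opt_path, n_gpus, master_port=1234):
    """Return the argument list for one distributed run, mirroring the commands above."""
    return [
        'python', '-m', 'torch.distributed.launch',
        f'--nproc_per_node={n_gpus}',
        f'--master_port={master_port}',
        script, '--opt', opt_path, '--dist', 'True',
    ]


cmd = build_launch_cmd('main_train_psnr.py', 'options/train_msrresnet_psnr.json', n_gpus=4)
print(shlex.join(cmd))
# To actually start training, `cmd` could be passed to subprocess.run(cmd, check=True).
```

Building the command as a list avoids shell-quoting problems if a path ever contains spaces.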
Network architectures
- DnCNN
- IRCNN denoiser
- FFDNet
- SRMD
- SRResNet, SRGAN, RRDB, ESRGAN
- IMDN
Testing
Method | model_zoo |
---|---|
main_test_dncnn.py | dncnn_15.pth, dncnn_25.pth, dncnn_50.pth, dncnn_gray_blind.pth, dncnn_color_blind.pth, dncnn3.pth |
main_test_ircnn_denoiser.py | ircnn_gray.pth, ircnn_color.pth |
main_test_fdncnn.py | fdncnn_gray.pth, fdncnn_color.pth, fdncnn_gray_clip.pth, fdncnn_color_clip.pth |
main_test_ffdnet.py | ffdnet_gray.pth, ffdnet_color.pth, ffdnet_gray_clip.pth, ffdnet_color_clip.pth |
main_test_srmd.py | srmdnf_x2.pth, srmdnf_x3.pth, srmdnf_x4.pth, srmd_x2.pth, srmd_x3.pth, srmd_x4.pth |
The above models are converted from MatConvNet. | |
main_test_dpsr.py | dpsr_x2.pth, dpsr_x3.pth, dpsr_x4.pth, dpsr_x4_gan.pth |
main_test_msrresnet.py | msrresnet_x4_psnr.pth, msrresnet_x4_gan.pth |
main_test_rrdb.py | rrdb_x4_psnr.pth, rrdb_x4_esrgan.pth |
main_test_imdn.py | imdn_x4.pth |
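Before running a test script, it can help to confirm that the checkpoints it expects are present. The sketch below encodes a subset of the table above in a dictionary and reports missing files. The helper `missing_checkpoints` is hypothetical, and the assumption that the weights sit directly in a `model_zoo` folder is taken from the table header.

```python
import tempfile
from pathlib import Path

# Subset of the table above: test script -> checkpoint files it lists.
MODEL_ZOO = {
    'main_test_dncnn.py': ['dncnn_15.pth', 'dncnn_25.pth', 'dncnn_50.pth',
                           'dncnn_gray_blind.pth', 'dncnn_color_blind.pth', 'dncnn3.pth'],
    'main_test_ircnn_denoiser.py': ['ircnn_gray.pth', 'ircnn_color.pth'],
    'main_test_imdn.py': ['imdn_x4.pth'],
}


def missing_checkpoints(script, model_dir='model_zoo'):
    """List checkpoints named in the table for `script` that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in MODEL_ZOO[script] if not (root / name).is_file()]


# Demo on a throwaway directory so the sketch runs without the real weights.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / 'ircnn_gray.pth').touch()
    demo_missing = missing_checkpoints('main_test_ircnn_denoiser.py', tmp)
    print(demo_missing)
```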
model_zoo
trainsets
- https://github.com/xinntao/BasicSR/blob/master/docs/DatasetPreparation.md
- train400
- DIV2K
- Flickr2K
- optional: use split_imageset(original_dataroot, taget_dataroot, n_channels=3, p_size=512, p_overlap=96, p_max=800) to get trainsets/trainH with small images for fast data loading
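To make the `p_size`, `p_overlap`, and `p_max` parameters concrete, here is a sketch of one plausible overlapping-patch scheme. It is an assumption for illustration and has not been checked against the actual `split_imageset` implementation: windows advance by `p_size - p_overlap`, a final window is placed flush with the border, and images that do not exceed `p_max` in both dimensions are kept whole. The function names are hypothetical.

```python
def patch_starts(length, p_size=512, p_overlap=96):
    """Start offsets of overlapping windows along one axis (assumed scheme)."""
    stride = p_size - p_overlap
    starts = list(range(0, length - p_size, stride))
    starts.append(length - p_size)  # last window sits flush with the border
    return starts


def patch_boxes(height, width, p_size=512, p_overlap=96, p_max=800):
    """Return (top, left, bottom, right) boxes; images not larger than p_max on both sides stay whole."""
    if height <= p_max or width <= p_max:
        return [(0, 0, height, width)]
    return [(top, left, top + p_size, left + p_size)
            for top in patch_starts(height, p_size, p_overlap)
            for left in patch_starts(width, p_size, p_overlap)]


boxes = patch_boxes(1000, 1400)
print(len(boxes), boxes[0], boxes[-1])
```

Under these assumptions a 1000x1400 image yields a 3x4 grid of 512x512 patches, while a 700x1400 image is returned as a single box.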
testsets
- https://github.com/xinntao/BasicSR/blob/master/docs/DatasetPreparation.md
- set12
- bsd68
- cbsd68
- kodak24
- srbsd68
- set5
- set14
- cbsd100
- urban100
- manga109
References
@inproceedings{zhu2023denoising, % DiffPIR
title={Denoising Diffusion Models for Plug-and-Play Image Restoration},
author={Yuanzhi Zhu and Kai Zhang and Jingyun Liang and Jiezhang Cao and Bihan Wen and Radu Timofte and Luc Van Gool},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition Workshops},
year={2023}
}
@article{liang2022vrt,
title={VRT: A Video Restoration Transformer},
author={Liang, Jingyun and Cao, Jiezhang and Fan, Yuchen and Zhang, Kai and Ranjan, Rakesh and Li, Yawei and Timofte, Radu and Van Gool, Luc},
journal={arXiv preprint arXiv:2201.12288},
year={2022}
}
@inproceedings{liang2021swinir,
title={SwinIR: Image Restoration Using Swin Transformer},
author={Liang, Jingyun and Cao, Jiezhang and Sun, Guolei and Zhang, Kai and Van Gool, Luc and Timofte, Radu},
booktitle={IEEE International Conference on Computer Vision Workshops},
pages={1833--1844},
year={2021}
}
@inproceedings{zhang2021designing,
title={Designing a Practical Degradation Model for Deep Blind Image Super-Resolution},
author={Zhang, Kai and Liang, Jingyun and Van Gool, Luc and Timofte, Radu},
booktitle={IEEE International Conference on Computer Vision},
pages={4791--4800},
year={2021}
}
@article{zhang2021plug, % DPIR & DRUNet & IRCNN
title={Plug-and-Play Image Restoration with Deep Denoiser Prior},
author={Zhang, Kai and Li, Yawei and Zuo, Wangmeng and Zhang, Lei and Van Gool, Luc and Timofte, Radu},
journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},
year={2021}
}
@inproceedings{zhang2020aim, % efficientSR_challenge
title={AIM 2020 Challenge on Efficient Super-Resolution: Methods and Results},
author={Kai Zhang and Martin Danelljan and Yawei Li and Radu Timofte and others},
booktitle={European Conference on Computer Vision Workshops},
year={2020}
}
@inproceedings{zhang2020deep, % USRNet
title={Deep unfolding network for image super-resolution},
author={Zhang, Kai and Van Gool, Luc and Timofte, Radu},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
pages={3217--3226},
year={2020}
}
@article{zhang2017beyond, % DnCNN
title={Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising},
author={Zhang, Kai and Zuo, Wangmeng and Chen, Yunjin and Meng, Deyu and Zhang, Lei},
journal={IEEE Transactions on Image Processing},
volume={26},
number={7},
pages={3142--3155},
year={2017}
}
@inproceedings{zhang2017learning, % IRCNN
title={Learning deep CNN denoiser prior for image restoration},
author={Zhang, Kai and Zuo, Wangmeng and Gu, Shuhang and Zhang, Lei},
booktitle={IEEE conference on computer vision and pattern recognition},
pages={3929--3938},
year={2017}
}
@article{zhang2018ffdnet, % FFDNet, FDnCNN
title={FFDNet: Toward a fast and flexible solution for CNN-based image denoising},
author={Zhang, Kai and Zuo, Wangmeng and Zhang, Lei},
journal={IEEE Transactions on Image Processing},
volume={27},
number={9},
pages={4608--4622},
year={2018}
}
@inproceedings{zhang2018learning, % SRMD
title={Learning a single convolutional super-resolution network for multiple degradations},
author={Zhang, Kai and Zuo, Wangmeng and Zhang, Lei},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
pages={3262--3271},
year={2018}
}
@inproceedings{zhang2019deep, % DPSR
title={Deep Plug-and-Play Super-Resolution for Arbitrary Blur Kernels},
author={Zhang, Kai and Zuo, Wangmeng and Zhang, Lei},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
pages={1671--1681},
year={2019}
}
@InProceedings{wang2018esrgan, % ESRGAN, MSRResNet
author = {Wang, Xintao and Yu, Ke and Wu, Shixiang and Gu, Jinjin and Liu, Yihao and Dong, Chao and Qiao, Yu and Loy, Chen Change},
title = {ESRGAN: Enhanced super-resolution generative adversarial networks},
booktitle = {The European Conference on Computer Vision Workshops (ECCVW)},
month = {September},
year = {2018}
}
@inproceedings{hui2019lightweight, % IMDN
title={Lightweight Image Super-Resolution with Information Multi-distillation Network},
author={Hui, Zheng and Gao, Xinbo and Yang, Yunchu and Wang, Xiumei},
booktitle={Proceedings of the 27th ACM International Conference on Multimedia (ACM MM)},
pages={2024--2032},
year={2019}
}
@inproceedings{zhang2019aim, % IMDN
title={AIM 2019 Challenge on Constrained Super-Resolution: Methods and Results},
author={Kai Zhang and Shuhang Gu and Radu Timofte and others},
booktitle={IEEE International Conference on Computer Vision Workshops},
year={2019}
}
@inproceedings{yang2021gan,
title={GAN Prior Embedded Network for Blind Face Restoration in the Wild},
author={Yang, Tao and Ren, Peiran and Xie, Xuansong and Zhang, Lei},
booktitle={IEEE Conference on Computer Vision and Pattern Recognition},
year={2021}
}