ESRGAN

ECCV18 Workshops - Enhanced SRGAN. Champion PIRM Challenge on Perceptual Super-Resolution. The training codes are in BasicSR.

6,397

1,115


Top Related Projects

upscayl

38,372

🆙 Upscayl - #1 Free and Open Source AI Image Upscaler for Linux, MacOS and Windows.

waifu2x-ncnn-vulkan

3,169

waifu2x converter ncnn version, runs fast on intel / amd / nvidia / apple-silicon GPU with vulkan

Waifu2x-Extension-GUI

15,071

Video, Image and GIF upscale/enlarge(Super-Resolution) and Video frame interpolation. Achieved with Waifu2x, Real-ESRGAN, Real-CUGAN, RTX Video Super Resolution VSR, SRMD, RealSR, Anime4K, RIFE, IFRNet, CAIN, DAIN, and ACNet.

GFPGAN

36,861

GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.

Quick Overview

ESRGAN (Enhanced Super-Resolution Generative Adversarial Networks) is an open-source project for image super-resolution. It improves on the original SRGAN model and produces more realistic, natural textures for single image super-resolution.

Pros

Produces high-quality, photorealistic results with impressive detail enhancement
Pretrained models target 4x upscaling
Includes pre-trained models for easy implementation
Still usable through its actively developed successor, Real-ESRGAN, which can run the original ESRGAN models (this repo itself is no longer modified)

Cons

Computationally intensive, requiring significant GPU resources for training and inference
May introduce artifacts in some cases, especially with extreme upscaling
Provides only testing code; training and architecture changes require BasicSR
Requires careful hyperparameter tuning for optimal results

Code Examples

Loading a pre-trained ESRGAN model:

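import torch
import RRDBNet_arch as arch  # network definition shipped in the ESRGAN repo
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = arch.RRDBNet(3, 3, 64, 23, gc=32)  # in_nc, out_nc, nf, nb, growth channels
model.load_state_dict(torch.load('models/RRDB_ESRGAN_x4.pth', map_location=device), strict=True)
model = model.eval().to(device)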

Upscaling an image using ESRGAN:

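import cv2
import numpy as np

# cv2 reads BGR uint8 HxWxC; the model expects RGB float NCHW in [0, 1]
img = cv2.imread(lq_path, cv2.IMREAD_COLOR).astype(np.float32) / 255.0
img = torch.from_numpy(np.transpose(img[:, :, [2, 1, 0]], (2, 0, 1))).float()
img_lq = img.unsqueeze(0).to(device)

with torch.no_grad():
    output = model(img_lq).squeeze(0).float().cpu().clamp_(0, 1).numpy()
# Back to BGR uint8 HxWxC so OpenCV can save it
output = np.transpose(output[[2, 1, 0], :, :], (1, 2, 0))
output = (output * 255.0).round().astype(np.uint8)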

Saving the upscaled image:

import cv2
cv2.imwrite('output.png', output)

Getting Started

Clone the repository:

git clone https://github.com/xinntao/ESRGAN.git
cd ESRGAN

Install dependencies:
```
pip install torch numpy opencv-python
```
Download the pretrained models (linked from Google Drive or Baidu Drive in the README) and place them in the models/ directory.

Run inference on the images in the LR/ folder (the model path is set inside test.py, and results are written to results/):

python test.py

Competitor Comparisons

upscayl

38,372

🆙 Upscayl - #1 Free and Open Source AI Image Upscaler for Linux, MacOS and Windows.

Pros of Upscayl

User-friendly GUI application, making it accessible to non-technical users
Cross-platform support (Windows, macOS, Linux)
Integrates multiple AI upscaling models, offering versatility

Cons of Upscayl

Less flexibility for advanced users compared to ESRGAN's Python scripts
May have higher system requirements due to the GUI overhead

Code Comparison

ESRGAN (Python):

import cv2
import numpy as np
import torch
import RRDBNet_arch as arch

model = arch.RRDBNet(3, 3, 64, 23, gc=32)
model.load_state_dict(torch.load(model_path), strict=True)
model.eval()
img = cv2.imread(image_path, cv2.IMREAD_COLOR) * 1.0 / 255  # BGR, scaled to [0, 1]
img = torch.from_numpy(np.transpose(img[:, :, [2, 1, 0]], (2, 0, 1))).float()  # RGB, CHW
img_LR = img.unsqueeze(0)
with torch.no_grad():
    output = model(img_LR).data.squeeze().float().cpu().clamp_(0, 1).numpy()

Upscayl (JavaScript/Electron):

const { ipcRenderer } = require('electron');
ipcRenderer.on('upscale-progress', (event, progress) => {
  updateProgressBar(progress);
});
ipcRenderer.send('start-upscale', imagePath, modelPath);

Note: The code snippets are simplified examples and may not represent the full functionality of each project.

waifu2x-ncnn-vulkan

3,169

waifu2x converter ncnn version, runs fast on intel / amd / nvidia / apple-silicon GPU with vulkan

Pros of waifu2x-ncnn-vulkan

Faster processing speed due to GPU acceleration with Vulkan
Smaller memory footprint and more efficient resource usage
Cross-platform support for Windows, Linux, and macOS

Cons of waifu2x-ncnn-vulkan

Limited to specific upscaling models (waifu2x)
Less flexibility in terms of customization and fine-tuning
Primarily focused on anime-style images

Code Comparison

ESRGAN (Python):

import torch
import RRDBNet_arch as arch

model = arch.RRDBNet(3, 3, 64, 23, gc=32)
model.load_state_dict(torch.load(model_path), strict=True)
model.eval()
with torch.no_grad():
    output = model(input_tensor)

waifu2x-ncnn-vulkan (C++):

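#include "net.h"  // ncnn

ncnn::Net waifu2x;
waifu2x.opt.use_vulkan_compute = true;
waifu2x.load_param("models-cunet/noise0_scale2.0x_model.param");
waifu2x.load_model("models-cunet/noise0_scale2.0x_model.bin");
// Inference runs through an Extractor; blob names depend on the .param file
ncnn::Extractor ex = waifu2x.create_extractor();
ex.input("Input1", in_mat);
ncnn::Mat out_mat;
ex.extract("Eltwise4", out_mat);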

ESRGAN offers more flexibility and customization options, making it suitable for various image types and allowing users to train custom models. waifu2x-ncnn-vulkan, on the other hand, provides faster processing and better resource efficiency, making it ideal for quick upscaling tasks, especially for anime-style images.

Waifu2x-Extension-GUI

15,071

Pros of Waifu2x-Extension-GUI

User-friendly graphical interface for easy operation
Supports multiple AI models, including ESRGAN, Waifu2x, and Real-ESRGAN
Batch processing capabilities for multiple images or videos

Cons of Waifu2x-Extension-GUI

May have higher system requirements due to the GUI and multiple models
Potentially slower processing times compared to command-line alternatives
Less flexibility for advanced users who prefer direct code manipulation

Code Comparison

ESRGAN (Python):

import cv2
import numpy as np
import torch
import RRDBNet_arch as arch

model = arch.RRDBNet(3, 3, 64, 23, gc=32)
model.load_state_dict(torch.load(model_path), strict=True)
model.eval()
img = cv2.imread(img_path, cv2.IMREAD_COLOR) * 1.0 / 255  # BGR, scaled to [0, 1]
img = torch.from_numpy(np.transpose(img[:, :, [2, 1, 0]], (2, 0, 1))).float()  # RGB, CHW
img_LR = img.unsqueeze(0)
with torch.no_grad():
    output = model(img_LR).data.squeeze().float().cpu().clamp_(0, 1).numpy()

Waifu2x-Extension-GUI (C++):

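#include <QApplication>
#include "mainwindow.h"  // MainWindow declaration; exact header name assumed

int main(int argc, char *argv[])
{
    QApplication a(argc, argv);
    MainWindow w;
    w.show();
    return a.exec();
}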

Note: The code comparison is limited due to the different nature of the projects. ESRGAN focuses on the AI model implementation, while Waifu2x-Extension-GUI primarily deals with the graphical interface and integration of multiple models.

GFPGAN

36,861

GFPGAN aims at developing Practical Algorithms for Real-world Face Restoration.

Pros of GFPGAN

Specifically designed for face restoration, offering better results for facial features
Incorporates GAN-based upscaling and face restoration in a single model
Provides pre-trained models for easy implementation

Cons of GFPGAN

Limited to face restoration, less versatile for general image enhancement
May require more computational resources due to its comprehensive approach

Code Comparison

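ESRGAN (via Real-ESRGAN's RealESRGANer inference helper):

from basicsr.archs.rrdbnet_arch import RRDBNet
from realesrgan import RealESRGANer
model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
upsampler = RealESRGANer(scale=4, model_path='weights/RealESRGAN_x4plus.pth', model=model)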

GFPGAN:

from gfpgan import GFPGANer
restorer = GFPGANer(model_path='weights/GFPGANv1.3.pth', upscale=2)
cropped_faces, restored_faces, restored_img = restorer.enhance(img, has_aligned=False, only_center_face=False, paste_back=True)

Both repositories focus on image enhancement, but GFPGAN specializes in face restoration while ESRGAN is more general-purpose. GFPGAN combines upscaling and face restoration in one step, potentially offering better results for facial images. However, ESRGAN remains more versatile for various image types. The code snippets demonstrate the simplicity of implementation for both models, with GFPGAN requiring slightly less setup for face-specific tasks.


README

ESRGAN (Enhanced SRGAN) [:rocket: BasicSR] [Real-ESRGAN]

:sparkles: New Updates.

We have extended ESRGAN to Real-ESRGAN, which is a more practical algorithm for real-world image restoration. For example, it can also remove annoying JPEG compression artifacts.
We recommend giving it a try :smiley:

In the Real-ESRGAN repo,

You can still use the original ESRGAN model or your re-trained ESRGAN model (see the model zoo in Real-ESRGAN).
We provide a handier inference script, which supports 1) tile inference; 2) images with an alpha channel; 3) gray images; 4) 16-bit images.
We also provide a Windows executable, RealESRGAN-ncnn-vulkan, for easier use without installing the environment. This executable also includes the original ESRGAN model.
The full training code is also released in the Real-ESRGAN repo.

You are welcome to open issues or discussions in the Real-ESRGAN repo.

If you have any questions, you can open an issue in the Real-ESRGAN repo.
If you have any good ideas or requests, please open an issue/discussion in the Real-ESRGAN repo to let me know.
If you have images that Real-ESRGAN could not restore well, please also open an issue/discussion in the Real-ESRGAN repo. I will record them (but I cannot guarantee to resolve them).

Here are some examples for Real-ESRGAN:

:book: Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data

[Paper]
Xintao Wang, Liangbin Xie, Chao Dong, Ying Shan
Applied Research Center (ARC), Tencent PCG
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences

Because other repos may depend on this ESRGAN repo, we will not modify it (especially the code).

The following is the original README:

The training code is in :rocket: BasicSR. This repo only provides simple testing code, pretrained models and the network interpolation demo.

BasicSR is an open source image and video super-resolution toolbox based on PyTorch (will extend to more restoration tasks in the future).
It includes methods such as EDSR, RCAN, SRResNet, SRGAN, ESRGAN, EDVR, etc. It now also supports StyleGAN2.

Enhanced Super-Resolution Generative Adversarial Networks

By Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Yu Qiao, Chen Change Loy

We won first place in the PIRM2018-SR Challenge (region 3) with the best perceptual index. The paper was accepted to the ECCV2018 PIRM Workshop.

:triangular_flag_on_post: We have added Frequently Asked Questions.

For instance,

How to reproduce your results in the PIRM18-SR Challenge (with low perceptual index)?

How do you get the perceptual index in your ESRGAN paper?

BibTeX

@InProceedings{wang2018esrgan,
    author = {Wang, Xintao and Yu, Ke and Wu, Shixiang and Gu, Jinjin and Liu, Yihao and Dong, Chao and Qiao, Yu and Loy, Chen Change},
    title = {ESRGAN: Enhanced super-resolution generative adversarial networks},
    booktitle = {The European Conference on Computer Vision Workshops (ECCVW)},
    month = {September},
    year = {2018}
}

The PSNR-oriented RRDB_PSNR model, trained on the DF2K dataset (a merge of DIV2K and Flickr2K, the latter proposed in EDSR), also achieves high PSNR performance.

PSNR/SSIM results for 4x super-resolution:

Method	Training dataset	Set5	Set14	BSD100	Urban100	Manga109
SRCNN	291	30.48/0.8628	27.50/0.7513	26.90/0.7101	24.52/0.7221	27.58/0.8555
EDSR	DIV2K	32.46/0.8968	28.80/0.7876	27.71/0.7420	26.64/0.8033	31.02/0.9148
RCAN	DIV2K	32.63/0.9002	28.87/0.7889	27.77/0.7436	26.82/0.8087	31.22/0.9173
RRDB (ours)	DF2K	32.73/0.9011	28.99/0.7917	27.85/0.7455	27.03/0.8153	31.66/0.9196

Quick Test

Dependencies

Python 3
PyTorch >= 1.0 (CUDA version >= 7.5 if installing with CUDA; see the PyTorch installation instructions for details)
Python packages: pip install numpy opencv-python

Test models

Clone this github repo.

git clone https://github.com/xinntao/ESRGAN
cd ESRGAN

Place your own low-resolution images in the ./LR folder. (There are two sample images: baboon and comic.)
Download pretrained models from Google Drive or Baidu Drive and place them in ./models. We provide two models, one with high perceptual quality and one with high PSNR performance (see model list).
Run the test. We provide the ESRGAN model and the RRDB_PSNR model; choose between them in test.py.

python test.py

The results are in ./results folder.

Network interpolation demo

You can interpolate the RRDB_ESRGAN and RRDB_PSNR models with alpha in [0, 1].

Run python net_interp.py 0.8, where 0.8 is the interpolation parameter and you can change it to any value in [0,1].
Run python test.py models/interp_08.pth, where models/interp_08.pth is the model path.
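The interpolation itself is a per-parameter linear blend, θ_interp = (1 − α) θ_PSNR + α θ_ESRGAN. The sketch below applies that blend to plain dictionaries. It is not the contents of net_interp.py, and the function name `interpolate_state_dicts` and the toy parameter values are hypothetical. We assume α weights the ESRGAN model, so α = 0 reproduces RRDB_PSNR and α = 1 reproduces RRDB_ESRGAN.

```python
from collections import OrderedDict


def interpolate_state_dicts(psnr_state, gan_state, alpha):
    """Blend two parameter dictionaries that share keys and shapes (hypothetical helper).

    alpha = 0 returns the PSNR-oriented weights, alpha = 1 the GAN weights.
    Values may be floats, NumPy arrays or torch tensors: anything that
    supports scalar multiplication and addition.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if psnr_state.keys() != gan_state.keys():
        raise KeyError("both models must have identical parameter names")
    blended = OrderedDict()
    for name, v_psnr in psnr_state.items():
        # theta_interp = (1 - alpha) * theta_PSNR + alpha * theta_GAN
        blended[name] = (1 - alpha) * v_psnr + alpha * gan_state[name]
    return blended


# Toy values standing in for the two checkpoints
psnr = OrderedDict(conv_first=0.0, conv_last=2.0)
gan = OrderedDict(conv_first=1.0, conv_last=4.0)
print(interpolate_state_dicts(psnr, gan, 0.8))
```

If PyTorch is installed, the same function should also work on the dictionaries that torch.load returns for the two checkpoints in ./models, and the blended dictionary can then be saved with torch.save. This assumes both checkpoints use identical parameter names.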

Perceptual-driven SR Results

You can download all the results from Google Drive. (:heavy_check_mark: included; :heavy_minus_sign: not included; :o: TODO)

HR images can be downloaded from BasicSR-Datasets.

Datasets	LR	ESRGAN	SRGAN	EnhanceNet	CX
Set5	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:o:
Set14	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:o:
BSDS100	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:heavy_check_mark:	:o:
PIRM (val, test)	:heavy_check_mark:	:heavy_check_mark:	:heavy_minus_sign:	:heavy_check_mark:	:heavy_check_mark:
OST300	:heavy_check_mark:	:heavy_check_mark:	:heavy_minus_sign:	:heavy_check_mark:	:o:
urban100	:heavy_check_mark:	:heavy_check_mark:	:heavy_minus_sign:	:heavy_check_mark:	:o:
DIV2K (val, test)	:heavy_check_mark:	:heavy_check_mark:	:heavy_minus_sign:	:heavy_check_mark:	:o:

ESRGAN

We improve SRGAN in three aspects:

adopt a deeper model using Residual-in-Residual Dense Block (RRDB) without batch normalization layers.
employ Relativistic average GAN instead of the vanilla GAN.
improve the perceptual loss by using the features before activation.
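The relativistic average discriminator in the second point does not judge each image on its own. It estimates how much more realistic a real image is than the average generated one. Following the ESRGAN paper's formulation, D_Ra(x_r, x_f) = σ(C(x_r) − E[C(x_f)]), where C is the raw critic output. The NumPy sketch below computes the resulting discriminator and generator losses from critic logits. It is an illustration only, since the actual training code lives in BasicSR, and `ragan_losses` is a hypothetical name.

```python
import numpy as np


def _log_sigmoid(z):
    # Numerically stable log(sigmoid(z)) = -log(1 + exp(-z))
    return -np.logaddexp(0.0, -z)


def ragan_losses(c_real, c_fake):
    """Relativistic average GAN losses from raw critic outputs C(x) (illustrative sketch).

    c_real: critic logits for real (HR) images, shape (N,)
    c_fake: critic logits for generated (SR) images, shape (M,)
    Returns (discriminator_loss, generator_loss).
    """
    c_real = np.asarray(c_real, dtype=np.float64)
    c_fake = np.asarray(c_fake, dtype=np.float64)
    # Arguments of D_Ra(x_r, x_f) and D_Ra(x_f, x_r) before the sigmoid
    real_vs_fake = c_real - c_fake.mean()
    fake_vs_real = c_fake - c_real.mean()
    # log(1 - sigmoid(z)) == log(sigmoid(-z))
    d_loss = -_log_sigmoid(real_vs_fake).mean() - _log_sigmoid(-fake_vs_real).mean()
    g_loss = -_log_sigmoid(-real_vs_fake).mean() - _log_sigmoid(fake_vs_real).mean()
    return d_loss, g_loss


print(ragan_losses([2.0, 1.5], [-1.0, 0.5]))
```

The generator loss also depends on the critic scores of real images. In this formulation the generator therefore receives gradients from both real and generated data, not only from its own outputs.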

In contrast to SRGAN, which claimed that deeper models are increasingly difficult to train, our deeper ESRGAN model shows its superior performance with easy training.

Network Interpolation

We propose the network interpolation strategy to balance the visual quality and PSNR.

We show a smooth animation with the interpolation parameter changing from 0 to 1. Interestingly, the network interpolation strategy provides smooth control between the RRDB_PSNR model and the fine-tuned ESRGAN model.

Qualitative Results

PSNR (evaluated on the Y channel) and the perceptual index used in the PIRM-SR challenge are also provided for reference.
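PSNR on the Y channel converts both images to luma before measuring the error. The PIRM-SR perceptual index combines two no-reference scores as PI = ((10 − Ma) + NIQE) / 2, and lower is better. The sketch below assumes RGB inputs in [0, 255] and the MATLAB-style BT.601 luma coefficients. The optional border crop and all function names are assumptions, and the Ma and NIQE scores themselves must come from separate tools.

```python
import numpy as np


def rgb_to_y(img):
    """BT.601 luma (MATLAB rgb2ycbcr convention) for HxWx3 RGB in [0, 255]."""
    img = np.asarray(img, dtype=np.float64)
    return 16.0 + (65.481 * img[..., 0] + 128.553 * img[..., 1] + 24.966 * img[..., 2]) / 255.0


def psnr_y(sr, hr, crop=0):
    """PSNR between two RGB images in [0, 255], computed on the Y channel.

    `crop` optionally drops that many border pixels (often set to the scale factor).
    """
    y_sr, y_hr = rgb_to_y(sr), rgb_to_y(hr)
    if crop:
        y_sr = y_sr[crop:-crop, crop:-crop]
        y_hr = y_hr[crop:-crop, crop:-crop]
    mse = np.mean((y_sr - y_hr) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(255.0 ** 2 / mse)


def perceptual_index(ma_score, niqe_score):
    """PIRM-SR perceptual index from precomputed Ma and NIQE scores (lower is better)."""
    return 0.5 * ((10.0 - ma_score) + niqe_score)
```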

Ablation Study

Overall visual comparisons showing the effect of each component in ESRGAN. Each column represents a model, with its configuration shown at the top. The red sign indicates the main improvement compared with the previous model.

BN artifacts

We empirically observe that BN layers tend to introduce artifacts. These artifacts, which we call BN artifacts, occasionally appear across iterations and settings, which conflicts with the need for stable performance during training. We find that network depth, BN position, training dataset and training loss all affect the occurrence of BN artifacts.

Useful techniques to train a very deep network

We find that residual scaling and smaller initialization can help to train a very deep network. More details are in the Supplementary File attached in our paper.
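The two techniques can be sketched in NumPy. Residual scaling multiplies a residual branch by a constant β in (0, 1) before adding it back. Smaller initialization multiplies a standard Kaiming (MSRA) initialization by a small factor. The defaults β = 0.2 and scale = 0.1 are assumptions chosen to match common RRDB implementations, not values stated in this README, and the helper names are hypothetical.

```python
import numpy as np


def scaled_kaiming_normal(out_ch, in_ch, k, scale=0.1, rng=None):
    """Kaiming/MSRA normal init (fan_in mode) for a conv kernel, multiplied by `scale`."""
    rng = np.random.default_rng() if rng is None else rng
    fan_in = in_ch * k * k
    std = np.sqrt(2.0 / fan_in)
    return scale * rng.normal(0.0, std, size=(out_ch, in_ch, k, k))


def scaled_residual(x, branch, beta=0.2):
    """Residual connection whose branch output is scaled by beta before the addition."""
    return x + beta * branch(x)


# Toy illustration: 23 stacked blocks whose branch simply returns its input
for beta in (1.0, 0.2):
    x = 1.0
    for _ in range(23):
        x = scaled_residual(x, lambda t: t, beta=beta)
    print(f"beta={beta}: output magnitude {x:.1f}")
```

With β = 1 the toy output doubles at every block. With β = 0.2 the growth stays far smaller, which gives one intuition for why scaled residuals make very deep networks easier to train.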

The influence of training patch size

We observe that training a deeper network benefits from a larger patch size. Moreover, the deeper model achieves a larger improvement (~0.12 dB) than the shallower one (~0.04 dB), since its larger capacity can take full advantage of the larger training patch. (Evaluated on the Set5 dataset with RGB channels.)
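These gains are easier to judge as error reductions. Because PSNR = 10 log10(MAX² / MSE), a gain of Δ dB shrinks the MSE by a factor of 10^(−Δ/10). The small helper below (a hypothetical name) converts the quoted gains into relative MSE reductions: roughly 2.7% for 0.12 dB and 0.9% for 0.04 dB.

```python
def mse_reduction(delta_db):
    """Fractional MSE reduction implied by a PSNR gain of delta_db decibels.

    PSNR = 10 * log10(MAX^2 / MSE), so MSE_new / MSE_old = 10 ** (-delta_db / 10).
    """
    return 1.0 - 10.0 ** (-delta_db / 10.0)


for gain in (0.12, 0.04):
    print(f"{gain:.2f} dB -> {100 * mse_reduction(gain):.1f}% lower MSE")
```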
