clovaai/CRAFT-pytorch

Official implementation of Character Region Awareness for Text Detection (CRAFT)

Top Related Projects

  • EasyOCR (23,625 stars): Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.

  • PaddleOCR (42,444 stars): Multilingual OCR toolkit based on PaddlePaddle: a practical, ultra-lightweight OCR system supporting recognition of 80+ languages, with data annotation and synthesis tools and training/deployment on server, mobile, embedded, and IoT devices.

  • docTR (3,546 stars): Document Text Recognition - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

  • UniLM (19,492 stars): Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities.

  • Detectron2: A platform for object detection, segmentation, and other visual recognition tasks.

Quick Overview

CRAFT-pytorch is an implementation of the CRAFT (Character Region Awareness For Text detection) algorithm in PyTorch. It's designed for scene text detection, capable of accurately localizing text in various challenging scenarios, including curved or rotated text.

Pros

  • High accuracy in detecting text in complex scenes
  • Ability to handle multi-oriented and curved text
  • Pretrained models available for quick implementation
  • Supports both character-level and word-level text detection

Cons

  • Requires significant computational resources for training
  • May struggle with very small or highly stylized text
  • Limited documentation for customization and fine-tuning
  • Dependency on specific versions of libraries may cause compatibility issues

Code Examples

These snippets use the community craft_text_detector package (pip install craft-text-detector), which wraps the CRAFT model behind a small Python API; the official repository itself exposes the detector through test.py.

  1. Loading a pretrained CRAFT model:

from craft_text_detector import Craft

# Initialize the CRAFT text detector (CPU inference; set cuda=True for GPU)
output_dir = "outputs/"
craft = Craft(output_dir=output_dir, crop_type="poly", cuda=False)

  2. Detecting text in an image:

# Perform text detection on a single image
image_path = "image.jpg"
prediction_result = craft.detect_text(image_path)

# Get the detected bounding boxes
boxes = prediction_result["boxes"]

  3. Visualizing the detected text regions with OpenCV:

import cv2
import numpy as np

# Draw each detected box as a closed polygon on the input image
image = cv2.imread(image_path)
for box in boxes:
    pts = np.array(box, dtype=np.int32).reshape((-1, 1, 2))
    cv2.polylines(image, [pts], isClosed=True, color=(0, 0, 255), thickness=2)
cv2.imwrite("detections.jpg", image)

Getting Started

To get started with CRAFT-pytorch:

  1. Clone the repository:

    git clone https://github.com/clovaai/CRAFT-pytorch.git
    cd CRAFT-pytorch
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download the pretrained model:

    wget -O craft_mlt_25k.pth https://drive.google.com/uc?id=1Jk4eGD7crsqCCg9C9VjCLkMN3ze8kutZ
    
  4. Run the demo:

    python test.py --trained_model=craft_mlt_25k.pth --test_folder=./images/
    

This will process the images in the ./images/ folder and output the results in the ./result/ directory.
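
If you prefer to load the detector from Python rather than through test.py, here is a minimal sketch using the repository's own CRAFT module (craft.py); the copy_state_dict helper is a local illustration that mirrors what test.py does to strip the "module." prefix from checkpoints saved with DataParallel.

import torch
from collections import OrderedDict
from craft import CRAFT  # model definition shipped in this repository (craft.py)

def copy_state_dict(state_dict):
    # Strip the "module." prefix left by DataParallel checkpoints
    return OrderedDict((k[7:] if k.startswith("module.") else k, v)
                       for k, v in state_dict.items())

net = CRAFT()
net.load_state_dict(copy_state_dict(torch.load("craft_mlt_25k.pth", map_location="cpu")))
net.eval()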

Competitor Comparisons

EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.

Pros of EasyOCR

  • Supports multiple languages out of the box
  • Easier to use with a simpler API
  • Includes both text detection and recognition in one package

Cons of EasyOCR

  • Generally slower performance compared to CRAFT
  • Less flexibility for fine-tuning and customization
  • May have lower accuracy on complex or non-standard text layouts

Code Comparison

EasyOCR:

import easyocr
reader = easyocr.Reader(['en'])
result = reader.readtext('image.jpg')

CRAFT:

from craft_text_detector import Craft
craft = Craft(output_dir="outputs", crop_type="poly", cuda=False)
prediction_result = craft.detect_text("image.jpg")

EasyOCR provides a more straightforward API for quick text detection and recognition, while CRAFT offers more control over the text detection process. CRAFT focuses solely on text detection and requires additional steps for recognition, whereas EasyOCR combines both functionalities. CRAFT's approach allows for more precise tuning of the detection process, potentially yielding better results in challenging scenarios, but at the cost of increased complexity in implementation and usage.
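
To make that trade-off concrete, here is a minimal sketch of a two-stage pipeline that uses CRAFT (via the craft_text_detector wrapper) for detection and EasyOCR for recognition; the file names and the crop-per-box strategy are illustrative assumptions, not a documented workflow of either project.

import cv2
import numpy as np
import easyocr
from craft_text_detector import Craft

craft = Craft(output_dir="outputs", crop_type="poly", cuda=False)
reader = easyocr.Reader(['en'], gpu=False)

image = cv2.imread("image.jpg")
boxes = craft.detect_text("image.jpg")["boxes"]

for box in boxes:
    # Crop the axis-aligned rectangle around each detected polygon
    x, y, w, h = cv2.boundingRect(np.array(box, dtype=np.int32))
    crop = image[y:y + h, x:x + w]
    # Run EasyOCR's recognizer on the cropped region
    for _, text, confidence in reader.readtext(crop):
        print(text, confidence)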

PaddleOCR

Multilingual OCR toolkit based on PaddlePaddle: a practical, ultra-lightweight OCR system supporting recognition of 80+ languages, with data annotation and synthesis tools and training/deployment on server, mobile, embedded, and IoT devices.

Pros of PaddleOCR

  • Comprehensive end-to-end OCR solution, including detection, recognition, and layout analysis
  • Supports multiple languages and offers pre-trained models for various scenarios
  • Active development with frequent updates and improvements

Cons of PaddleOCR

  • Steeper learning curve due to its extensive features and components
  • Larger codebase and potentially higher resource requirements

Code Comparison

CRAFT-pytorch (text detection):

from craft_text_detector import Craft

craft = Craft(output_dir="outputs", crop_type="poly", cuda=False)
prediction_result = craft.detect_text("path/to/image.jpg")

PaddleOCR (text detection and recognition):

from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr('path/to/image.jpg', cls=True)

PaddleOCR offers a more comprehensive solution with both detection and recognition in a single function call, while CRAFT-pytorch focuses specifically on text detection. PaddleOCR's code is more concise for end-to-end OCR tasks, but CRAFT-pytorch provides more fine-grained control over the text detection process.
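
As a rough sketch of what that single call returns (assuming the list-based API shown above; the exact nesting has shifted between PaddleOCR versions), each detection pairs a quadrilateral box with a recognized string and a confidence score:

from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr('path/to/image.jpg', cls=True)

# result[0] holds the detections for the first (and here only) image
for box, (text, confidence) in result[0]:
    print(text, confidence, box)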

docTR

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Pros of doctr

  • Broader scope: Covers full OCR pipeline including detection, recognition, and end-to-end models
  • More comprehensive documentation and examples
  • Actively maintained with regular updates

Cons of doctr

  • Potentially more complex to use due to its broader scope
  • May have higher computational requirements for full pipeline

Code Comparison

CRAFT-pytorch (text detection):

from craft_text_detector import Craft

craft = Craft(output_dir="outputs", crop_type="poly", cuda=False)
prediction_result = craft.detect_text("path/to/image.jpg")

doctr (full OCR pipeline):

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
doc = DocumentFile.from_images("path/to/image.jpg")
result = model(doc)

Both repositories focus on text detection and recognition, but doctr provides a more comprehensive OCR solution. CRAFT-pytorch specializes in text detection using the CRAFT algorithm, while doctr offers a full pipeline including detection, recognition, and end-to-end models. doctr's broader scope may make it more suitable for complete OCR tasks, while CRAFT-pytorch might be preferred for specialized text detection needs.
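
For a sense of what the doctr pipeline returns, the sketch below walks its structured result (pages, blocks, lines, words) and prints one recognized line per row; it only illustrates the output structure.

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
doc = DocumentFile.from_images("path/to/image.jpg")
result = model(doc)

# Traverse the hierarchical result and print the recognized text
for page in result.pages:
    for block in page.blocks:
        for line in block.lines:
            print(" ".join(word.value for word in line.words))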

UniLM

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities.

Pros of UniLM

  • Broader scope: Supports multiple NLP tasks beyond text detection
  • Pre-trained models available for various languages and domains
  • Flexible architecture allowing for fine-tuning on specific tasks

Cons of UniLM

  • More complex to set up and use compared to CRAFT
  • Requires more computational resources for training and inference
  • May have slower inference speed for text detection tasks

Code Comparison

CRAFT (PyTorch):

from craft_text_detector import Craft

craft = Craft(output_dir="outputs", crop_type="poly", cuda=False)
prediction_result = craft.detect_text("path/to/image.jpg")

UniLM:

# Note: UniLM is not shipped as a dedicated architecture in the transformers
# library; the generic Auto classes below are illustrative, and loading this
# checkpoint may require code from the microsoft/unilm repository.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/unilm-base-cased")
model = AutoModel.from_pretrained("microsoft/unilm-base-cased")
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

Summary

CRAFT is a specialized tool for text detection in images, while UniLM is a more versatile NLP model supporting various tasks. CRAFT may be easier to use for specific text detection applications, while UniLM offers greater flexibility and potential for broader NLP tasks. The choice between them depends on the specific requirements of the project and the available resources.

Detectron2

Detectron2 is a platform for object detection, segmentation, and other visual recognition tasks.

Pros of Detectron2

  • Broader scope: Supports multiple computer vision tasks beyond text detection
  • More extensive documentation and community support
  • Modular architecture allowing easier customization and extension

Cons of Detectron2

  • Steeper learning curve due to its complexity and broader feature set
  • Potentially higher computational requirements for simple text detection tasks

Code Comparison

CRAFT-pytorch:

from craft_text_detector import Craft

craft = Craft(output_dir="outputs", crop_type="poly", cuda=False)
prediction_result = craft.detect_text("path/to/image.jpg")

Detectron2:

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("path/to/image.jpg"))

Summary

Detectron2 offers a more comprehensive computer vision toolkit with broader applications, while CRAFT-pytorch focuses specifically on text detection. Detectron2 provides more flexibility and community support but may be overkill for simple text detection tasks. CRAFT-pytorch offers a more straightforward implementation for text detection but with less extensibility.

README

CRAFT: Character-Region Awareness For Text detection

Official Pytorch implementation of CRAFT text detector | Paper | Pretrained Model | Supplementary

Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, Hwalsuk Lee.

Clova AI Research, NAVER Corp.

Sample Results

Overview

PyTorch implementation of the CRAFT text detector, which effectively detects text areas by exploring each character region and the affinity between characters. The bounding boxes of text are obtained by simply finding minimum bounding rectangles on the binary map produced by thresholding the character-region and affinity scores.
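
As a rough illustration of that post-processing step, the sketch below thresholds the two score maps, groups pixels with connected components, and wraps each group in a minimum-area rectangle; region_score and affinity_score are assumed to be HxW float maps from the network, and the repository's actual implementation in craft_utils.py is considerably more refined.

import cv2
import numpy as np

def boxes_from_score_maps(region_score, affinity_score,
                          text_threshold=0.7, link_threshold=0.4):
    # Threshold character-region and affinity scores into one binary text map
    binary = np.logical_or(region_score > text_threshold,
                           affinity_score > link_threshold).astype(np.uint8)
    # Treat each connected component as one word candidate
    num_labels, labels = cv2.connectedComponents(binary)
    boxes = []
    for label in range(1, num_labels):
        ys, xs = np.where(labels == label)
        points = np.stack([xs, ys], axis=1).astype(np.float32)
        # Minimum-area (possibly rotated) rectangle around the component
        boxes.append(cv2.boxPoints(cv2.minAreaRect(points)))
    return boxes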

Updates

13 Jun, 2019: Initial update
20 Jul, 2019: Added post-processing for polygon result
28 Sep, 2019: Added the trained model on IC15 and the link refiner

Getting started

Install dependencies

Requirements

  • PyTorch>=0.4.1
  • torchvision>=0.2.1
  • opencv-python>=3.4.2
  • see requirements.txt

pip install -r requirements.txt

Training

The code for training is not included in this repository, and we cannot release the full training code for IP reasons.

Test instruction using pretrained model

  • Download the trained models
| Model name | Used datasets | Languages | Purpose | Model Link |
|---|---|---|---|---|
| General | SynthText, IC13, IC17 | Eng + MLT | For general purpose | Click |
| IC15 | SynthText, IC15 | Eng | For IC15 only | Click |
| LinkRefiner | CTW1500 | - | Used with the General Model | Click |
  • Run with pretrained model
python test.py --trained_model=[weightfile] --test_folder=[folder path to test images]

The result images and score maps will be saved to ./result by default.

Arguments

  • --trained_model: pretrained model
  • --text_threshold: text confidence threshold
  • --low_text: text low-bound score
  • --link_threshold: link confidence threshold
  • --cuda: use cuda for inference (default:True)
  • --canvas_size: max image size for inference
  • --mag_ratio: image magnification ratio
  • --poly: enable polygon type result
  • --show_time: show processing time
  • --test_folder: folder path to input images
  • --refine: use link refiner for sentence-level dataset
  • --refiner_model: pretrained refiner model
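
For example, a run that enables polygon output together with the link refiner might look like this (weight file names are illustrative):

python test.py --trained_model=craft_mlt_25k.pth --test_folder=./images/ --poly --refine --refiner_model=craft_refiner_CTW1500.pth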

Links

Citation

@inproceedings{baek2019character,
  title={Character Region Awareness for Text Detection},
  author={Baek, Youngmin and Lee, Bado and Han, Dongyoon and Yun, Sangdoo and Lee, Hwalsuk},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={9365--9374},
  year={2019}
}

License

Copyright (c) 2019-present NAVER Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.