clovaai/CRAFT-pytorch

Official implementation of Character Region Awareness for Text Detection (CRAFT)

Top Related Projects

  • EasyOCR (23,625 stars): Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.

  • PaddleOCR (42,444 stars): Multilingual OCR toolkit based on PaddlePaddle: a practical, ultra-lightweight OCR system supporting recognition of 80+ languages, with data annotation and synthesis tools and training/deployment on server, mobile, embedded, and IoT devices.

  • docTR (3,546 stars): Document Text Recognition - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

  • UniLM (19,492 stars): Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities.

  • Detectron2: A platform for object detection, segmentation, and other visual recognition tasks.

Quick Overview

CRAFT-pytorch is an implementation of the CRAFT (Character Region Awareness For Text detection) algorithm in PyTorch. It's designed for scene text detection, capable of accurately localizing text in various challenging scenarios, including curved or rotated text.

Pros

  • High accuracy in detecting text in complex scenes
  • Ability to handle multi-oriented and curved text
  • Pretrained models available for quick implementation
  • Supports both character-level and word-level text detection

Cons

  • Requires significant computational resources for training
  • May struggle with very small or highly stylized text
  • Limited documentation for customization and fine-tuning
  • Dependency on specific versions of libraries may cause compatibility issues

Code Examples

These snippets use the community craft_text_detector package (pip install craft-text-detector), which wraps the CRAFT model behind a small Python API; the official repository itself exposes the detector through test.py.

  1. Loading a pretrained CRAFT model:

from craft_text_detector import Craft

# Initialize the CRAFT text detector (CPU inference; set cuda=True for GPU)
output_dir = "outputs/"
craft = Craft(output_dir=output_dir, crop_type="poly", cuda=False)

  2. Detecting text in an image:

# Perform text detection on a single image
image_path = "image.jpg"
prediction_result = craft.detect_text(image_path)

# Get the detected bounding boxes
boxes = prediction_result["boxes"]

  3. Visualizing the detected text regions with OpenCV:

import cv2
import numpy as np

# Draw each detected box as a closed polygon on the input image
image = cv2.imread(image_path)
for box in boxes:
    pts = np.array(box, dtype=np.int32).reshape((-1, 1, 2))
    cv2.polylines(image, [pts], isClosed=True, color=(0, 0, 255), thickness=2)
cv2.imwrite("detections.jpg", image)

Getting Started

To get started with CRAFT-pytorch:

  1. Clone the repository:

    git clone https://github.com/clovaai/CRAFT-pytorch.git
    cd CRAFT-pytorch
    
  2. Install dependencies:

    pip install -r requirements.txt
    
  3. Download the pretrained model:

    wget -O craft_mlt_25k.pth https://drive.google.com/uc?id=1Jk4eGD7crsqCCg9C9VjCLkMN3ze8kutZ
    
  4. Run the demo:

    python test.py --trained_model=craft_mlt_25k.pth --test_folder=./images/
    

This will process the images in the ./images/ folder and output the results in the ./result/ directory.
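
If you prefer to load the detector from Python rather than through test.py, here is a minimal sketch using the repository's own CRAFT module (craft.py); the copy_state_dict helper is a local illustration that mirrors what test.py does to strip the "module." prefix from checkpoints saved with DataParallel.

import torch
from collections import OrderedDict
from craft import CRAFT  # model definition shipped in this repository (craft.py)

def copy_state_dict(state_dict):
    # Strip the "module." prefix left by DataParallel checkpoints
    return OrderedDict((k[7:] if k.startswith("module.") else k, v)
                       for k, v in state_dict.items())

net = CRAFT()
net.load_state_dict(copy_state_dict(torch.load("craft_mlt_25k.pth", map_location="cpu")))
net.eval()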

Competitor Comparisons

EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.

Pros of EasyOCR

  • Supports multiple languages out of the box
  • Easier to use with a simpler API
  • Includes both text detection and recognition in one package

Cons of EasyOCR

  • Generally slower performance compared to CRAFT
  • Less flexibility for fine-tuning and customization
  • May have lower accuracy on complex or non-standard text layouts

Code Comparison

EasyOCR:

import easyocr
reader = easyocr.Reader(['en'])
result = reader.readtext('image.jpg')

CRAFT:

from craft_text_detector import Craft
craft = Craft(output_dir="outputs", crop_type="poly", cuda=False)
prediction_result = craft.detect_text("image.jpg")

EasyOCR provides a more straightforward API for quick text detection and recognition, while CRAFT offers more control over the text detection process. CRAFT focuses solely on text detection and requires additional steps for recognition, whereas EasyOCR combines both functionalities. CRAFT's approach allows for more precise tuning of the detection process, potentially yielding better results in challenging scenarios, but at the cost of increased complexity in implementation and usage.
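
To make that trade-off concrete, here is a minimal sketch of a two-stage pipeline that uses CRAFT (via the craft_text_detector wrapper) for detection and EasyOCR for recognition; the file names and the crop-per-box strategy are illustrative assumptions, not a documented workflow of either project.

import cv2
import numpy as np
import easyocr
from craft_text_detector import Craft

craft = Craft(output_dir="outputs", crop_type="poly", cuda=False)
reader = easyocr.Reader(['en'], gpu=False)

image = cv2.imread("image.jpg")
boxes = craft.detect_text("image.jpg")["boxes"]

for box in boxes:
    # Crop the axis-aligned rectangle around each detected polygon
    x, y, w, h = cv2.boundingRect(np.array(box, dtype=np.int32))
    crop = image[y:y + h, x:x + w]
    # Run EasyOCR's recognizer on the cropped region
    for _, text, confidence in reader.readtext(crop):
        print(text, confidence)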

PaddleOCR

Multilingual OCR toolkit based on PaddlePaddle: a practical, ultra-lightweight OCR system supporting recognition of 80+ languages, with data annotation and synthesis tools and training/deployment on server, mobile, embedded, and IoT devices.

Pros of PaddleOCR

  • Comprehensive end-to-end OCR solution, including detection, recognition, and layout analysis
  • Supports multiple languages and offers pre-trained models for various scenarios
  • Active development with frequent updates and improvements

Cons of PaddleOCR

  • Steeper learning curve due to its extensive features and components
  • Larger codebase and potentially higher resource requirements

Code Comparison

CRAFT-pytorch (text detection):

from craft_text_detector import Craft

craft = Craft(output_dir="outputs", crop_type="poly", cuda=False)
prediction_result = craft.detect_text("path/to/image.jpg")

PaddleOCR (text detection and recognition):

from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr('path/to/image.jpg', cls=True)

PaddleOCR offers a more comprehensive solution with both detection and recognition in a single function call, while CRAFT-pytorch focuses specifically on text detection. PaddleOCR's code is more concise for end-to-end OCR tasks, but CRAFT-pytorch provides more fine-grained control over the text detection process.
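
As a rough sketch of what that single call returns (assuming the list-based API shown above; the exact nesting has shifted between PaddleOCR versions), each detection pairs a quadrilateral box with a recognized string and a confidence score:

from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr('path/to/image.jpg', cls=True)

# result[0] holds the detections for the first (and here only) image
for box, (text, confidence) in result[0]:
    print(text, confidence, box)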

docTR

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Pros of doctr

  • Broader scope: Covers full OCR pipeline including detection, recognition, and end-to-end models
  • More comprehensive documentation and examples
  • Actively maintained with regular updates

Cons of doctr

  • Potentially more complex to use due to its broader scope
  • May have higher computational requirements for full pipeline

Code Comparison

CRAFT-pytorch (text detection):

from craft_text_detector import Craft

craft = Craft(output_dir="outputs", crop_type="poly", cuda=False)
prediction_result = craft.detect_text("path/to/image.jpg")

doctr (full OCR pipeline):

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
doc = DocumentFile.from_images("path/to/image.jpg")
result = model(doc)

Both repositories focus on text detection and recognition, but doctr provides a more comprehensive OCR solution. CRAFT-pytorch specializes in text detection using the CRAFT algorithm, while doctr offers a full pipeline including detection, recognition, and end-to-end models. doctr's broader scope may make it more suitable for complete OCR tasks, while CRAFT-pytorch might be preferred for specialized text detection needs.
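
For a sense of what the doctr pipeline returns, the sketch below walks its structured result (pages, blocks, lines, words) and prints one recognized line per row; it only illustrates the output structure.

from doctr.io import DocumentFile
from doctr.models import ocr_predictor

model = ocr_predictor(pretrained=True)
doc = DocumentFile.from_images("path/to/image.jpg")
result = model(doc)

# Traverse the hierarchical result and print the recognized text
for page in result.pages:
    for block in page.blocks:
        for line in block.lines:
            print(" ".join(word.value for word in line.words))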

UniLM

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities.

Pros of UniLM

  • Broader scope: Supports multiple NLP tasks beyond text detection
  • Pre-trained models available for various languages and domains
  • Flexible architecture allowing for fine-tuning on specific tasks

Cons of UniLM

  • More complex to set up and use compared to CRAFT
  • Requires more computational resources for training and inference
  • May have slower inference speed for text detection tasks

Code Comparison

CRAFT (PyTorch):

from craft_text_detector import Craft

craft = Craft(output_dir="outputs", crop_type="poly", cuda=False)
prediction_result = craft.detect_text("path/to/image.jpg")

UniLM:

# Note: UniLM is not shipped as a dedicated architecture in the transformers
# library; the generic Auto classes below are illustrative, and loading this
# checkpoint may require code from the microsoft/unilm repository.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("microsoft/unilm-base-cased")
model = AutoModel.from_pretrained("microsoft/unilm-base-cased")
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)

Summary

CRAFT is a specialized tool for text detection in images, while UniLM is a more versatile NLP model supporting various tasks. CRAFT may be easier to use for specific text detection applications, while UniLM offers greater flexibility and potential for broader NLP tasks. The choice between them depends on the specific requirements of the project and the available resources.

Detectron2

Detectron2 is a platform for object detection, segmentation, and other visual recognition tasks.

Pros of Detectron2

  • Broader scope: Supports multiple computer vision tasks beyond text detection
  • More extensive documentation and community support
  • Modular architecture allowing easier customization and extension

Cons of Detectron2

  • Steeper learning curve due to its complexity and broader feature set
  • Potentially higher computational requirements for simple text detection tasks

Code Comparison

CRAFT-pytorch:

from craft_text_detector import Craft

craft = Craft(output_dir="outputs", crop_type="poly", cuda=False)
prediction_result = craft.detect_text("path/to/image.jpg")

Detectron2:

import cv2
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(cv2.imread("path/to/image.jpg"))

Summary

Detectron2 offers a more comprehensive computer vision toolkit with broader applications, while CRAFT-pytorch focuses specifically on text detection. Detectron2 provides more flexibility and community support but may be overkill for simple text detection tasks. CRAFT-pytorch offers a more straightforward implementation for text detection but with less extensibility.

README

CRAFT: Character-Region Awareness For Text detection

Official Pytorch implementation of CRAFT text detector | Paper | Pretrained Model | Supplementary

Youngmin Baek, Bado Lee, Dongyoon Han, Sangdoo Yun, Hwalsuk Lee.

Clova AI Research, NAVER Corp.

Sample Results

Overview

PyTorch implementation of the CRAFT text detector, which effectively detects text areas by exploring each character region and the affinity between characters. The bounding boxes of text are obtained by simply finding minimum bounding rectangles on the binary map produced by thresholding the character-region and affinity scores.
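
As a rough illustration of that post-processing step, the sketch below thresholds the two score maps, groups pixels with connected components, and wraps each group in a minimum-area rectangle; region_score and affinity_score are assumed to be HxW float maps from the network, and the repository's actual implementation in craft_utils.py is considerably more refined.

import cv2
import numpy as np

def boxes_from_score_maps(region_score, affinity_score,
                          text_threshold=0.7, link_threshold=0.4):
    # Threshold character-region and affinity scores into one binary text map
    binary = np.logical_or(region_score > text_threshold,
                           affinity_score > link_threshold).astype(np.uint8)
    # Treat each connected component as one word candidate
    num_labels, labels = cv2.connectedComponents(binary)
    boxes = []
    for label in range(1, num_labels):
        ys, xs = np.where(labels == label)
        points = np.stack([xs, ys], axis=1).astype(np.float32)
        # Minimum-area (possibly rotated) rectangle around the component
        boxes.append(cv2.boxPoints(cv2.minAreaRect(points)))
    return boxes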

Updates

13 Jun, 2019: Initial update
20 Jul, 2019: Added post-processing for polygon result
28 Sep, 2019: Added the trained model on IC15 and the link refiner

Getting started

Install dependencies

Requirements

  • PyTorch>=0.4.1
  • torchvision>=0.2.1
  • opencv-python>=3.4.2
  • see requirements.txt

pip install -r requirements.txt

Training

The code for training is not included in this repository, and we cannot release the full training code for IP reasons.

Test instruction using pretrained model

  • Download the trained models
| Model name | Used datasets | Languages | Purpose | Model Link |
|---|---|---|---|---|
| General | SynthText, IC13, IC17 | Eng + MLT | For general purpose | Click |
| IC15 | SynthText, IC15 | Eng | For IC15 only | Click |
| LinkRefiner | CTW1500 | - | Used with the General Model | Click |
  • Run with pretrained model
python test.py --trained_model=[weightfile] --test_folder=[folder path to test images]

The result images and score maps will be saved to ./result by default.

Arguments

  • --trained_model: pretrained model
  • --text_threshold: text confidence threshold
  • --low_text: text low-bound score
  • --link_threshold: link confidence threshold
  • --cuda: use cuda for inference (default:True)
  • --canvas_size: max image size for inference
  • --mag_ratio: image magnification ratio
  • --poly: enable polygon type result
  • --show_time: show processing time
  • --test_folder: folder path to input images
  • --refine: use link refiner for sentence-level dataset
  • --refiner_model: pretrained refiner model
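
For example, a run that enables polygon output together with the link refiner might look like this (weight file names are illustrative):

python test.py --trained_model=craft_mlt_25k.pth --test_folder=./images/ --poly --refine --refiner_model=craft_refiner_CTW1500.pth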

Links

Citation

@inproceedings{baek2019character,
  title={Character Region Awareness for Text Detection},
  author={Baek, Youngmin and Lee, Bado and Han, Dongyoon and Yun, Sangdoo and Lee, Hwalsuk},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  pages={9365--9374},
  year={2019}
}

License

Copyright (c) 2019-present NAVER Corp.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.