
PaddleOCR

A multilingual OCR toolkit based on PaddlePaddle: a practical, ultra-lightweight OCR system that recognizes 80+ languages, provides data annotation and synthesis tools, and supports training and deployment on server, mobile, embedded, and IoT devices.

42,444
7,656
181

Top Related Projects

  • UniLM (19,492): Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
  • Tesseract (60,774): Tesseract Open Source OCR Engine (main repository)
  • EasyOCR (23,625): Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
  • deep-text-recognition-benchmark: Text recognition (optical character recognition) with deep learning methods, ICCV 2019
  • Mask_RCNN (24,519): Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Quick Overview

PaddleOCR is an open-source Optical Character Recognition (OCR) toolkit developed by Baidu's PaddlePaddle team. It provides a comprehensive set of tools for text detection, recognition, and layout analysis, supporting multiple languages and offering both lightweight and accurate models for various OCR tasks.

Pros

  • Comprehensive OCR solution with support for multiple languages and tasks
  • Offers both lightweight models for mobile devices and high-accuracy models for server-side applications
  • Active development and frequent updates from the PaddlePaddle team
  • Extensive documentation and examples for easy integration

Cons

  • Primarily based on the PaddlePaddle deep learning framework, which may have a steeper learning curve for those familiar with other frameworks
  • Some advanced features may require more computational resources
  • Documentation is sometimes not fully up-to-date with the latest features
  • Limited community support compared to some other popular OCR libraries

Code Examples

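  1. Basic text detection and recognition:
from paddleocr import PaddleOCR

# use_angle_cls=True loads the text-angle classifier for rotated text
ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr('image.jpg')
# In PaddleOCR 2.x, result holds one entry per input image (None if no text is found)
for line in result[0] or []:
    print(line)  # [box, (text, confidence)]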
  2. Extracting text from a specific region of an image:
import cv2
from paddleocr import PaddleOCR

image = cv2.imread('image.jpg')
roi = image[100:300, 200:400]  # Region of interest: rows (y) 100-299, columns (x) 200-399
ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr(roi)
# result[0] holds the lines for the single input image (PaddleOCR 2.x layout)
for line in result[0] or []:
    print(line[1][0])  # Print recognized text
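  3. Using a custom dictionary for text recognition:
from paddleocr import PaddleOCR

# Dictionary file with one character per line; it should match the recognition model in use
custom_dict = 'path/to/custom_dict.txt'
ocr = PaddleOCR(use_angle_cls=True, lang='en', rec_char_dict_path=custom_dict)
result = ocr.ocr('image.jpg')
# result[0] holds the lines for the single input image (PaddleOCR 2.x layout)
for line in result[0] or []:
    print(line)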
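
Boxes returned for a cropped region are expressed in the crop's own coordinates, so they need an offset before they can be drawn on the full image. The following is a minimal sketch of that mapping, assuming the PaddleOCR 2.x layout in which each detected line is [box, (text, confidence)] and box is four [x, y] corner points. The helper shift_boxes and the sample data are hypothetical and only stand in for a real result[0].

```python
def shift_boxes(lines, x_offset, y_offset):
    """Translate ROI-relative boxes back into full-image coordinates."""
    shifted = []
    for box, (text, confidence) in lines:
        moved = [[x + x_offset, y + y_offset] for x, y in box]
        shifted.append([moved, (text, confidence)])
    return shifted


# Hand-written sample in the assumed layout, standing in for result[0].
roi_lines = [
    [[[10, 5], [90, 5], [90, 25], [10, 25]], ("Invoice", 0.98)],
]

# The crop image[100:300, 200:400] starts at row y=100 and column x=200.
full_lines = shift_boxes(roi_lines, x_offset=200, y_offset=100)
print(full_lines[0][0][0])  # top-left corner in full-image coordinates
```

Note the order of the offsets: NumPy slicing is rows first (y), then columns (x), while the box points are assumed to be (x, y).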

Getting Started

To get started with PaddleOCR:

  1. Install PaddleOCR:
pip install paddleocr
  2. Use PaddleOCR in your Python script:
from paddleocr import PaddleOCR

# Initialize PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang='en')

# Perform OCR on an image
result = ocr.ocr('path/to/your/image.jpg')

# Print the results; in PaddleOCR 2.x, result[0] holds the lines for the first image
for line in result[0] or []:
    print(line)  # [box, (text, confidence)]

For more advanced usage and customization options, refer to the official documentation on the PaddleOCR GitHub repository.
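
The value returned by the script above is nested, which is easy to misread. As a sketch only, the helper below flattens it into (text, confidence) pairs and drops low-confidence lines. It assumes the PaddleOCR 2.x layout: one entry per input image, each either None or a list of [box, (text, confidence)]. The name extract_text, the min_confidence parameter, and the sample data are hypothetical.

```python
def extract_text(result, min_confidence=0.5):
    """Return (text, confidence) pairs at or above the threshold."""
    pairs = []
    for page in result:
        if not page:  # assumed: None or empty when nothing is detected
            continue
        for _box, (text, confidence) in page:
            if confidence >= min_confidence:
                pairs.append((text, confidence))
    return pairs


# Hand-written sample in the assumed layout: two images, the second one empty.
sample_result = [
    [
        [[[0, 0], [50, 0], [50, 10], [0, 10]], ("Hello", 0.99)],
        [[[0, 20], [50, 20], [50, 30], [0, 30]], ("w0rld", 0.31)],
    ],
    None,
]
print(extract_text(sample_result))
```

With a real run, the call would be extract_text(ocr.ocr('path/to/your/image.jpg')); the threshold of 0.5 is an arbitrary starting point, not a recommended value.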

Competitor Comparisons

19,492

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Pros of UniLM

  • Broader scope: Supports a wide range of natural language processing tasks beyond OCR
  • More advanced language understanding: Utilizes large-scale pre-trained models for improved performance
  • Active research focus: Regularly updated with cutting-edge NLP techniques and models

Cons of UniLM

  • Less specialized for OCR: May not offer as many OCR-specific features and optimizations
  • Potentially more complex to use: Broader scope may require more setup and configuration for OCR tasks
  • Larger resource requirements: Pre-trained models can be computationally intensive

Code Comparison

PaddleOCR:

from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr('image.jpg')

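UniLM (LayoutLM loaded through Hugging Face Transformers; it classifies tokens from existing OCR output rather than performing OCR itself):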

from transformers import LayoutLMForTokenClassification, LayoutLMTokenizer

model = LayoutLMForTokenClassification.from_pretrained("microsoft/layoutlm-base-uncased")
tokenizer = LayoutLMTokenizer.from_pretrained("microsoft/layoutlm-base-uncased")

Note: The code snippets demonstrate basic setup and may not reflect the full complexity of using each library for OCR tasks.
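
Because LayoutLM consumes words and boxes produced by a separate OCR step, the two tools are more likely to be chained than swapped. One piece of that glue can be sketched as follows: the Hugging Face LayoutLM documentation describes bounding boxes as [x0, y0, x1, y1] on a 0-1000 scale, so OCR boxes need rescaling. The helper quad_to_layoutlm_bbox is hypothetical, and the four-corner input format is an assumption based on the PaddleOCR 2.x result layout.

```python
def quad_to_layoutlm_bbox(quad, image_width, image_height):
    """Convert four [x, y] corners to [x0, y0, x1, y1] on a 0-1000 scale."""
    xs = [point[0] for point in quad]
    ys = [point[1] for point in quad]
    return [
        int(1000 * min(xs) / image_width),
        int(1000 * min(ys) / image_height),
        int(1000 * max(xs) / image_width),
        int(1000 * max(ys) / image_height),
    ]


# Sample corner points for one detected word on a 1000 x 500 pixel image.
quad = [[100, 50], [300, 50], [300, 100], [100, 100]]
bbox = quad_to_layoutlm_bbox(quad, image_width=1000, image_height=500)
print(bbox)
```

Taking the minimum and maximum of the corners turns a possibly rotated quadrilateral into an axis-aligned box, which loses the rotation but matches the four-value format.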

60,774

Tesseract Open Source OCR Engine (main repository)

Pros of Tesseract

  • Mature and widely adopted OCR engine with a long history
  • Supports a wide range of languages and scripts
  • Highly customizable with extensive documentation

Cons of Tesseract

  • Generally slower performance compared to modern deep learning-based approaches
  • May struggle with complex layouts or low-quality images
  • Requires more manual configuration for optimal results

Code Comparison

Tesseract

import pytesseract
from PIL import Image

image = Image.open('image.png')
text = pytesseract.image_to_string(image)
print(text)

PaddleOCR

from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr('image.png', cls=True)
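# result[0] holds the lines for the single input image (PaddleOCR 2.x layout)
for line in result[0] or []:
    print(line[1][0])  # Recognized text of each line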

PaddleOCR offers a more streamlined API for OCR tasks, with built-in support for text detection, recognition, and angle classification. Tesseract, while powerful, often requires additional preprocessing steps for optimal results. PaddleOCR's deep learning approach generally provides better performance on complex layouts and low-quality images, but Tesseract's extensive language support and customization options make it a versatile choice for specific use cases.
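
The two snippets also return different shapes: pytesseract.image_to_string gives one string, while PaddleOCR gives a list of lines with boxes. To compare them on the same image, the PaddleOCR lines can be joined in reading order. The sketch below does that with a simple top-to-bottom, left-to-right heuristic; lines_to_text, row_tolerance, and the sample data are hypothetical, and the [box, (text, confidence)] layout is the PaddleOCR 2.x assumption.

```python
def lines_to_text(lines, row_tolerance=10):
    """Join detected lines top-to-bottom, left-to-right, into one string."""
    # Use the top-left corner of each box as its position: (y, x, text).
    items = sorted((box[0][1], box[0][0], text) for box, (text, _conf) in lines)
    rows = []
    for y, x, text in items:
        # Boxes whose tops are within row_tolerance pixels share a row.
        if rows and abs(y - rows[-1][0]) <= row_tolerance:
            rows[-1][1].append((x, text))
        else:
            rows.append((y, [(x, text)]))
    return "\n".join(
        " ".join(text for _x, text in sorted(cells)) for _y, cells in rows
    )


# Hand-written sample, deliberately out of reading order.
lines = [
    [[[120, 12], [200, 12], [200, 30], [120, 30]], ("World", 0.97)],
    [[[10, 10], [100, 10], [100, 30], [10, 30]], ("Hello", 0.99)],
    [[[10, 50], [150, 50], [150, 70], [10, 70]], ("Second line", 0.95)],
]
print(lines_to_text(lines))
```

This heuristic suits single-column text; multi-column or heavily rotated layouts would need a layout analysis step instead.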

23,625

Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc.

Pros of EasyOCR

  • Simpler installation process and easier to use for beginners
  • Supports 80+ languages out of the box, comparable to the 80+ languages PaddleOCR lists
  • Better documentation and examples for quick start

Cons of EasyOCR

  • Generally slower inference speed compared to PaddleOCR
  • Less flexibility and customization options for advanced users
  • Smaller community and fewer pre-trained models available

Code Comparison

EasyOCR:

import easyocr
reader = easyocr.Reader(['en'])
result = reader.readtext('image.jpg')

PaddleOCR:

from paddleocr import PaddleOCR
ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr('image.jpg')

Both libraries offer simple APIs for OCR tasks, but PaddleOCR provides more options for fine-tuning and optimization. EasyOCR's code is more straightforward, making it easier for beginners to get started quickly. PaddleOCR's approach allows for more advanced configurations, which can be beneficial for complex OCR tasks or when performance optimization is crucial.
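
The two calls look alike but return differently shaped results: EasyOCR's readtext yields (box, text, confidence) entries, while PaddleOCR 2.x is assumed here to yield [box, (text, confidence)] entries nested one level deeper per image. A small adapter, sketched below with hypothetical names (TextLine, from_easyocr, from_paddleocr), lets downstream code treat both the same way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextLine:
    text: str
    confidence: float
    box: tuple  # four (x, y) corner points


def _as_box(points):
    return tuple((float(x), float(y)) for x, y in points)


def from_easyocr(result):
    """EasyOCR readtext: a list of (box, text, confidence) entries."""
    return [TextLine(text, float(conf), _as_box(box)) for box, text, conf in result]


def from_paddleocr(result):
    """Assumed PaddleOCR 2.x layout: result[0] is a list of [box, (text, confidence)]."""
    page = result[0] or []
    return [TextLine(text, float(conf), _as_box(box)) for box, (text, conf) in page]


# Hand-written samples describing the same detection in both layouts.
corners = [[0, 0], [40, 0], [40, 10], [0, 10]]
easy_sample = [(corners, "Hi", 0.9)]
paddle_sample = [[[corners, ("Hi", 0.9)]]]

print(from_easyocr(easy_sample) == from_paddleocr(paddle_sample))
```

Converting both to one record type keeps benchmarking or post-processing code independent of the engine behind it.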

Text recognition (optical character recognition) with deep learning methods, ICCV 2019

Pros of deep-text-recognition-benchmark

  • Focuses specifically on text recognition, providing a comprehensive benchmark for various models
  • Implements multiple state-of-the-art architectures, allowing for easy comparison and experimentation
  • Offers a modular design, making it easier to swap components and test different combinations

Cons of deep-text-recognition-benchmark

  • Limited to text recognition, while PaddleOCR offers a more comprehensive OCR pipeline
  • Less extensive documentation and fewer pre-trained models compared to PaddleOCR
  • Smaller community and fewer updates, potentially leading to slower development and support

Code Comparison

deep-text-recognition-benchmark:

model = Model(opt)
converter = AttnLabelConverter(opt.character)
criterion = torch.nn.CrossEntropyLoss(ignore_index=0).to(device)

PaddleOCR:

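model = build_model(config['Architecture'])
loss_class = build_loss(config['Loss'])
# build_optimizer also takes the schedule length and returns the optimizer with its LR scheduler
optimizer, lr_scheduler = build_optimizer(
    config['Optimizer'], epochs=config['Global']['epoch_num'],
    step_each_epoch=len(train_dataloader), model=model)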

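Both repositories build the model and loss from a set of options. deep-text-recognition-benchmark passes a single options object (opt) to its constructors, while PaddleOCR builds each component from its own section of a configuration dictionary (Architecture, Loss, Optimizer), so a component can be swapped by editing the configuration rather than the training script.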
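
The config-driven style can be illustrated with a toy registry. This is a sketch of the general pattern only, not PaddleOCR's actual implementation: register, build, the CTCLoss stand-in class, and the config keys are all hypothetical.

```python
_REGISTRY = {}


def register(name):
    """Class decorator that records a component under a config name."""
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


def build(config):
    """Instantiate the class named by config['name'] with the remaining keys."""
    params = dict(config)  # copy so the caller's config is not mutated
    name = params.pop("name")
    return _REGISTRY[name](**params)


@register("CTCLoss")
class CTCLoss:
    """Stand-in component; a real loss would implement the computation."""
    def __init__(self, blank=0):
        self.blank = blank


config = {"Loss": {"name": "CTCLoss", "blank": 0}}
loss = build(config["Loss"])
print(type(loss).__name__, loss.blank)
```

With this pattern, switching to another registered component means changing the name field in the configuration, which is the kind of flexibility the comparison above refers to.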

24,519

Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow

Pros of Mask_RCNN

  • Specialized in instance segmentation, offering precise object detection and segmentation
  • Well-documented with extensive tutorials and examples
  • Built on Keras and TensorFlow; the upstream repository targets TensorFlow 1.x, so TensorFlow 2.x typically requires a community fork

Cons of Mask_RCNN

  • Limited to object detection and segmentation tasks
  • Less frequent updates and maintenance compared to PaddleOCR
  • Steeper learning curve for beginners

Code Comparison

Mask_RCNN:

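import mrcnn.model as modellib
from mrcnn import utils
# CocoConfig lives in samples/coco/coco.py; this import assumes the repository root is on sys.path
from samples.coco import coco

MODEL_DIR = 'logs'  # Directory for logs and trained weights (placeholder path)

class InferenceConfig(coco.CocoConfig):
    # Run inference on one image at a time
    GPU_COUNT = 1
    IMAGES_PER_GPU = 1

model = modellib.MaskRCNN(mode="inference", config=InferenceConfig(), model_dir=MODEL_DIR)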

PaddleOCR:

from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang='en')
result = ocr.ocr('image.jpg', cls=True)

The code snippets highlight the difference in focus between the two repositories. Mask_RCNN requires more setup for object detection and segmentation, while PaddleOCR offers a simpler interface for OCR tasks.


README


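Introduction

PaddleOCR aims to build a rich, leading, and practical OCR toolkit that helps developers train better models and put them into production.

🚀 Community

PaddleOCR is overseen by a PMC. Issues and PRs are reviewed on a best-effort basis. For a complete overview of the PaddlePaddle community, see community.

⚠️ Note: The Issues section is only for reporting program 🐞 bugs; please post all other questions in Discussions. An issue that is not a bug will be moved to Discussions.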

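📣 Recent updates (more)

  • 🔥 2024.7: Added the champion solutions from the PaddleOCR Algorithm Model Challenge:

  • 💥 2024.6.27: Major update to PaddleX 3.0, the PaddlePaddle low-code development tool!

    • Low-code development paradigm: supports low-code development across the full OCR model workflow, provides a Python API, and lets users chain models in custom pipelines;
    • Multi-hardware training and inference: supports model training and inference on NVIDIA GPUs, Kunlunxin, Ascend, Cambricon, and other hardware. For the models PaddleOCR supports, see the model list.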

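📚 Documentation

For the full documentation, see: docs

🌟 Features

PaddleOCR supports a range of cutting-edge OCR algorithms and builds on them the industrial-grade models PP-OCR, PP-Structure, and PP-ChatOCRv2, covering the full workflow of data production, model training, compression, and inference deployment.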

⚡ Quick Start

📚 "Dive into OCR" e-book

🎖 Contributors

⭐️ Star

Star History Chart

License

This project is released under the Apache License Version 2.0.