jorisschellekens / borb

borb is a library for reading, creating and manipulating PDF files in python.

Top Related Projects

  • PyMuPDF: a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
  • pdfminer.six: community maintained fork of pdfminer ("we fathom PDF").
  • iText: iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText can be a boon to nearly every workflow.
  • pikepdf: a Python library for reading and writing PDF, powered by QPDF.

Quick Overview

borb is a Python library for reading, creating, and manipulating PDF files. It offers a wide range of functionality, including creating PDFs from scratch, adding text and images, working with forms, and extracting content from existing PDFs. The library aims to provide a user-friendly interface for PDF manipulation tasks.

Pros

  • Comprehensive PDF manipulation capabilities
  • Easy-to-use API for common PDF operations
  • Supports both reading and writing PDF files
  • Actively maintained with regular updates

Cons

  • Limited documentation compared to some other PDF libraries
  • May have a steeper learning curve for complex operations
  • Performance might be slower for large PDF files compared to some alternatives
  • Relatively new project, so it may have fewer community resources and examples

Code Examples

Creating a simple PDF:

from borb.pdf import Document
from borb.pdf import Page
from borb.pdf import PDF
from borb.pdf import SingleColumnLayout
from borb.pdf import Paragraph

# Create document
pdf = Document()

# Add page
page = Page()
pdf.add_page(page)

# Add layout
layout = SingleColumnLayout(page)

# Add paragraph
layout.add(Paragraph("Hello World!"))

# Save PDF
with open("output.pdf", "wb") as pdf_file_handle:
    PDF.dumps(pdf_file_handle, pdf)

Adding an image to a PDF:

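from decimal import Decimal
from pathlib import Path

from borb.pdf import Document
from borb.pdf import Page
from borb.pdf import PDF
from borb.pdf import SingleColumnLayout
from borb.pdf import Image

# Create document
pdf = Document()

# Add page
page = Page()
pdf.add_page(page)

# Add layout (content is placed through a layout, not directly on the Page)
layout = SingleColumnLayout(page)

# Add image (pass a Path for a local file; sizes are Decimal values)
layout.add(
    Image(
        Path("path/to/image.jpg"),
        width=Decimal(300),
        height=Decimal(200),
    )
)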

# Save PDF
with open("output.pdf", "wb") as pdf_file_handle:
    PDF.dumps(pdf_file_handle, pdf)

Extracting text from a PDF:

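from borb.pdf import PDF
from borb.toolkit import SimpleTextExtraction

# Create the listener that collects text while the PDF is parsed
extraction = SimpleTextExtraction()

# Read PDF, passing the listener to the loader
with open("input.pdf", "rb") as pdf_file_handle:
    pdf = PDF.loads(pdf_file_handle, [extraction])

# Print extracted text of the first page
# (get_text() is keyed by page index in borb 2.x; other versions may differ)
print(extraction.get_text()[0])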
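
Extracted text is usually post-processed rather than printed. The following is a minimal sketch, assuming the text has already been collected into a plain dict that maps a page index to a string. The helper name pages_containing and the sample data are hypothetical and not part of borb.

```python
def pages_containing(text_by_page, needle):
    """Return the sorted page indices whose text contains `needle` (case-insensitive)."""
    needle = needle.lower()
    return sorted(
        page for page, text in text_by_page.items() if needle in text.lower()
    )


# Hypothetical sample data standing in for real extraction output
sample = {0: "Hello World!", 1: "Invoice total: 42 EUR", 2: "hello again"}
print(pages_containing(sample, "hello"))
```

Keeping the search independent of the extraction library means the same helper works regardless of which library produced the text.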

Getting Started

To get started with borb, first install it using pip:

pip install borb

Then, you can create a simple PDF using the following code:

from borb.pdf import Document, Page, PDF, SingleColumnLayout, Paragraph

pdf = Document()
page = Page()
pdf.add_page(page)
layout = SingleColumnLayout(page)
layout.add(Paragraph("Hello, borb!"))

with open("hello_borb.pdf", "wb") as pdf_file:
    PDF.dumps(pdf_file, pdf)

This will create a PDF file named "hello_borb.pdf" with the text "Hello, borb!" on it.
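
A quick way to confirm that the file was written is to check for the PDF header marker. This is a minimal sketch using only the standard library; the helper name looks_like_pdf is an assumption made for illustration, and the check confirms only the header, not that the file is a valid PDF.

```python
from pathlib import Path


def looks_like_pdf(path):
    """Return True if the file exists and starts with the PDF header '%PDF-'."""
    path = Path(path)
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(5) == b"%PDF-"


# Assumes the Getting Started script above has been run in this directory
print(looks_like_pdf("hello_borb.pdf"))
```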

Competitor Comparisons

PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.

Pros of PyMuPDF

  • Faster performance for large PDF operations
  • More comprehensive PDF manipulation capabilities
  • Better support for complex PDF structures and annotations

Cons of PyMuPDF

  • Steeper learning curve due to more complex API
  • Less intuitive for simple PDF creation tasks
  • Requires external dependencies (MuPDF library)

Code Comparison

PyMuPDF example:

import fitz
doc = fitz.open("input.pdf")
page = doc[0]
text = page.get_text()
doc.close()

borb example:

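from borb.pdf import PDF
from borb.toolkit import SimpleTextExtraction

# The listener collects text while the document is parsed
extractor = SimpleTextExtraction()
with open("input.pdf", "rb") as pdf_file_handle:
    doc = PDF.loads(pdf_file_handle, [extractor])

# Text of the first page (keyed by page index in borb 2.x)
text = extractor.get_text()[0]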

Both libraries offer PDF manipulation capabilities, but PyMuPDF generally provides more advanced features and better performance for complex operations. borb, on the other hand, offers a simpler API and is easier to get started with for basic PDF tasks. The choice between the two depends on the specific requirements of your project and the level of PDF manipulation needed.
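
Performance claims like these are best checked against your own documents. Below is a minimal timing sketch using only the standard library. The helper time_call and the stand-in fake_extract are hypothetical names; replace the stand-in with a function that wraps the extraction code of whichever library you are evaluating.

```python
import time


def time_call(func, *args, repeat=3, **kwargs):
    """Call func `repeat` times and return (best_seconds, last_result)."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return best, result


# Stand-in workload; swap in a wrapper around real extraction code
def fake_extract(path):
    return f"text of {path}"


seconds, text = time_call(fake_extract, "input.pdf")
print(f"{seconds:.6f}s", text)
```

Taking the best of several runs reduces the influence of one-off effects such as a cold disk cache.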

Community maintained fork of pdfminer - we fathom PDF

Pros of pdfminer.six

  • More established project with longer history and larger community
  • Focused specifically on PDF extraction, potentially more specialized
  • Originally supported both Python 2 and Python 3 (hence the "six"); current releases are Python 3 only

Cons of pdfminer.six

  • Less active development compared to borb
  • Limited to PDF extraction, while borb offers more comprehensive PDF manipulation

Code Comparison

pdfminer.six:

from pdfminer.high_level import extract_text

text = extract_text('sample.pdf')
print(text)

borb:

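from borb.pdf import PDF
from borb.toolkit import SimpleTextExtraction

extractor = SimpleTextExtraction()
with open("sample.pdf", "rb") as pdf_file_handle:
    doc = PDF.loads(pdf_file_handle, [extractor])
# Text of the first page (keyed by page index in borb 2.x)
print(extractor.get_text()[0])

Both libraries offer straightforward ways to extract text from PDFs. pdfminer.six exposes a single high-level function, while borb attaches a listener (SimpleTextExtraction) when the document is loaded and then returns the text per page.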

borb offers additional features for PDF creation and manipulation, making it more versatile for comprehensive PDF handling tasks. pdfminer.six, on the other hand, specializes in extraction and parsing, which may be preferable for projects focused solely on these aspects.

iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText can be a boon to nearly every workflow.

Pros of iText

  • More mature and widely adopted in enterprise environments
  • Extensive documentation and community support
  • Offers both open-source and commercial licensing options

Cons of iText

  • Commercial licensing can be expensive for some use cases
  • Learning curve can be steeper due to its comprehensive feature set
  • More complex API compared to borb's simpler approach

Code Comparison

iText example:

PdfWriter writer = new PdfWriter(dest);
PdfDocument pdf = new PdfDocument(writer);
Document document = new Document(pdf);
document.add(new Paragraph("Hello World!"));
document.close();

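borb example:

from borb.pdf import Document, Page, PDF, SingleColumnLayout, Paragraph

pdf = Document()
page = Page()
pdf.add_page(page)
# Content is added through a layout rather than directly on the Page
SingleColumnLayout(page).add(Paragraph("Hello World!"))
with open("output.pdf", "wb") as pdf_file_handle:
    PDF.dumps(pdf_file_handle, pdf)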

Key Differences

  • Language: iText is primarily Java-based, while borb is Python-based
  • API Design: iText offers a more comprehensive but complex API, whereas borb aims for simplicity
  • Feature Set: iText provides a broader range of features, while borb focuses on core PDF manipulation tasks
  • Community: iText has a larger user base and more extensive third-party resources

Both libraries offer powerful PDF manipulation capabilities, but they cater to different user needs and preferences. The choice between them depends on factors such as programming language preference, project requirements, and licensing considerations.

A Python library for reading and writing PDF, powered by QPDF

Pros of pikepdf

  • More focused on low-level PDF manipulation and parsing
  • Faster performance for certain operations due to C++ core
  • Extensive documentation and examples available

Cons of pikepdf

  • Less user-friendly for high-level PDF creation tasks
  • Requires more code for common operations like adding text or images
  • Steeper learning curve for beginners

Code Comparison

pikepdf example (adding text to a PDF):

import pikepdf
from pikepdf import Dictionary, Name, Pdf

pdf = Pdf.new()
page = pdf.add_blank_page()

# Register the standard Helvetica font as /F1 in the page's resources
font = pdf.make_indirect(Dictionary(
    Type=Name.Font,
    Subtype=Name.Type1,
    BaseFont=Name.Helvetica,
))
page.add_resource(font, Name.Font, Name.F1)

# Append a raw content stream that draws the text
page.contents_add(b"BT /F1 12 Tf 72 720 Td (Hello World) Tj ET")
pdf.save("output.pdf")

borb example (adding text to a PDF):

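from borb.pdf import Document, Page, Paragraph, PDF, SingleColumnLayout

doc = Document()
page = Page()
doc.add_page(page)
# Content is added through a layout rather than directly on the Page
SingleColumnLayout(page).add(Paragraph("Hello World"))
with open("output.pdf", "wb") as pdf_file_handle:
    PDF.dumps(pdf_file_handle, doc)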
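
The raw content stream in the pikepdf example is what a high-level library generates on your behalf. As a rough illustration of that low-level step, the sketch below builds such a stream by hand with the standard library only. The helper names escape_pdf_string and text_content_stream are hypothetical. It covers only single-line text in a font already registered under the given resource name.

```python
def escape_pdf_string(text):
    """Escape backslashes and parentheses for use in a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def text_content_stream(text, x, y, font="F1", size=12):
    """Build a minimal content stream that draws `text` at position (x, y)."""
    body = f"BT /{font} {size} Tf {x} {y} Td ({escape_pdf_string(text)}) Tj ET"
    # latin-1 keeps this sketch simple; it raises for characters outside that range
    return body.encode("latin-1")


print(text_content_stream("Hello World", 72, 720))
```

Escaping matters because an unbalanced parenthesis in the text would otherwise terminate the string operand early.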

README

borb

Code style: black | Corpus Coverage: 100.0% | Text Extraction: 93.1% | Public Method Documentation: 100% | Number of Tests: 760 | Python: 3.8, 3.9, 3.10 | Type Checking: 98%

borb is a library for creating and manipulating PDF files in python.

0. About borb

borb is a pure Python library to read, write and manipulate PDF documents. It represents a PDF document as a JSON-like data structure of nested lists, dictionaries and primitives (numbers, strings, booleans, etc.).
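
To make the "JSON-like" idea concrete, here is an illustrative sketch that uses plain Python containers in place of a PDF object tree. The key names (Root, Pages, Kids, MediaBox, Info) come from the PDF specification; the exact layout of borb's own Document object is not shown here and may differ.

```python
# Illustrative only: plain dicts and lists standing in for a PDF object tree
document = {
    "Trailer": {
        "Root": {
            "Type": "Catalog",
            "Pages": {
                "Type": "Pages",
                "Count": 2,
                "Kids": [
                    {"Type": "Page", "MediaBox": [0, 0, 595, 842]},
                    {"Type": "Page", "MediaBox": [0, 0, 595, 842]},
                ],
            },
        },
        "Info": {"Title": "Example", "Author": "Jane Doe"},
    }
}


def count_pages(node):
    """Recursively count dictionaries whose Type entry is 'Page'."""
    if isinstance(node, dict):
        own = 1 if node.get("Type") == "Page" else 0
        return own + sum(count_pages(value) for value in node.values())
    if isinstance(node, list):
        return sum(count_pages(item) for item in node)
    return 0


print(count_pages(document))
```

Because everything is a nested container or a primitive, generic traversal code like this is enough to inspect a document.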

This is currently a one-man project, so the focus will always be to support those use-cases that are more common in favor of those that are rare.

1. About the Examples

The examples can be found in a separate repository. This ensures the borb repository stays relatively small, whilst still providing a thorough knowledgebase of code-samples, screenshots and explanatory text.

Check out the examples repository here!

They include:

  • Reading a PDF and extracting meta-information
  • Changing meta-information
  • Extracting text from a PDF
  • Extracting images from a PDF
  • Changing images in a PDF
  • Adding annotations (notes, links, etc) to a PDF
  • Adding text to a PDF
  • Adding tables to a PDF
  • Adding lists to a PDF
  • Using a PageLayout manager

and much more.

1.0 Installing borb

borb can be installed using pip:

pip install borb

If you have installed borb before, and you want to ensure pip downloads the latest version (rather than using its internal cache) you can use the following commands:

pip uninstall borb
pip install --no-cache borb
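
After reinstalling, you may want to confirm which version is active in the current environment. This is a minimal sketch using the standard library's importlib.metadata; the helper name installed_version is an assumption made for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version("borb"))
```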

1.1 Hello World

To give you an immediate idea of the way borb works, this is the classic Hello World example, in borb:

from pathlib import Path

from borb.pdf import Document
from borb.pdf import Page
from borb.pdf import SingleColumnLayout
from borb.pdf import Paragraph
from borb.pdf import PDF

# create an empty Document
pdf = Document()

# add an empty Page
page = Page()
pdf.add_page(page)

# use a PageLayout (SingleColumnLayout in this case)
layout = SingleColumnLayout(page)

# add a Paragraph object
layout.add(Paragraph("Hello World!"))
# store the PDF
with open(Path("output.pdf"), "wb") as pdf_file_handle:
    PDF.dumps(pdf_file_handle, pdf)

2. License

borb is dual licensed as AGPL/Commercial software.

AGPL is a free / open source software license. This doesn't mean the software is gratis!

Buying a license is mandatory as soon as you develop commercial activities distributing the borb software inside your product or deploying it on a network without disclosing the source code of your own applications under the AGPL license. These activities include:

  • Offering paid services to customers as an ASP
  • Serving PDFs on the fly in the cloud or in a web application
  • Shipping borb with a closed source product

Contact sales for more information.

3. Acknowledgements

I would like to thank the following people, for their contributions / advice with regards to developing borb:

  • Aleksander Banasik
  • Benoît Lagae
  • Michael Klink