Top Related Projects
qpdf: A content-preserving PDF document transformer
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText can be a boon to nearly every workflow.
Mirror of Apache PDFBox
Quick Overview
pikepdf is a Python library for reading, writing, and manipulating PDF files. It provides a Pythonic interface to the powerful QPDF C++ library, allowing developers to work with PDF documents efficiently and with fine-grained control.
Pros
- High performance due to its C++ backend
- Comprehensive PDF manipulation capabilities
- Supports both reading and writing PDF files
- Actively maintained and well-documented
Cons
- Steeper learning curve compared to simpler PDF libraries
- Ships as a compiled C++ extension; binary wheels cover common platforms, but building from source elsewhere requires qpdf and a C++ toolchain
- Limited support for creating PDFs from scratch (primarily focused on manipulation)
Code Examples
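- Opening a PDF and inspecting its pages (pikepdf has no built-in text extraction):
import pikepdf
with pikepdf.Pdf.open("input.pdf") as pdf:
    # Print each page number with its media box (page size in points)
    for number, page in enumerate(pdf.pages, start=1):
        print(number, page.mediabox)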
- Merging multiple PDFs:
import pikepdf
output = pikepdf.Pdf.new()
for file in ["file1.pdf", "file2.pdf", "file3.pdf"]:
    src = pikepdf.Pdf.open(file)
    output.pages.extend(src.pages)
output.save("merged.pdf")
- Removing a specific page from a PDF:
import pikepdf
with pikepdf.Pdf.open("input.pdf") as pdf:
    del pdf.pages[1]  # Remove the second page (0-indexed)
    pdf.save("output.pdf")
Getting Started
To get started with pikepdf, first install it using pip:
pip install pikepdf
Then, you can use it in your Python scripts:
import pikepdf
# Open a PDF file
pdf = pikepdf.Pdf.open("input.pdf")
# Perform operations on the PDF
# For example, rotate the first page 90 degrees clockwise, relative to its current rotation
pdf.pages[0].rotate(90, relative=True)
# Save the modified PDF
pdf.save("output.pdf")
This example demonstrates how to open a PDF, rotate the first page, and save the modified document. pikepdf offers many more advanced features for working with PDFs, which you can explore in the official documentation.
Competitor Comparisons
qpdf: A content-preserving PDF document transformer
Pros of qpdf
- Written in C++, offering potentially better performance for low-level PDF operations
- Provides a command-line interface for quick PDF manipulations
- Longer development history, potentially more stable and feature-complete
Cons of qpdf
- Less Pythonic API, which may be less intuitive for Python developers
- Requires separate installation and management of C++ dependencies
- May have a steeper learning curve for those primarily working in Python
Code Comparison
qpdf (C++):
QPDF pdf;
pdf.processFile("input.pdf");
QPDFWriter w(pdf, "output.pdf");
w.write();
pikepdf (Python):
with pikepdf.Pdf.open("input.pdf") as pdf:
    pdf.save("output.pdf")
pikepdf provides a more Pythonic and concise API for PDF manipulation, while qpdf offers lower-level control and potentially better performance for complex operations. pikepdf is built on top of qpdf, leveraging its core functionality while providing a more accessible interface for Python developers. The choice between the two depends on the specific requirements of the project, programming language preference, and the level of PDF manipulation needed.
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Pros of PyMuPDF
- Broader functionality, including support for various document formats beyond PDF
- More comprehensive documentation and examples
- Faster rendering and processing for certain operations
Cons of PyMuPDF
- Larger library size and potentially higher memory usage
- More complex API, which may have a steeper learning curve
- Licensing restrictions (GNU AGPL v3, with a commercial license as the alternative) may limit use in closed-source products
Code Comparison
PyMuPDF:
import fitz
doc = fitz.open("example.pdf")
page = doc[0]
text = page.get_text()
pikepdf:
from pikepdf import Pdf, parse_content_stream
pdf = Pdf.open("example.pdf")
page = pdf.pages[0]
# pikepdf has no text extractor; this returns the page's raw drawing instructions
instructions = parse_content_stream(page)
Both libraries open a PDF and address its pages in a similar way, but only PyMuPDF extracts text directly. PyMuPDF is imported as the fitz module, while pikepdf uses its own namespace and exposes the page's low-level content stream instead, so text extraction is usually delegated to another library.
A pure-python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files
Pros of PyPDF
- Pure Python implementation, making it easier to install and use across different platforms
- More extensive documentation and examples for various PDF operations
- Larger community and longer history, potentially leading to better support and resources
Cons of PyPDF
- Generally slower performance compared to pikepdf, especially for large PDF files
- Less robust handling of complex PDF structures and features
- Limited support for some advanced PDF operations and modifications
Code Comparison
PyPDF:
from pypdf import PdfReader, PdfWriter
reader = PdfReader("input.pdf")
writer = PdfWriter()
for page in reader.pages:
    writer.add_page(page)
writer.write("output.pdf")
pikepdf:
from pikepdf import Pdf
with Pdf.open("input.pdf") as pdf:
    pdf.save("output.pdf")
pikepdf's code is more concise for simple operations like copying a PDF. Both libraries also cover more specialized tasks, and how much code each needs depends on the operation.
iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText can be a boon to nearly every workflow.
Pros of iText
- More comprehensive PDF manipulation capabilities, including creation, editing, and digital signatures
- Extensive documentation and commercial support options
- Wider language support, including Java, .NET, and Android
Cons of iText
- Commercial licensing required for many use cases
- Steeper learning curve due to more complex API
- Larger library size and potential performance overhead
Code Comparison
iText (Java):
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
Document document = new Document(pdf);
document.add(new Paragraph("Hello World!"));
document.close();
pikepdf (Python):
# pikepdf has no text layout API comparable to Paragraph; this only creates a blank page
pdf = pikepdf.Pdf.new()
pdf.add_blank_page(page_size=(612, 792))
pdf.save("output.pdf")
Summary
iText is a more feature-rich PDF library with broader language support and commercial backing, while pikepdf is a lightweight, Python-focused alternative. iText offers more advanced capabilities but comes with licensing considerations and a steeper learning curve. pikepdf provides a simpler API for basic PDF operations, making it easier to use for straightforward tasks. The choice between the two depends on the specific project requirements, language preferences, and licensing considerations.
Mirror of Apache PDFBox
Pros of PDFBox
- Written in Java, offering better platform independence and integration with Java ecosystems
- More comprehensive feature set for PDF manipulation, including digital signatures and form filling
- Larger community and longer development history, potentially leading to better stability and support
Cons of PDFBox
- Generally slower performance compared to pikepdf, especially for large PDF files
- More complex API, which can lead to steeper learning curve and longer development time
- Larger memory footprint, which may be a concern for resource-constrained environments
Code Comparison
PDFBox:
PDDocument document = PDDocument.load(new File("input.pdf"));
PDFTextStripper stripper = new PDFTextStripper();
String text = stripper.getText(document);
document.close();
pikepdf:
# pikepdf has no text extractor; this collects the raw content stream bytes instead
with pikepdf.Pdf.open("input.pdf") as pdf:
    raw = b""
    for page in pdf.pages:
        page.contents_coalesce()  # merge a multi-part /Contents into one stream
        raw += page.Contents.read_bytes()
The two examples are not equivalent. PDFBox extracts readable text through its dedicated PDFTextStripper class, while pikepdf only exposes the page's raw content stream, which holds drawing operators and font-encoded strings rather than plain text.
README
pikepdf
pikepdf is a Python library for reading and writing PDF files.
pikepdf is based on qpdf, a powerful PDF manipulation and repair library.
Python + qpdf = "py" + "qpdf" = "pyqpdf", which looks like a dyslexia test. Say it out loud, and it sounds like "pikepdf".
# Elegant, Pythonic API
with pikepdf.open('input.pdf') as pdf:
    num_pages = len(pdf.pages)
    del pdf.pages[-1]
    pdf.save('output.pdf')
To install:
pip install pikepdf
For users who want to build from source, see installation.
pikepdf is documented and actively maintained. Binary wheels are available for all common platforms, both x86-64 and ARM64/Apple Silicon. For information on the latest changes, see the release notes.
Commercial support is available.
Features
This library is similar to pypdf (formerly PyPDF2) - it provides low level access to PDF features and allows editing and content transformation of existing PDFs. Some knowledge of the PDF specification may be helpful. It does not have the capability to render a PDF to image.
Feature | pikepdf | pypdf (PyPDF2) |
---|---|---|
Editing, manipulation and transformation of existing PDFs | ✔ | ✔ |
Based on an existing, mature PDF library | qpdf | ✘ |
Implementation | C++ and Python | Python |
PDF versions supported | 1.1 to 1.7 | 1.1 to 1.7 |
Save and load password protected (encrypted) PDFs | ✔ (except public key) | ✔ (except public key) |
Creates linearized ("fast web view") PDFs | ✔ | ✘ |
Test suite coverage | (coverage badge) | (coverage badge) |
Creates PDFs that pass PDF validation tests | ✔ | ✘ |
Modifies PDF/A without breaking PDF/A compliance | ✔ | ✘ |
PDF XMP metadata editing | ✔ | read-only |
Integrates with Jupyter and IPython notebooks for rapid development | ✔ | ✘ |
Testimonials
I decided to try writing a quick Python program with pikepdf to automate [something] and it "just worked". - Jay Berkenbilt, creator of qpdf
"Thanks for creating a great pdf library, I tested out several and this is the one that was best able to work with whatever I threw at it." - @cfcurtis
In Production
- OCRmyPDF uses pikepdf to graft OCR text layers onto existing PDFs, to examine the contents of input PDFs, and to optimize PDFs.
- PDF Arranger is a small Python application that provides a graphical user interface to rotate, crop and rearrange PDFs.
- PDFStitcher is a utility for stitching PDF pages into a single document (i.e. N-up or page imposition).
License
pikepdf is licensed under the Mozilla Public License 2.0 license (MPL-2.0) that can be found in the LICENSE file. By using, distributing, or contributing to this project, you agree to the terms and conditions of this license. MPL 2.0 permits you to combine the software with other work, including commercial and closed source software, but asks you to publish source-level modifications you make to pikepdf itself.
Some components of the project may be under other license agreements, as indicated in their SPDX license header or the REUSE.toml file.