Top Related Projects
A Python library for reading and writing PDF, powered by QPDF
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText can be a boon to nearly every workflow.
Mirror of Apache PDFBox
PDF Reader in JavaScript
PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
Quick Overview
QPDF is a command-line program and C++ library for performing structural, content-preserving transformations on PDF files. It offers capabilities for linearizing (web-optimizing) PDFs, encrypting and decrypting, and inspecting or modifying the structure of PDF files.
Pros
- Powerful and versatile PDF manipulation tool
- Supports both command-line usage and integration as a C++ library
- Actively maintained with regular updates and improvements
- Cross-platform compatibility (Windows, macOS, Linux)
Cons
- Steep learning curve for advanced operations
- Limited GUI options, primarily command-line focused
- May require additional tools for complex PDF content manipulation
- Documentation can be overwhelming for beginners
Code Examples
- Reading a PDF file:
#include <qpdf/QPDF.hh>
#include <qpdf/QPDFWriter.hh>
QPDF pdf;
pdf.processFile("input.pdf");
- Encrypting a PDF:
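QPDFWriter w(pdf);
w.setOutputFilename("encrypted.pdf");
// 256-bit AES: empty user password, owner password set, all permissions allowed
w.setR6EncryptionParameters("", "secret-password",
true, true, true, true, true, true, qpdf_r3p_full, true);
w.write();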
- Appending the first page of one PDF to another:
QPDF pdf1, pdf2;
pdf1.processFile("file1.pdf");
pdf2.processFile("file2.pdf");
pdf1.addPage(pdf2.getAllPages()[0], false);
QPDFWriter w(pdf1);
w.setOutputFilename("merged.pdf");
w.write();
Getting Started
To use QPDF in your C++ project:
- Install QPDF using your package manager or build from source.
- Include the necessary headers in your C++ file:
#include <qpdf/QPDF.hh>
#include <qpdf/QPDFWriter.hh>
- Link against the QPDF library when compiling:
g++ -o your_program your_program.cpp -lqpdf
- Use QPDF functions in your code as shown in the examples above.
For command-line usage, simply install QPDF and use it from the terminal:
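qpdf --encrypt user-password owner-password 256 -- input.pdf output.pdf
This example encrypts input.pdf with user and owner passwords, using 256-bit (AES) encryption, and saves the result as output.pdf. Shorter key lengths such as 40 bits rely on the weak RC4 algorithm and are best avoided for new files.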
Competitor Comparisons
A Python library for reading and writing PDF, powered by QPDF
Pros of pikepdf
- Higher-level Python API, making PDF manipulation more accessible to Python developers
- Performance close to native QPDF for most operations, since the work is done by the underlying C++ library through pybind11 bindings
- More Pythonic approach to PDF manipulation, with object-oriented design
Cons of pikepdf
- Limited to Python ecosystem, whereas QPDF is usable in multiple languages
- May have a steeper learning curve for those familiar with QPDF's command-line interface
- Potentially less flexible for low-level PDF operations compared to QPDF
Code Comparison
QPDF (C++):
QPDFObjectHandle root = pdf.getRoot();
QPDFObjectHandle pages = root.getKey("/Pages");
QPDFObjectHandle kids = pages.getKey("/Kids");
pikepdf (Python):
with pikepdf.Pdf.open('input.pdf') as pdf:
    root = pdf.Root
    pages = root.Pages
    kids = pages.Kids
Both libraries provide similar functionality for PDF manipulation, but pikepdf offers a more Pythonic interface. QPDF, being a C++ library, provides broader language support and potentially more low-level control. pikepdf, built on top of QPDF, offers a higher-level API specifically tailored for Python developers, making it easier to work with PDFs in Python projects.
PyMuPDF is a high performance Python library for data extraction, analysis, conversion & manipulation of PDF (and other) documents.
Pros of PyMuPDF
- Python-centric, offering a high-level API for PDF manipulation
- Extensive feature set including text extraction, rendering, and annotation
- Faster performance for certain operations due to its C core
Cons of PyMuPDF
- Limited to Python ecosystem, less versatile for cross-language integration
- May have a steeper learning curve for users not familiar with Python
- Larger file size and more dependencies compared to QPDF
Code Comparison
PyMuPDF example:
import fitz
doc = fitz.open("input.pdf")
page = doc[0]
text = page.get_text()
doc.save("output.pdf")
QPDF example (using command-line interface):
qpdf --decrypt input.pdf --pages . 1 -- output.pdf
Note: QPDF is primarily a command-line tool and C++ library, so direct code comparison is challenging. The above example shows a basic operation in both libraries.
PyMuPDF offers more intuitive Python-based manipulation, while QPDF provides a lightweight, command-line focused approach for PDF operations. Choose based on your preferred language and specific project requirements.
iText for Java represents the next level of SDKs for developers that want to take advantage of the benefits PDF can bring. Equipped with a better document engine, high and low-level programming capabilities and the ability to create, edit and enhance PDF documents, iText can be a boon to nearly every workflow.
Pros of iText
- More comprehensive PDF manipulation capabilities, including creation, editing, and digital signatures
- Extensive documentation and community support
- Cross-platform compatibility (Java and .NET versions available)
Cons of iText
- Commercial licensing required for many use cases
- Steeper learning curve due to more complex API
- Larger library size and potential performance overhead
Code Comparison
iText example (creating a simple PDF):
Document document = new Document();
PdfWriter.getInstance(document, new FileOutputStream("example.pdf"));
document.open();
document.add(new Paragraph("Hello, World!"));
document.close();
QPDF example (modifying an existing PDF):
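// Needs qpdf/QPDFPageDocumentHelper.hh, qpdf/QPDFPageObjectHelper.hh, qpdf/QPDFWriter.hh
QPDF pdf;
pdf.processFile("input.pdf");
QPDFPageObjectHelper page = QPDFPageDocumentHelper(pdf).getAllPages().at(0);
// Append a content stream; assumes the page's resources already define font /F1
page.addPageContents(QPDFObjectHandle::newStream(&pdf,
"BT /F1 24 Tf 72 720 Td (Hello, World!) Tj ET\n"), false);
QPDFWriter w(pdf, "output.pdf");
w.write();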
Summary
iText offers more comprehensive PDF manipulation features but comes with commercial licensing requirements and a steeper learning curve. QPDF is open-source and focuses on lower-level PDF operations, making it lighter and potentially faster for specific tasks. The choice between the two depends on the project requirements, budget constraints, and desired level of PDF manipulation capabilities.
Mirror of Apache PDFBox
Pros of PDFBox
- More comprehensive PDF manipulation capabilities, including text extraction and content creation
- Larger and more active community, potentially leading to better support and more frequent updates
- Better documentation and more extensive examples for various use cases
Cons of PDFBox
- Larger library size, which may impact application size and performance
- Steeper learning curve due to more complex API and wider range of features
- May be overkill for simple PDF operations, where QPDF could be more efficient
Code Comparison
QPDF (C++):
QPDF pdf;
pdf.processFile("input.pdf");
QPDFWriter w(pdf, "output.pdf");
w.write();
PDFBox (Java):
PDDocument document = PDDocument.load(new File("input.pdf"));
document.save("output.pdf");
document.close();
Both examples demonstrate basic PDF loading and saving operations. The two APIs are similar in shape and differ mainly in syntax. The PDFBox snippet uses the 2.x API; in PDFBox 3.x, loading goes through Loader.loadPDF instead of PDDocument.load. PDFBox's Java-based implementation may be more familiar to some developers, whereas QPDF's C++ code might appeal to those working in lower-level environments or seeking potentially faster performance.
PDF Reader in JavaScript
Pros of pdf.js
- Written in JavaScript, making it easily integrable into web applications
- Renders PDFs directly in the browser without plugins
- Supports a wide range of PDF features and interactive elements
Cons of pdf.js
- Limited PDF manipulation capabilities compared to QPDF
- May have performance issues with large or complex PDF files
- Requires a JavaScript runtime environment
Code Comparison
QPDF (C++):
QPDFObjectHandle root = pdf.getRoot();
QPDFObjectHandle pages = root.getKey("/Pages");
QPDFObjectHandle kids = pages.getKey("/Kids");
pdf.js (JavaScript):
pdfjsLib.getDocument(url).promise.then(function(pdf) {
  pdf.getPage(1).then(function(page) {
    var scale = 1.5;
    var viewport = page.getViewport({ scale: scale });
  });
});
QPDF focuses on low-level PDF manipulation and is written in C++, making it suitable for desktop applications and server-side processing. It offers robust PDF transformation capabilities.
pdf.js is designed for rendering PDFs in web browsers, providing a JavaScript API for PDF viewing and basic interactions. It excels in client-side PDF display but has limited editing features compared to QPDF.
Both projects serve different purposes: QPDF for PDF manipulation and processing, pdf.js for in-browser PDF rendering and viewing.
PdfParser, a standalone PHP library, provides various tools to extract data from a PDF file.
Pros of pdfparser
- Focused on parsing and extracting data from PDF files
- Written in PHP, making it easy to integrate with web applications
- Provides high-level methods for text and metadata extraction
Cons of pdfparser
- Limited functionality compared to qpdf's comprehensive PDF manipulation capabilities
- May struggle with complex or malformed PDF structures
- Less actively maintained, with fewer recent updates
Code Comparison
pdfparser:
$parser = new \Smalot\PdfParser\Parser();
$pdf = $parser->parseFile('document.pdf');
$text = $pdf->getText();
qpdf:
QPDF pdf;
pdf.processFile("input.pdf");
QPDFObjectHandle root = pdf.getRoot();
QPDFObjectHandle pages = root.getKey("/Pages");
Summary
pdfparser is a PHP library focused on extracting data from PDF files, making it suitable for web applications requiring basic PDF parsing. qpdf, on the other hand, is a more comprehensive C++ library for PDF manipulation, offering a wider range of features beyond parsing. While pdfparser provides simpler, high-level methods for text extraction, qpdf offers more low-level control and advanced PDF operations. The choice between the two depends on the specific requirements of the project and the preferred programming language.
README
qpdf is a command-line tool and C++ library that performs content-preserving transformations on PDF files. It supports linearization, encryption, and numerous other features. It can also be used for splitting and merging files, creating PDF files (but you have to supply all the content yourself), and inspecting files for study or analysis. qpdf does not render PDFs or perform text extraction, and it does not contain higher-level interfaces for working with page contents. It is a low-level tool for working with the structure of PDF files and can be a valuable tool for anyone who wants to do programmatic or command-line-based manipulation of PDF files.
The qpdf Manual is hosted online at https://qpdf.readthedocs.io. The project website is https://qpdf.sourceforge.io. The source code repository is hosted at GitHub: https://github.com/qpdf/qpdf.
Verifying Distributions
The public key used to sign qpdf source distributions has
fingerprint C2C9 6B10 011F E009 E6D1 DF82 8A75 D109 9801 2C7E
and can be found at https://q.ql.org/pubkey.asc or
downloaded from a public key server.
Copyright, License
qpdf is copyright (c) 2005-2024 Jay Berkenbilt
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
https://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
You may also see the license in the file LICENSE.txt in the source distribution.
Versions of qpdf prior to version 7 were released under the terms of version 2.0 of the Artistic License. At your option, you may continue to consider qpdf to be licensed under those terms. Please see the manual for additional information. The Artistic License appears in the file Artistic-2.0 in the source distribution.
Prerequisites
qpdf requires a C++ compiler that supports C++17.
To compile and link something with qpdf, you can use pkg-config with package name libqpdf or cmake with package name qpdf. Here's an example of a CMakeLists.txt file that builds a program with the qpdf library:
cmake_minimum_required(VERSION 3.16)
project(some-application LANGUAGES CXX)
find_package(qpdf)
add_executable(some-application some-application.cc)
target_link_libraries(some-application qpdf::libqpdf)
qpdf depends on the external libraries zlib and jpeg. The libjpeg-turbo library is also known to work since it is compatible with the regular jpeg library, and qpdf doesn't use any interfaces that aren't present in the straight jpeg8 API. These are part of every Linux distribution and are readily available. Download information appears in the documentation. For Windows, you can download pre-built binary versions of these libraries for some compilers; see README-windows.md for additional details.
Depending on which crypto providers are enabled, GnuTLS and OpenSSL may also be required. This is discussed more in Crypto providers below.
Detailed information appears in the manual.
Licensing terms of embedded software
qpdf makes use of zlib and jpeg libraries for its functionality. These packages can be downloaded separately from their own download locations. If the optional GnuTLS or OpenSSL crypto providers are enabled, then GnuTLS and/or OpenSSL are also required.
Please see the NOTICE file for information on licenses of embedded software.
Crypto providers
qpdf can use different crypto implementations. These can be selected at compile time or at runtime. The native crypto implementations that were used in all versions prior to 9.1.0 are still present, but they are not built into qpdf by default if any external providers are available at build time.
The following providers are available:
- gnutls: an implementation that uses the GnuTLS library to provide crypto; causes libqpdf to link with the GnuTLS library
- openssl: an implementation that can use the OpenSSL (or BoringSSL) libraries to provide crypto; causes libqpdf to link with the OpenSSL library
- native: a native implementation where all the source is embedded in qpdf and no external dependencies are required
The default behavior is for cmake to discover which other crypto providers can be supported based on available external libraries, to build all available external crypto providers, and to use an external provider as the default over the native one. By default, the native crypto provider will be used only if no external providers are available. This behavior can be changed with various cmake options as described in the manual.
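The default described above can be summarized as a small decision function. This is only an illustrative model of the documented behavior, not qpdf's build logic: the names ProviderPlan and plan_providers are hypothetical, and the sketch deliberately does not say which external provider wins when both are available.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the documented defaults; not part of qpdf.
struct ProviderPlan {
    std::vector<std::string> built;  // providers compiled into libqpdf
    bool default_is_external;        // true if the default provider is external
};

ProviderPlan plan_providers(bool have_gnutls, bool have_openssl)
{
    ProviderPlan plan{{}, false};
    // Every external provider whose library was found gets built.
    if (have_gnutls) {
        plan.built.push_back("gnutls");
    }
    if (have_openssl) {
        plan.built.push_back("openssl");
    }
    if (plan.built.empty()) {
        // Native is built and used only when nothing external is available.
        plan.built.push_back("native");
    } else {
        plan.default_is_external = true;
    }
    return plan;
}
```

For example, plan_providers(false, false) yields a plan that builds only the native provider, while any available external library removes native from the default build. The cmake options mentioned above override this behavior.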
Note about weak cryptographic algorithms
The PDF file format used to rely on RC4 for encryption. Using 256-bit keys always uses AES instead, and with 128-bit keys, you can elect to use AES. qpdf does its best to warn when someone is writing a file with weak cryptographic algorithms, but qpdf must always retain support for being able to read and even write files with weak encryption to be able to fully support older PDF files and older PDF readers.
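As a rough illustration of the rule above, the following sketch maps a key length and an AES choice to the cipher that ends up being used. The names Cipher, cipher_for, and is_weak are hypothetical and exist only for this example; the sketch assumes the three key lengths PDF encryption uses (40, 128, and 256 bits) and is not how qpdf implements its warning.

```cpp
#include <stdexcept>

// Hypothetical helper types for illustration; not part of qpdf.
enum class Cipher { RC4, AES };

// use_aes_for_128 only matters for 128-bit keys.
Cipher cipher_for(int key_bits, bool use_aes_for_128)
{
    switch (key_bits) {
    case 40:
        return Cipher::RC4;  // 40-bit keys always use RC4
    case 128:
        return use_aes_for_128 ? Cipher::AES : Cipher::RC4;
    case 256:
        return Cipher::AES;  // 256-bit keys always use AES
    default:
        throw std::invalid_argument("unsupported key length");
    }
}

// A file is written with weak crypto whenever RC4 is the cipher.
bool is_weak(int key_bits, bool use_aes_for_128)
{
    return cipher_for(key_bits, use_aes_for_128) == Cipher::RC4;
}
```

Under this model, only 128-bit keys with AES selected and 256-bit keys avoid the weak-crypto case.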
Building from source distribution on UNIX/Linux
Starting with version 11, qpdf builds with cmake. The default configuration with cmake works on most systems. On Windows, you can build qpdf with Visual Studio using cmake without having any additional tools installed. However, to run the test suite, you need MSYS2, and you also need MSYS2 to build with mingw.
Example UNIX/Linux build:
cmake -S . -B build -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build
Example mingw build from an MSYS2 mingw shell:
cmake -S . -B build -G 'MSYS Makefiles' -DCMAKE_BUILD_TYPE=RelWithDebInfo
cmake --build build
Example MSVC build from an MSYS shell or from a Windows command shell with Visual Studio command-line tools in the path:
cmake -S . -B build
cmake --build build --config Release
Installation can be done with cmake --install. Packages can be made with cpack.
The tests use qtest, and the test driver is invoked by ctest. To see the real underlying tests, run ctest --verbose so that you can see qtest's output. If you need to turn off qtest's color output, pass -DQTEST_COLOR=0 to cmake.
For additional information, please refer to the manual.
Building on Windows
qpdf is known to build and pass its test suite with mingw and Microsoft Visual C++. Both 32-bit and 64-bit versions work. In addition to the manual, see README-windows.md for more details on how to build under Windows.
Building Documentation
The qpdf manual is written in reStructuredText format and is built with Sphinx. The sources to the user manual can be found in the manual directory. For more detailed information, consult the Building and Installing qpdf section of the manual or consult the build-doc script.
Additional Notes on Build
qpdf provides cmake configuration files and pkg-config files. They support static and dynamic linking. In general, you do not need the header files from qpdf's dependencies to be available to builds that use qpdf. The only exception to this is that, if you include Pl_DCT.hh, you need header files from libjpeg. Since this is a rare case, qpdf's cmake and pkg-config files do not automatically add a JPEG include path to the build. If you are using Pl_DCT explicitly, you probably already have that configured in your build.
To learn about using the library, please read comments in the header files in include/qpdf, especially QPDF.hh, QPDFObjectHandle.hh, and QPDFWriter.hh. These are the best sources of documentation on the API. You can also study the code of QPDFJob.cc, which exercises most of the public interface. There are additional example programs in the examples directory.
Additional Notes on Test Suite
By default, slow tests and tests that require dependencies beyond those needed to build qpdf are disabled. Slow tests include image comparison tests and large file tests. Image comparison tests can be enabled by setting the QPDF_TEST_COMPARE_IMAGES environment variable to 1. Large file tests can be enabled by setting the QPDF_LARGE_FILE_TEST_PATH environment variable to the absolute path of a directory with at least 11 GB of free space that can handle files over 4 GB in size. On Windows, this should be a Windows path (e.g. C:\LargeFileTemp) even if the build is being run from an MSYS2 environment. The test suite provides nearly full coverage even without these tests. Unless you are making deep changes to the library that would impact the contents of the generated PDF files or testing this on a new platform for the first time, there is no real reason to run these tests. If you're just running the test suite to make sure that qpdf works for your build, the default tests are adequate.
If you are packaging qpdf for a distribution and preparing a build that is run by an autobuilder, you may want to pass -DSHOW_FAILED_TEST_OUTPUT=1 to cmake and run ctest with the --verbose or --output-on-failure option. This way, if the test suite fails, test failure detail will be included in the build output. Otherwise, you will have to have access to the qtest.log file from the build to view test failures. The Debian packages for qpdf enable this option.
More notes for packagers can be found in the manual.
Random Number Generation
By default, qpdf uses the crypto provider for generating random numbers. The rest of this applies only if you are using the native crypto provider.
If the native crypto provider is in use, then, when qpdf detects either the Windows cryptography API or the existence of /dev/urandom, /dev/arandom, or /dev/random, it uses them to generate cryptographically secure random numbers.
If none of these conditions are true, the build will fail with an error. This behavior can be modified in several ways:
- If you use the cmake option SKIP_OS_SECURE_RANDOM or define the SKIP_OS_SECURE_RANDOM preprocessor symbol, qpdf will not attempt to use Windows cryptography or the random device. You must either supply your own random data provider or allow use of insecure random numbers.
- If you turn on the cmake option USE_INSECURE_RANDOM or define the USE_INSECURE_RANDOM preprocessor symbol, qpdf will try insecure random numbers if OS-provided secure random numbers are disabled. This is not a fallback. In order for insecure random numbers to be used, you must also disable OS secure random numbers since, otherwise, failure to find OS secure random numbers is a compile error. The insecure random number source is stdlib's random() or rand() calls. These random numbers are not cryptographically secure, but the qpdf library is fully functional using them. Using non-secure random numbers means that it's easier in some cases to guess encryption keys.
- In all cases, you may supply your own random data provider. To do this, derive a class from qpdf/RandomDataProvider (since version 5.1.0) and call QUtil::setRandomDataProvider before you create any QPDF objects. If you supply your own random data provider, it will always be used even if support for one of the other random data providers is compiled in. If you wish to avoid any possibility of your build of qpdf using anything but a user-supplied random data provider, you can define SKIP_OS_SECURE_RANDOM and not USE_INSECURE_RANDOM. In this case, qpdf will throw a runtime error if any attempt is made to generate random numbers and no random data provider has been supplied.
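The precedence among these options, for the native crypto provider case, can be modeled as follows. This is a sketch of the rules as stated above, not qpdf source code: RandomSource, RandomConfig, and select_source are hypothetical names, and the two build-time symbols are represented as plain booleans for the sake of illustration.

```cpp
#include <stdexcept>

// Hypothetical model of the documented precedence; not part of qpdf.
enum class RandomSource { UserSupplied, OsSecure, Insecure };

struct RandomConfig {
    bool user_provider_set;      // QUtil::setRandomDataProvider was called
    bool skip_os_secure_random;  // SKIP_OS_SECURE_RANDOM was defined
    bool use_insecure_random;    // USE_INSECURE_RANDOM was defined
};

RandomSource select_source(RandomConfig const& c)
{
    // A user-supplied provider always wins, whatever else is compiled in.
    if (c.user_provider_set) {
        return RandomSource::UserSupplied;
    }
    // OS secure random is used unless it was explicitly skipped.
    if (!c.skip_os_secure_random) {
        return RandomSource::OsSecure;
    }
    // Insecure random applies only when OS secure random is disabled.
    if (c.use_insecure_random) {
        return RandomSource::Insecure;
    }
    // Nothing compiled in and nothing supplied: a runtime error.
    throw std::runtime_error("no random data provider available");
}
```

The last branch corresponds to defining SKIP_OS_SECURE_RANDOM without USE_INSECURE_RANDOM, where generating random numbers without a supplied provider fails at runtime.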
Acknowledgments
The qpdf project has a JetBrains license through their Open Source Program. We are grateful for this program and have been enjoying the benefits of their high-quality products.