docx4j
JAXB-based Java library for Word docx, PowerPoint pptx, and Excel xlsx files
Top Related Projects
Mirror of Apache POI
A pure PHP library for reading and writing word processing documents
Open XML SDK by Microsoft
Quick Overview
docx4j is a Java library for creating, editing, and manipulating Office Open XML documents, chiefly Microsoft Word (.docx) files, with support for PowerPoint (.pptx) and Excel (.xlsx) as well. It maps the Office Open XML format to Java objects using JAXB, so developers can generate and modify documents programmatically without Microsoft Office installed.
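An Office Open XML file (.docx, .pptx, or .xlsx) is a ZIP "package" of XML parts such as [Content_Types].xml and word/document.xml, and these parts are what docx4j unmarshals into Java objects. The sketch below uses only the JDK, not docx4j, to list the parts in a package; the class name OoxmlPartLister is hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper (JDK only, not docx4j): lists the part names in an OOXML package.
final class OoxmlPartLister {

    // Reads ZIP entries in order; closing the ZipInputStream also closes `in`.
    static List<String> listParts(InputStream in) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                names.add(entry.getName());
            }
        }
        return names;
    }

    // Usage: java OoxmlPartLister HelloWorld.docx
    public static void main(String[] args) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            listParts(in).forEach(System.out::println);
        }
    }
}
```

Running it on a .docx typically shows entries such as [Content_Types].xml, _rels/.rels, and word/document.xml.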
Pros
- Extensive functionality for working with Word documents, including text manipulation, styling, and document structure
- Pure Java implementation, making it platform-independent and easy to integrate into existing Java projects
- Active development and community support, with regular updates and bug fixes
- Supports both reading and writing of DOCX files, as well as export to other formats such as HTML and PDF
Cons
- Steep learning curve due to the complexity of the Office Open XML format
- Large library size, which may impact application size and performance
- Limited support for some advanced Word features compared to Microsoft's official libraries
- Documentation can be sparse or outdated for some less common use cases
Code Examples
Creating a new Word document:
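import java.io.File;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
// createPackage() and save() throw checked docx4j exceptions; call from a method that handles or declares them
WordprocessingMLPackage wordPackage = WordprocessingMLPackage.createPackage();
MainDocumentPart mainDocumentPart = wordPackage.getMainDocumentPart();
mainDocumentPart.addParagraphOfText("Hello, World!");
wordPackage.save(new File("HelloWorld.docx"));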
Adding a table to a document:
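import org.docx4j.model.table.TblFactory;
import org.docx4j.wml.Tbl;
// Continues the previous example: wordPackage and mainDocumentPart are already defined
int writableWidthTwips = wordPackage.getDocumentModel().getSections().get(0).getPageDimensions().getWritableWidthTwips();
int rows = 3;
int cols = 3;
int cellWidthTwips = writableWidthTwips / cols; // divide the usable page width evenly
Tbl tbl = TblFactory.createTable(rows, cols, cellWidthTwips);
mainDocumentPart.addObject(tbl);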
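The table example splits the page's writable width, measured in twips (twentieths of a point, so 1,440 per inch), evenly across the columns. The arithmetic is worked through below in plain Java, assuming a hypothetical US Letter page (8.5 in wide) with 1-inch left and right margins; in docx4j the writable width comes from getPageDimensions() rather than being computed by hand, and TwipMath is an illustrative name only.

```java
// Worked example of the twip arithmetic used to size table cells.
// The page size and margins are assumptions for illustration, not docx4j defaults.
final class TwipMath {
    static final int TWIPS_PER_INCH = 1440; // 72 points per inch x 20 twips per point

    static int writableWidthTwips(int pageWidthTwips, int leftMarginTwips, int rightMarginTwips) {
        return pageWidthTwips - leftMarginTwips - rightMarginTwips;
    }

    static int cellWidthTwips(int writableWidthTwips, int cols) {
        return writableWidthTwips / cols; // integer division drops any remainder
    }

    public static void main(String[] args) {
        int page = (int) (8.5 * TWIPS_PER_INCH);                                   // 12240 twips
        int writable = writableWidthTwips(page, TWIPS_PER_INCH, TWIPS_PER_INCH);  // 9360 twips
        int cell = cellWidthTwips(writable, 3);                                    // 3120 twips
        System.out.println("writable=" + writable + " cell=" + cell
                + " (" + cell / (double) TWIPS_PER_INCH + " in)");
    }
}
```

Because of the integer division, the cells can fall short of the writable width by at most cols - 1 twips, which is visually negligible.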
Applying styles to text:
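import org.docx4j.wml.BooleanDefaultTrue;
import org.docx4j.wml.Jc;
import org.docx4j.wml.JcEnumeration;
import org.docx4j.wml.P;
import org.docx4j.wml.PPr;
import org.docx4j.wml.R;
import org.docx4j.wml.RPr;
P paragraph = mainDocumentPart.addParagraphOfText("Styled text");
paragraph.setPPr(new PPr());
paragraph.getPPr().setJc(new Jc());
paragraph.getPPr().getJc().setVal(JcEnumeration.CENTER); // center the paragraph
// addParagraphOfText wraps the text in a single run (R); make that run bold and italic
R run = (R) paragraph.getContent().get(0);
run.setRPr(new RPr());
run.getRPr().setB(new BooleanDefaultTrue());
run.getRPr().setI(new BooleanDefaultTrue());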
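The run properties above use BooleanDefaultTrue because WordprocessingML toggle properties such as bold (&lt;w:b/&gt;) and italic (&lt;w:i/&gt;) are switched on when the element is present without a w:val attribute. The sketch below is a hypothetical helper, not part of docx4j's API, showing how such an on/off value can be interpreted; it assumes the transitional ST_OnOff values true/false, on/off, and 1/0.

```java
import java.util.Optional;

// Hypothetical helper (not docx4j API): interprets a toggle property
// such as <w:b/> or <w:i w:val="0"/> found in a run's w:rPr.
final class OnOff {

    /**
     * @param elementPresent whether the element (e.g. w:b) appears in w:rPr
     * @param val            the w:val attribute, or null if it is absent
     * @return empty if the element is absent (not set directly on the run),
     *         otherwise the explicit on/off setting
     */
    static Optional<Boolean> resolve(boolean elementPresent, String val) {
        if (!elementPresent) {
            return Optional.empty();
        }
        if (val == null) {
            return Optional.of(Boolean.TRUE); // <w:b/> means "on": the default-true behavior
        }
        switch (val) {
            case "true": case "on": case "1":
                return Optional.of(Boolean.TRUE);
            case "false": case "off": case "0":
                return Optional.of(Boolean.FALSE);
            default:
                throw new IllegalArgumentException("Not an ST_OnOff value: " + val);
        }
    }
}
```

This is why setB(new BooleanDefaultTrue()) is enough to make a run bold: the serialized element carries no w:val, which Word reads as "on".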
Getting Started
To use docx4j in your Java project, add the following dependency to your Maven pom.xml file:
<dependency>
    <groupId>org.docx4j</groupId>
    <artifactId>docx4j-JAXB-MOXy</artifactId>
    <version>11.4.9</version>
</dependency>
For Gradle, add this to your build.gradle file:
implementation 'org.docx4j:docx4j-JAXB-MOXy:11.4.9'
Then, you can start using docx4j in your Java code by importing the necessary classes:
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart;
Competitor Comparisons
Mirror of Apache POI
Pros of POI
- Broader format coverage, including the legacy binary formats (.doc, .xls, .ppt) and Visio, alongside OOXML
- More mature project with a larger community and extensive documentation
- Backed by the Apache Software Foundation (both POI and docx4j are released under the Apache License 2.0)
Cons of POI
- Larger library size, potentially impacting application size and performance
- Steeper learning curve due to its comprehensive feature set
- Less focused on OOXML formats compared to docx4j
Code Comparison
docx4j example:
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
mainDocumentPart.addParagraphOfText("Hello World!");
wordMLPackage.save(new File("HelloWorld.docx"));
POI example:
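import java.io.FileOutputStream;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFRun;
// try-with-resources closes the stream and document even if write() fails
try (XWPFDocument document = new XWPFDocument();
     FileOutputStream out = new FileOutputStream("HelloWorld.docx")) {
    XWPFParagraph paragraph = document.createParagraph();
    XWPFRun run = paragraph.createRun();
    run.setText("Hello World!");
    document.write(out);
}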
Both libraries offer similar functionality for creating and manipulating Office documents, but POI provides a wider range of supported formats at the cost of increased complexity and size. docx4j focuses more on OOXML formats and may be easier to use for specific Word document tasks.
A pure PHP library for reading and writing word processing documents
Pros of PHPWord
- Written in PHP, making it more accessible for PHP developers
- Extensive documentation and examples available
- Supports reading and writing various document formats (OOXML, RTF, HTML)
Cons of PHPWord
- Limited support for complex document structures
- Performance may be slower for large documents
- Less mature compared to docx4j
Code Comparison
PHPWord:
$phpWord = new \PhpOffice\PhpWord\PhpWord();
$section = $phpWord->addSection();
$section->addText('Hello World!');
$objWriter = \PhpOffice\PhpWord\IOFactory::createWriter($phpWord, 'Word2007');
$objWriter->save('helloWorld.docx');
docx4j:
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
mainDocumentPart.addParagraphOfText("Hello World!");
Docx4J.save(wordMLPackage, new File("helloWorld.docx"));
Both libraries provide similar functionality for creating simple documents, but docx4j offers more advanced features for complex document manipulation. PHPWord is more suitable for PHP-based projects, while docx4j is better for Java environments and large-scale document processing.
Open XML SDK by Microsoft
Pros of Open-XML-SDK
- Native .NET implementation, offering seamless integration with C# and other .NET languages
- Extensive documentation and strong community support within the Microsoft ecosystem
- Developed by Microsoft, with strongly typed classes (Document, Body, Paragraph, Run, Text) for the Open XML schema elements
Cons of Open-XML-SDK
- Requires a .NET runtime: current versions run on Windows, Linux, and macOS, but the SDK cannot be used from Java or other JVM languages
- Steeper learning curve for developers not familiar with the .NET framework
- Lower-level API with no built-in conversion to HTML or PDF, which docx4j provides
Code Comparison
docx4j (Java):
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.createPackage();
MainDocumentPart mainDocumentPart = wordMLPackage.getMainDocumentPart();
mainDocumentPart.addParagraphOfText("Hello World!");
wordMLPackage.save(new File("HelloWorld.docx"));
Open-XML-SDK (C#):
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

using (WordprocessingDocument doc = WordprocessingDocument.Create("HelloWorld.docx", WordprocessingDocumentType.Document))
{
    MainDocumentPart mainPart = doc.AddMainDocumentPart();
    mainPart.Document = new Document(new Body(new Paragraph(new Run(new Text("Hello World!")))));
}
Both libraries provide APIs for creating and manipulating Office Open XML documents. docx4j runs anywhere a JVM does and adds built-in HTML and PDF export, while Open-XML-SDK provides tighter integration with .NET and C#. The choice between them usually comes down to the development environment and specific project requirements.
README
What is docx4j?
docx4j is an open source (Apache v2) library for creating, editing, and saving OpenXML "packages", including docx, pptx, and xlsx.
It uses JAXB to create the Java representation.
- Open existing docx/pptx/xlsx
- Create new docx/pptx/xlsx
- Programmatically manipulate docx/pptx/xlsx (anything the file format allows)
- Document generation via variable replacement, content control data binding, or MERGEFIELD fields
- CustomXML binding (with support for pictures, rich text, checkboxes, and OpenDoPE extensions for repeats & conditionals, and importing XHTML)
- Export as HTML
- Export as PDF, choice of 3 strategies, see https://www.docx4java.org/blog/2020/09/office-pptxxlsxdocx-to-pdf-to-in-docx4j-8-2-3/
- Produce/consume Word 2007's xmlPackage (pkg) format
- Apply transforms, including common filters
- Font support (font substitution, and use of any fonts embedded in the document)
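For the "document generation via variable replacement" feature, docx4j works on placeholders of the form ${name} in the document text (in docx4j this is MainDocumentPart.variableReplace, with VariablePrepare available to clean up runs first). The plain-Java sketch below only illustrates the substitution idea on a string and is not docx4j's implementation; PlaceholderFill is a hypothetical name. In a real .docx, Word often splits one placeholder across several runs, which is why a preparation step is needed before replacing.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative only: replaces ${name} placeholders in a plain string.
// Unknown names are left untouched so missing data is easy to spot.
final class PlaceholderFill {
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_]+)\\}");

    static String fill(String template, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer(); // StringBuffer keeps this Java 8 compatible
        while (m.find()) {
            String replacement = values.getOrDefault(m.group(1), m.group());
            // quoteReplacement stops '$' or '\' in values being read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

For example, filling "Dear ${name}" with name=Ada yields "Dear Ada", while a placeholder with no matching key stays in the output unchanged.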
docx4j for JAXB 3.0 and Java 11+
docx4j v11.4.5 uses Jakarta XML Binding API 3.0, rather than the JAXB 2.x used in earlier versions (whose classes live in javax.xml.bind.*). If you have existing code which imports javax.xml.bind, you'll need to search/replace across your code base, replacing javax.xml.bind with jakarta.xml.bind. You'll also need to replace your JAXB jars (Maven will do this for you automatically; otherwise get them from the relevant zip file).
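The javax-to-jakarta search/replace can be scripted. Below is a minimal JDK-only sketch, assuming your sources are .java files under a directory passed as the first argument; JakartaMigration is a hypothetical name, it rewrites only the javax.xml.bind. package prefix, and it should be run on a clean working copy so the changes can be reviewed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch: rewrites javax.xml.bind references to jakarta.xml.bind in .java files.
final class JakartaMigration {

    static String migrate(String source) {
        // The trailing dot restricts the match to the package prefix (imports and qualified names)
        return source.replace("javax.xml.bind.", "jakarta.xml.bind.");
    }

    // Usage: java JakartaMigration path/to/src
    public static void main(String[] args) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(Paths.get(args[0]))) {
            files = walk.filter(p -> p.toString().endsWith(".java")).collect(Collectors.toList());
        }
        for (Path file : files) {
            String before = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            String after = migrate(before);
            if (!after.equals(before)) {
                Files.write(file, after.getBytes(StandardCharsets.UTF_8));
                System.out.println("Updated " + file);
            }
        }
    }
}
```

Other javax packages (for example javax.xml.parsers) are deliberately left alone, since only the JAXB API moved to the jakarta namespace.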
Being a JPMS modularised release, the jars also contain module-info.class entries.
To use it, add the dep corresponding to the JAXB implementation you wish to use
docx4j-8
This is docx4j for Java 8. Although in principle it would compile and run under Java 6, some of its dependencies are Java 8 only. So to run it under Java 6, you'd need to use the same versions of the deps that docx4j 6.x uses.
docx4j v8 is a multi-module Maven project.
To use docx4j v8, add the dep corresponding to the JAXB implementation you wish to use:
- docx4j-JAXB-Internal (shipped in Oracle and OpenJDK v8)
- docx4j-JAXB-ReferenceImpl (you may need to respect the endorsed dir mechanism for the RI jars)
- docx4j-JAXB-MOXy
You should use one and only one of docx4j-JAXB-*
How do I build docx4j?
Get it from GitHub, at https://github.com/plutext/docx4j
mvn clean
mvn install
Some of the tests might fail on Windows. For now, you could skip them: mvn install -DskipTests
For more details, see http://www.docx4java.org/blog/2015/06/docx4j-from-github-in-eclipse-5-years-on/
If you are working with the source code, please join the developer mailing list:
docx4j-dev-subscribe@docx4java.org
Where do I get a binary?
http://www.docx4java.org/downloads.html
How do I get started?
See the Getting Started guide: https://github.com/plutext/docx4j/tree/master/docs
and the Cheat Sheet: http://www.docx4java.org/blog/2013/05/docx4j-in-a-single-page/
And see the sample code: https://github.com/plutext/docx4j/tree/master/src/samples
You'll probably want the Helper AddIn to generate code: http://www.docx4java.org/blog/2016/05/docx4j-helper-word-addin-new-version-v3-3-0/
Where to get help?
http://www.docx4java.org/forums or StackOverflow (use tag 'docx4j')
Please post to one or the other, not both
Legal Information
docx4j is published under the Apache License version 2.0. For the license text, please see the following files in the legals directory:
- LICENSE
- NOTICE
Legal information on libraries used by docx4j can be found in the "legals/NOTICE" file.