pegdown
A pure-Java Markdown processor based on a parboiled PEG parser supporting a number of extensions
Top Related Projects
markup: Determines which markup library to use to render a content file (e.g. README) on GitHub
commonmark-java: Java library for parsing and rendering CommonMark (Markdown)
flexmark-java: CommonMark/Markdown Java parser with source level AST. CommonMark 0.28, emulation of: pegdown, kramdown, markdown.pl, MultiMarkdown. With HTML to MD, MD to PDF, MD to DOCX conversion modules.
Quick Overview
Pegdown is a pure Java library for parsing and rendering Markdown documents. It is based on the parboiled PEG parser and aims to provide a fast, flexible, and extensible Markdown processor for Java applications.
Pros
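- Adequate parsing and rendering speed for typical documents, although the README itself warns that pathological input can trigger exponential runtime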
- Supports various Markdown extensions and flavors
- Highly configurable and extensible
- Documented API and README, but the project has reached end of life and is no longer actively maintained
Cons
- Limited to Java environments
- May have a steeper learning curve compared to simpler Markdown processors
- Some users report occasional parsing inconsistencies with certain edge cases
- Larger dependency size compared to lightweight alternatives
Code Examples
- Basic Markdown parsing and rendering:
PegDownProcessor processor = new PegDownProcessor();
String markdown = "# Hello, Markdown!";
String html = processor.markdownToHtml(markdown);
System.out.println(html);
- Using custom parsing options:
int options = Extensions.FENCED_CODE_BLOCKS | Extensions.TABLES;
PegDownProcessor processor = new PegDownProcessor(options);
String markdown = "```java\nSystem.out.println(\"Hello, World!\");\n```";
String html = processor.markdownToHtml(markdown);
System.out.println(html);
- Implementing a custom visitor:
// ToHtmlSerializer is itself the Visitor implementation, so subclass it directly.
class CustomSerializer extends ToHtmlSerializer {
    CustomSerializer() {
        super(new LinkRenderer());
    }
    @Override
    public void visit(HeaderNode node) {
        printer.print("<h" + node.getLevel() + " class=\"custom-header\">");
        visitChildren(node);
        printer.print("</h" + node.getLevel() + ">");
    }
}
PegDownProcessor processor = new PegDownProcessor();
String markdown = "# Custom Header";
RootNode rootNode = processor.parseMarkdown(markdown.toCharArray());
String html = new CustomSerializer().toHtml(rootNode);
System.out.println(html);
Getting Started
To use Pegdown in your Java project, follow these steps:
- Add the Pegdown dependency to your project's build file (e.g., Maven):
<dependency>
    <groupId>org.pegdown</groupId>
    <artifactId>pegdown</artifactId>
    <version>1.6.0</version>
</dependency>
- Create a PegDownProcessor instance and use it to parse and render Markdown:
import org.pegdown.PegDownProcessor;

public class MarkdownExample {
    public static void main(String[] args) {
        PegDownProcessor processor = new PegDownProcessor();
        String markdown = "# Hello, Markdown!\n\nThis is a **bold** statement.";
        String html = processor.markdownToHtml(markdown);
        System.out.println(html);
    }
}
This will output the parsed HTML representation of the Markdown input.
Competitor Comparisons
Determines which markup library to use to render a content file (e.g. README) on GitHub
Pros of markup
- Supports multiple markup languages (Markdown, reStructuredText, textile, etc.)
- Actively maintained with regular updates and contributions
- Integrated with GitHub's rendering system for README files and other documentation
Cons of markup
- More complex setup and usage compared to pegdown
- Potentially slower processing due to supporting multiple markup languages
- May have inconsistencies in rendering across different markup formats
Code comparison
markup:
GitHub::Markup.render('README.md', File.read('README.md'))
pegdown:
PegDownProcessor processor = new PegDownProcessor();
String html = processor.markdownToHtml(markdown);
Additional notes
pegdown is a pure Java library focused solely on Markdown processing, making it simpler to use and potentially faster for Markdown-specific tasks. However, it has not been updated since 2016, which may be a concern for long-term maintenance and compatibility.
markup offers broader language support and is actively maintained, but may be overkill for projects that only require Markdown processing. Its integration with GitHub's ecosystem makes it particularly useful for projects hosted on the platform.
Java library for parsing and rendering CommonMark (Markdown)
Pros of commonmark-java
- Implements the CommonMark specification, ensuring better standardization and compatibility
- Actively maintained with regular updates and improvements
- Offers a more flexible and extensible API for customization
Cons of commonmark-java
- May have a steeper learning curve for developers familiar with pegdown
- Potentially slower parsing speed compared to pegdown in some scenarios
Code Comparison
pegdown:
PegDownProcessor processor = new PegDownProcessor();
String html = processor.markdownToHtml("# Hello, world!");
commonmark-java:
Parser parser = Parser.builder().build();
Node document = parser.parse("# Hello, world!");
HtmlRenderer renderer = HtmlRenderer.builder().build();
String html = renderer.render(document);
The code comparison shows that commonmark-java requires a two-step process (parsing and rendering) compared to pegdown's single method call. However, this separation allows for more flexibility in processing the parsed document before rendering.
Both libraries have their strengths, but commonmark-java's adherence to the CommonMark specification and active development make it a strong choice for new projects or those requiring strict Markdown compliance.
CommonMark/Markdown Java parser with source level AST. CommonMark 0.28, emulation of: pegdown, kramdown, markdown.pl, MultiMarkdown. With HTML to MD, MD to PDF, MD to DOCX conversion modules.
Pros of flexmark-java
- More actively maintained with frequent updates
- Better performance and lower memory usage
- Supports a wider range of Markdown extensions and customizations
Cons of flexmark-java
- Steeper learning curve due to more complex API
- Larger codebase and dependency footprint
Code Comparison
pegdown
PegDownProcessor processor = new PegDownProcessor();
String html = processor.markdownToHtml("# Hello, Markdown!");
flexmark-java
Parser parser = Parser.builder().build();
HtmlRenderer renderer = HtmlRenderer.builder().build();
Node document = parser.parse("# Hello, Markdown!");
String html = renderer.render(document);
Key Differences
- flexmark-java offers more granular control over parsing and rendering
- pegdown uses a single method call for conversion, while flexmark-java separates parsing and rendering steps
- flexmark-java's approach allows for easier intermediate document manipulation
Conclusion
flexmark-java is generally considered the more powerful and flexible option, but pegdown may be simpler for basic use cases. The choice between them depends on specific project requirements and the level of customization needed.
README
:>>> DEPRECATION NOTE <<<:
Although still one of the most popular Markdown parsing libraries for the JVM, pegdown has reached its end of life.
The project is essentially unmaintained with tickets piling up and crucial bugs not being fixed.
pegdown's parsing performance isn't great. In some cases of pathological input runtime can even become exponential,
which means that the parser either appears to "hang" completely or abort processing after a time-out.
Therefore pegdown is not recommended anymore for use in new projects requiring a markdown parser.
Instead I suggest you turn to @vsch's flexmark-java, which appears to be an excellent replacement for these reasons:
- Modern parser architecture (based on commonmark-java), designed from the ground up as a pegdown replacement and supporting all its features and extensions
- 30x better average parsing performance without pathological input cases
- Configuration options for a multitude of markdown dialects (CommonMark, pegdown, MultiMarkdown, kramdown and Markdown.pl)
- Actively maintained and used as the basis of an IntelliJ plugin with almost 2M downloads per year
- The author (@vsch) has actively contributed to pegdown maintenance in the last two years and is intimately familiar with pegdown's internals and quirks.
In case you need support with migrating from pegdown to flexmark-java, @vsch welcomes inquiries in here or here.
Introduction
pegdown is a pure Java library for clean and lightweight Markdown processing based on a parboiled PEG parser.
pegdown is nearly 100% compatible with the original Markdown specification and fully passes the original Markdown test suite. On top of the standard Markdown feature set pegdown implements a number of extensions similar to what other popular Markdown processors offer. You can also extend pegdown by your own plugins! Currently pegdown supports the following extensions over standard Markdown:
- SMARTS: Beautifies apostrophes, ellipses ("..." and ". . .") and dashes ("--" and "---")
- QUOTES: Beautifies single quotes, double quotes and double angle quotes (« and »)
- SMARTYPANTS: Convenience extension enabling both SMARTS and QUOTES at once.
- ABBREVIATIONS: Abbreviations in the way of PHP Markdown Extra.
- ANCHORLINKS: Generate anchor links for headers by taking the first range of alphanumerics and spaces.
- HARDWRAPS: Alternative handling of newlines, see Github-flavoured-Markdown
- AUTOLINKS: Plain (undelimited) autolinks the way Github-flavoured-Markdown implements them.
- TABLES: Tables similar to MultiMarkdown (which is in turn like the PHP Markdown Extra tables, but with colspan support).
- DEFINITION LISTS: Definition lists in the way of PHP Markdown Extra.
- FENCED CODE BLOCKS: Fenced Code Blocks in the way of PHP Markdown Extra or Github-flavoured-Markdown.
- HTML BLOCK SUPPRESSION: Suppresses the output of HTML blocks.
- INLINE HTML SUPPRESSION: Suppresses the output of inline HTML elements.
- WIKILINKS: Support [[Wiki-style links]] with a customizable URL rendering logic.
- STRIKETHROUGH: Support ~~strikethroughs~~ as supported in Pandoc and Github.
- ATXHEADERSPACE: Require a space between the # and the header title text, as per Github-flavoured-Markdown. Frees up # without a space to be just plain text.
- FORCELISTITEMPARA: Wrap a list item or definition term in <p> tags if it contains more than a simple paragraph.
- RELAXEDHRULES: allow horizontal rules without a blank line following them.
- TASKLISTITEMS: parses bullet lists of the form * [ ] and * [x] to create GitHub like task list items:
  - open task item
  - closed or completed task item.
  - also closed or completed task item.
- EXTANCHORLINKS: Generate anchor links for headers using complete contents of the header.
  - Spaces and non-alphanumerics replaced by -, multiple dashes trimmed to one.
  - Anchor link is added as first element inside the header with empty content: <h1><a name="header"></a>header</h1>
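The two EXTANCHORLINKS naming rules stated above (non-alphanumerics become a dash, runs of dashes collapse to one) can be illustrated with a small standalone sketch. This is not pegdown's own code: the class and method names are hypothetical. Whether pegdown also lowercases the text or trims leading and trailing dashes is not stated here, so the sketch does neither.

```java
public class AnchorNameSketch {

    /**
     * Illustrative only: keeps letters and digits, and replaces every run of
     * other characters (spaces included) with a single dash.
     * Case and leading/trailing dashes are left untouched (an assumption).
     */
    public static String anchorName(String headerText) {
        StringBuilder name = new StringBuilder(headerText.length());
        boolean lastWasDash = false;
        for (int i = 0; i < headerText.length(); i++) {
            char c = headerText.charAt(i);
            if (Character.isLetterOrDigit(c)) {
                name.append(c);
                lastWasDash = false;
            } else if (!lastWasDash) {
                // First non-alphanumeric of a run: emit exactly one dash.
                name.append('-');
                lastWasDash = true;
            }
        }
        return name.toString();
    }

    public static void main(String[] args) {
        System.out.println(anchorName("Getting  started: part 2"));
    }
}
```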
Note: pegdown differs from the original Markdown in that it ignores in-word emphasis as in
> my_cool_file.txt
> 2*3*4=5
Currently this "extension" cannot be switched off.
Installation
You have two options:
- Download the JAR for the latest version from here. pegdown 1.6.0 has only one dependency: parboiled for Java, version 1.1.7.
- The pegdown artifact is also available from maven central with group id org.pegdown and artifact-id pegdown.
Usage
Using pegdown is very simple: Just create a new instance of a PegDownProcessor and call one of its markdownToHtml methods to convert the given Markdown source to an HTML string. If you'd like to customize the rendering of HTML links (Auto-Links, Explicit-Links, Mail-Links, Reference-Links and/or Wiki-Links), e.g. for adding rel="nofollow" attributes based on some logic, you can supply your own instance of a LinkRenderer with the call to markdownToHtml.
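As a minimal sketch of the LinkRenderer customization described above, the following subclass adds rel="nofollow" to explicit links only. It assumes pegdown 1.6.0 on the classpath. The class names and the "example.com" rule are hypothetical, and the other link types (auto, mail, reference, wiki) would need analogous overrides.

```java
import org.pegdown.LinkRenderer;
import org.pegdown.PegDownProcessor;
import org.pegdown.ast.ExpLinkNode;

public class NoFollowExample {

    /** Hypothetical rule: every explicit link not pointing at example.com gets rel="nofollow". */
    static class NoFollowLinkRenderer extends LinkRenderer {
        @Override
        public Rendering render(ExpLinkNode node, String text) {
            // Start from the default rendering so href and title handling stay unchanged.
            Rendering rendering = super.render(node, text);
            boolean external = !node.url.contains("example.com");
            return external ? rendering.withAttribute("rel", "nofollow") : rendering;
        }
    }

    public static String render(String markdown) {
        PegDownProcessor processor = new PegDownProcessor();
        // The custom renderer is passed alongside the Markdown source.
        return processor.markdownToHtml(markdown, new NoFollowLinkRenderer());
    }

    public static void main(String[] args) {
        System.out.println(render("[docs](http://other.org/docs)"));
    }
}
```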
You can also use pegdown only for the actual parsing of the Markdown source and do the serialization to the target format (e.g. XML) yourself. To do this just call the parseMarkdown method of the PegDownProcessor to obtain the root node of the Abstract Syntax Tree for the document.
With a custom Visitor implementation you can do whatever serialization you want. As an example you might want to take a look at the sources of the ToHtmlSerializer.
Note that the first time you create a PegDownProcessor it can take up to a few hundred milliseconds to prepare the underlying parboiled parser instance. However, once the first processor has been built all further instantiations will be fast. Also, you can reuse an existing PegDownProcessor instance as often as you want, as long as you prevent concurrent accesses, since neither the PegDownProcessor nor the underlying parser is thread-safe.
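One way to reuse processors without concurrent access is to keep one instance per thread. The sketch below assumes Java 8 or later and pegdown on the classpath. The class name SharedProcessor is hypothetical, and a lock around a single shared instance would be an equally valid choice.

```java
import org.pegdown.PegDownProcessor;

public class SharedProcessor {

    // Each thread lazily builds its own processor, so no instance is ever
    // touched by two threads at the same time.
    private static final ThreadLocal<PegDownProcessor> PROCESSOR =
            ThreadLocal.withInitial(PegDownProcessor::new);

    public static String toHtml(String markdown) {
        return PROCESSOR.get().markdownToHtml(markdown);
    }

    public static void main(String[] args) {
        System.out.println(toHtml("# Hello"));
    }
}
```

With a thread pool, each worker pays the instantiation cost once and reuses its processor for every later call.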
See http://sirthias.github.com/pegdown/api for the pegdown API documentation.
Plugins
Since parsing and serialisation are two different things there are two different plugin mechanisms, one for the parser, and one for the ToHtmlSerializer. Most plugins would probably implement both, but it is possible that a plugin might just implement the parser plugin interface.
For the parser there are two plugin points, one for inline plugins (inside a paragraph) and one for block plugins. These are provided to the parser using the PegDownPlugins class, which comes with its own builder for convenience. You can either pass individual rules to this builder (which is what you probably would do if you were using Scala rules), or pass it a parboiled Java parser class that implements InlinePluginParser, BlockPluginParser, or both. PegDownPlugins will enhance this parser for you, so as a user of a plugin you just need to pass the class to it (and the arguments for that class's constructor, if any). To implement the plugin, you would write a normal parboiled parser and implement the appropriate parser plugin interface. You can also extend the pegdown parser, which is useful if you want to reuse any of its rules.
For the serializer there is the ToHtmlSerializerPlugin interface. It is called when the ToHtmlSerializer encounters a node that it doesn't know how to process (i.e. one produced by a parser plugin). Its accept method is passed the node, the visitor (so that any child nodes can be rendered using the parent) and the printer for the plugin to print to. The accept method returns true if it knew how to handle the node and false otherwise. The ToHtmlSerializer loops through the plugins, stopping at the first one that returns true; if none does, it throws an exception as it used to.
As a very simple example you might want to take a look at the sources of the PluginParser test class.
Parsing Timeouts
Since Markdown has no official grammar and contains a number of ambiguities the parsing of Markdown source, especially with enabled language extensions, can be "hard" and result, in certain corner cases, in exponential parsing time. In order to provide a somewhat predictable behavior pegdown therefore supports the specification of a parsing timeout, which you can supply to the PegDownProcessor constructor.
If the parser happens to run longer than the specified timeout period it terminates itself with an exception, which causes the markdownToHtml method to return null. Your application should then deal with this case accordingly and, for example, inform the user.
The default timeout, if not explicitly specified, is 2 seconds.
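A minimal sketch of handling the timeout case, assuming pegdown on the classpath and the two-argument PegDownProcessor constructor that takes option flags and a timeout in milliseconds. The class name, method name, and fallback message are hypothetical.

```java
import org.pegdown.Extensions;
import org.pegdown.PegDownProcessor;

public class TimeoutExample {

    /** Renders Markdown, returning a fallback message if parsing exceeds the timeout. */
    public static String renderOrFallback(String markdown, long timeoutMillis) {
        PegDownProcessor processor = new PegDownProcessor(Extensions.NONE, timeoutMillis);
        String html = processor.markdownToHtml(markdown);
        if (html == null) {
            // null signals that the parser gave up after timeoutMillis.
            return "<p>This document took too long to render.</p>";
        }
        return html;
    }

    public static void main(String[] args) {
        System.out.println(renderOrFallback("# Title", 5000L));
    }
}
```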
IDE Support
The excellent idea-markdown plugin for IntelliJ IDEA, RubyMine, PhpStorm, WebStorm, PyCharm and appCode uses pegdown as its underlying parsing engine. The plugin gives you proper syntax highlighting for Markdown source and shows you exactly how pegdown will parse your texts.
Credits
A large part of the underlying PEG grammar was developed by John MacFarlane and made available with his tool peg-markdown.
License
pegdown is licensed under Apache License 2.0.
Patch Policy
Feedback and contributions to the project, no matter what kind, are always very welcome. However, patches can only be accepted from their original author. Along with any patches, please state that the patch is your original work and that you license the work to the pegdown project under the project's open source license.