antlr4
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files.
Top Related Projects
JavaCC - a parser generator for building parsers from grammars. It can generate code in Java, C++ and C#.
Quick Overview
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. ANTLR takes as input a grammar that specifies a language and generates as output source code for a recognizer for that language.
Pros
- Supports a wide range of target languages (Java, C#, Python, JavaScript, Go, C++, Swift)
- Provides an easy-to-use, human-readable grammar syntax
- Includes built-in support for parse tree generation and traversal
- Offers excellent documentation and a large, active community
Cons
- Learning curve can be steep for beginners
- Generated parsers may be slower than hand-written ones for some use cases
- Large runtime library size for some target languages
- Can be overkill for simple parsing tasks
Code Examples
- Simple arithmetic expression grammar:
grammar Arithmetic;
expr : expr ('*'|'/') expr
| expr ('+'|'-') expr
| INT
| '(' expr ')'
;
INT : [0-9]+ ;
WS : [ \t\r\n]+ -> skip ;
- Using the generated parser in Java:
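import org.antlr.v4.runtime.CharStreams;
import org.antlr.v4.runtime.CommonTokenStream;
import org.antlr.v4.runtime.tree.ParseTree;
// Inside a method: lex the input, then parse it starting at the expr rule
String input = "2 * (3 + 4)";
ArithmeticLexer lexer = new ArithmeticLexer(CharStreams.fromString(input));
CommonTokenStream tokens = new CommonTokenStream(lexer);
ArithmeticParser parser = new ArithmeticParser(tokens);
ParseTree tree = parser.expr();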
- Traversing the parse tree with a listener:
ParseTreeWalker walker = new ParseTreeWalker();
MyArithmeticListener listener = new MyArithmeticListener();
walker.walk(listener, tree);
Getting Started
- Install ANTLR:
pip install antlr4-tools
- Write your grammar (e.g., Arithmetic.g4)
- Generate parser:
antlr4 Arithmetic.g4
- Compile and run (Java example, with the ANTLR runtime JAR on the classpath):
javac *.java
java YourMainClass
For other languages, refer to the ANTLR documentation for specific setup instructions.
Competitor Comparisons
JavaCC - a parser generator for building parsers from grammars. It can generate code in Java, C++ and C#.
Pros of JavaCC
- Simpler syntax and easier to learn for beginners
- Better integration with Java, allowing direct embedding of Java code
- More flexible in terms of grammar specification
Cons of JavaCC
- Limited support for target languages (primarily Java)
- Less powerful in handling complex grammars
- Smaller community and fewer resources compared to ANTLR
Code Comparison
ANTLR4 grammar example:
grammar Example;
expr: term (('+' | '-') term)*;
term: factor (('*' | '/') factor)*;
factor: NUMBER | '(' expr ')';
NUMBER: [0-9]+;
WS: [ \t\r\n]+ -> skip;
JavaCC grammar example:
PARSER_BEGIN(Example)
public class Example {}
PARSER_END(Example)
SKIP:
{
" " | "\t" | "\n" | "\r"
}
TOKEN:
{
< NUMBER: (["0"-"9"])+ >
}
void expr():
{}
{
term() (("+" | "-") term())*
}
void term():
{}
{
factor() (("*" | "/") factor())*
}
void factor():
{}
{
<NUMBER> | "(" expr() ")"
}
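Both grammars above describe the same three-level rule structure (expr, term, factor). To make that structure concrete, here is a minimal hand-written sketch in Java that follows it rule by rule and evaluates the expression with integer arithmetic as it parses. It is not the output of ANTLR or JavaCC: the class name `ExprEvaluator` and its methods are hypothetical, chosen only for illustration.

```java
// Hand-written recursive-descent evaluator mirroring the grammar above:
//   expr   : term (('+' | '-') term)* ;
//   term   : factor (('*' | '/') factor)* ;
//   factor : NUMBER | '(' expr ')' ;
// Illustrative only: ExprEvaluator is a hypothetical class, not generated code.
final class ExprEvaluator {
    private final String src;
    private int pos = 0;

    private ExprEvaluator(String src) { this.src = src; }

    /** Parses and evaluates the whole input using integer arithmetic. */
    static int evaluate(String input) {
        ExprEvaluator p = new ExprEvaluator(input);
        int value = p.expr();
        p.skipWhitespace();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected input at index " + p.pos);
        }
        return value;
    }

    // expr : term (('+' | '-') term)* ;
    private int expr() {
        int value = term();
        while (true) {
            char op = peek();
            if (op != '+' && op != '-') return value;
            pos++;
            int rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
    }

    // term : factor (('*' | '/') factor)* ;
    private int term() {
        int value = factor();
        while (true) {
            char op = peek();
            if (op != '*' && op != '/') return value;
            pos++;
            int rhs = factor();
            value = (op == '*') ? value * rhs : value / rhs;
        }
    }

    // factor : NUMBER | '(' expr ')' ;
    private int factor() {
        if (peek() == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at index " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (pos < src.length() && src.charAt(pos) >= '0' && src.charAt(pos) <= '9') pos++;
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at index " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }

    // Skips whitespace (the role of the WS rule) and returns the next
    // character, or '\0' at end of input.
    private char peek() {
        skipWhitespace();
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private void skipWhitespace() {
        while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("2 * (3 + 4)"));
    }
}
```

Because term sits below expr in the rule hierarchy, multiplication and division bind tighter than addition and subtraction, and the loops make each operator left-associative. A generated parser would normally build a parse tree rather than evaluate inline; the inline evaluation here is a simplification.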
README
ANTLR v4
ANTLR (ANother Tool for Language Recognition) is a powerful parser generator for reading, processing, executing, or translating structured text or binary files. It's widely used to build languages, tools, and frameworks. From a grammar, ANTLR generates a parser that can build parse trees and also generates a listener interface (or visitor) that makes it easy to respond to the recognition of phrases of interest.
Versioning
ANTLR 4 supports 10 target languages (Cpp, CSharp, Dart, Java, JavaScript, PHP, Python3, Swift, TypeScript, Go), and ensuring consistency across these targets is a unique and highly valuable feature. To ensure proper support of this feature, each release of ANTLR is a complete release of the tool and the 10 runtimes, all with the same version. As such, ANTLR versioning does not strictly follow semver semantics:
- a component may be released with the latest version number even though nothing has changed within that component since the previous release
- major version is bumped only when ANTLR is rewritten for a totally new "generation", such as ANTLR3 -> ANTLR4 (LL(*) -> ALL(*) parsing)
- minor version updates may include minor breaking changes; the policy is to regenerate parsers with every release (4.11 -> 4.12)
- backwards compatibility is only guaranteed for patch version bumps (4.11.1 -> 4.11.2)
If you use a semver verifier in your CI, you probably want to apply special rules for ANTLR, such as treating a minor change as a major change.
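As a rough illustration of such a special rule, the sketch below flags any major or minor version change as one that requires regenerating parsers, and treats only patch bumps as compatible. The class `AntlrVersionPolicy` and its method names are hypothetical and not part of ANTLR or any CI tool. It assumes plain numeric versions such as 4.11 or 4.11.2 and does not handle pre-release tags like 4.10-rc1.

```java
// Sketch of a compatibility check that follows the versioning rules above.
// AntlrVersionPolicy and its methods are hypothetical names, not an ANTLR API.
final class AntlrVersionPolicy {

    /** Returns {major, minor, patch}; a missing patch (e.g. "4.10") is read as 0. */
    static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("Expected MAJOR.MINOR[.PATCH]: " + version);
        }
        int[] out = new int[3];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    /** True when the major or minor component differs, i.e. parsers should be regenerated. */
    static boolean requiresRegeneration(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        return a[0] != b[0] || a[1] != b[1];
    }

    public static void main(String[] args) {
        System.out.println(requiresRegeneration("4.11", "4.12"));     // minor bump
        System.out.println(requiresRegeneration("4.11.1", "4.11.2")); // patch bump
    }
}
```

A CI check built on this idea would block automatic upgrades whenever `requiresRegeneration` returns true and allow them for patch-only bumps.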
Repo branch structure
The default branch for this repo is master, which is the latest stable release and has tags for the various releases; e.g., see release tag 4.9.3. Branch dev is where development occurs between releases, and all pull requests should be derived from that branch. The dev branch is merged back into master to cut a release, and the release state is tagged (e.g., with 4.10-rc1 or 4.10). Visually our process looks roughly like this:
[Figure: branch and release process diagram]
The Go target now has its own dedicated repo:
$ go get github.com/antlr4-go/antlr
Note: The dedicated Go repo is for go get and import only. Go runtime development is still performed in the main antlr/antlr4 repo.
Authors and major contributors
- Terence Parr, parrt@cs.usfca.edu (ANTLR project lead and supreme dictator for life, University of San Francisco)
- Sam Harwell (Tool co-author, Java and original C# target)
- Eric Vergnaud (JavaScript, TypeScript, Python2, Python3 targets and maintenance of C# target)
- Peter Boyer (Go target)
- Mike Lischke (C++ completed target)
- Dan McLaughlin (C++ initial target)
- David Sisson (C++ initial target and test)
- Janyou (Swift target)
- Ewan Mellor, Hanzhou Shi (Swift target merging)
- Ben Hamilton (Full Unicode support in serialized ATN and all languages' runtimes for code points > U+FFFF)
- Marcos Passos (PHP target)
- Lingyu Li (Dart target)
- Ivan Kochurkin has made major contributions to overall quality, error handling, and Target performance.
- Justin King has done a huge amount of work across multiple targets, but especially for C++.
- Ken Domino has a knack for finding bugs/issues and analysis; also a major contributor on the grammars-v4 repo.
- Jim Idle has contributed to previous versions of ANTLR and recently jumped back in to solve a major problem with the Go target.
Useful information
- Release notes
- Getting started with v4
- Official site
- Documentation
- FAQ
- ANTLR code generation targets (Currently: Java, C#, Python3, JavaScript, TypeScript, Go, C++, Swift, Dart, PHP)
- Note: As of version 4.14, we are dropping support for Python 2. We love the Python community, but Python 2 support was officially halted in Jan 2020. More recently, GitHub also dropped support for Python 2, which has made it impossible for us to maintain a consistent level of quality across targets (we use GitHub for our CI). Long live Python 3!
- Java API
- ANTLR v3
- v3 to v4 Migration, differences
You might also find the following pages useful, particularly if you want to mess around with the various target languages.
The Definitive ANTLR 4 Reference
Programmers run into parsing problems all the time. Whether it's a data format like JSON, a network protocol like SMTP, a server configuration file for Apache, a PostScript/PDF file, or a simple spreadsheet macro language, ANTLR v4 and this book will demystify the process. ANTLR v4 has been rewritten from scratch to make it easier than ever to build parsers and the language applications built on top. This completely rewritten new edition of the bestselling Definitive ANTLR Reference shows you how to take advantage of these new features.
You can buy the book The Definitive ANTLR 4 Reference at amazon or an electronic version at the publisher's site.
You will find the Book source code useful.
Additional grammars
The grammars-v4 repository is a collection of grammars without actions, where the root directory name is the all-lowercase name of the language parsed by the grammar. For example: java, cpp, csharp, c, etc.