Top Related Projects
LR(1) parser generator for Rust
Rust parser combinator framework
Parsing Expression Grammar (PEG) parser generator for Rust
An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.
Quick Overview
Pest is a fast and easy-to-use parser written in Rust. It uses Parsing Expression Grammars (PEGs) to define syntax, allowing for expressive and flexible grammar definitions. Pest aims to simplify the process of writing parsers while maintaining high performance.
Pros
- Easy to learn and use, with a simple syntax for defining grammars
- Excellent performance due to its Rust implementation
- Generates helpful error messages for debugging
- Validates grammars at compile time through pest_derive; grammars can also be loaded at runtime with the pest_vm crate
Cons
- Limited to PEG parsing, which may not be suitable for all parsing needs
- Requires understanding of PEG concepts, which can be challenging for beginners
- Less flexible than hand-written parsers for complex scenarios
- Documentation could be more comprehensive for advanced use cases
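One PEG concept that tends to surprise newcomers is ordered choice: alternatives are tried left to right, and the first one that matches wins, even when a later alternative would have let the rest of the rule succeed. The sketch below imitates that behaviour with plain standard-library Rust; it does not use pest, and the helper names `ordered_choice` and `matches_fully` are made up for illustration.

```rust
/// Ordered choice over string literals: try each alternative in order and
/// commit to the first one that matches at the start of `input`.
/// Returns the length of the matched prefix.
fn ordered_choice(alternatives: &[&str], input: &str) -> Option<usize> {
    alternatives
        .iter()
        .find(|alt| input.starts_with(**alt))
        .map(|alt| alt.len())
}

/// The choice followed by end of input, roughly `(a | b | ...) ~ EOI`.
/// Once the choice has committed there is no going back to a later alternative.
fn matches_fully(alternatives: &[&str], input: &str) -> bool {
    ordered_choice(alternatives, input) == Some(input.len())
}

fn main() {
    // "a" wins first, leaving "b" unconsumed, so the whole match fails.
    println!("{}", matches_fully(&["a", "ab"], "ab"));
    // Listing the longer alternative first fixes it.
    println!("{}", matches_fully(&["ab", "a"], "ab"));
}
```

The practical consequence for a grammar is that longer or more specific alternatives usually belong before shorter ones.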
Code Examples
- Deriving a parser from a grammar file:
use pest_derive::Parser;
#[derive(Parser)]
#[grammar = "grammar.pest"]
pub struct MyParser;
- Parsing input using the defined grammar:
use pest::Parser;
let pairs = MyParser::parse(Rule::main, "Hello, world!")
    .expect("unsuccessful parse");
for pair in pairs {
    // Process the parsed data
    println!("Rule: {:?}, Span: {:?}", pair.as_rule(), pair.as_span());
}
- Extracting data from parsed results:
let inner = pair.into_inner().next().unwrap();
let value = inner.as_str();
println!("Extracted value: {}", value);
Getting Started
- Add Pest to your Cargo.toml:
[dependencies]
pest = "2.5"
pest_derive = "2.5"
- Create a grammar file (e.g., grammar.pest) and define your rules:
main = { SOI ~ greeting ~ ", " ~ name ~ "!" ~ EOI }
greeting = { "Hello" | "Hi" }
name = { ASCII_ALPHA+ }
- Use the grammar in your Rust code:
use pest::Parser;
use pest_derive::Parser;

#[derive(Parser)]
#[grammar = "grammar.pest"]
struct MyParser;

fn main() {
    let pairs = MyParser::parse(Rule::main, "Hello, World!")
        .expect("unsuccessful parse");
    // Process the parsed data
}
Competitor Comparisons
LR(1) parser generator for Rust
Pros of LALRPOP
- Generates faster parsers for complex grammars
- Supports left-recursive rules, allowing for more natural grammar definitions
- Provides better error messages and recovery mechanisms
Cons of LALRPOP
- Steeper learning curve, especially for those unfamiliar with LR parsing
- Rejects ambiguous grammars; conflicts have to be resolved by restructuring the rules
- Requires more boilerplate code for parser setup
Code Comparison
LALRPOP grammar example:
use std::str::FromStr;

grammar;

pub Term: i32 = {
    <n:Num> => n,
    "(" <t:Term> ")" => t,
};

Num: i32 = r"[0-9]+" => i32::from_str(<>).unwrap();
Pest grammar example:
term = { num | "(" ~ term ~ ")" }
num = { ASCII_DIGIT+ }
LALRPOP offers more control over the parsing process and type annotations, while Pest provides a more concise and intuitive grammar definition. LALRPOP is better suited for complex, performance-critical parsers, whereas Pest excels in simplicity and ease of use for simpler grammars.
Rust parser combinator framework
Pros of nom
- More flexible and powerful, allowing for complex parsing scenarios
- Better performance for certain types of parsing tasks
- Extensive ecosystem with many pre-built parsers and combinators
Cons of nom
- Steeper learning curve due to its more complex API
- Can be more verbose for simple parsing tasks
- Requires more manual error handling and reporting
Code Comparison
nom example:
use nom::{
    IResult,
    bytes::complete::tag,
    sequence::tuple
};
fn parser(input: &str) -> IResult<&str, (&str, &str)> {
    tuple((tag("Hello"), tag(" world!")))(input)
}
pest example:
use pest::Parser;
use pest_derive::Parser;
#[derive(Parser)]
#[grammar = "hello.pest"]
struct HelloParser;
// In hello.pest:
// greeting = { "Hello" ~ " world!" }
Both nom and pest are popular parsing libraries for Rust, each with its own strengths. nom offers more flexibility and power for complex parsing scenarios, while pest provides a more user-friendly approach for simpler grammars. The choice between them often depends on the specific requirements of the parsing task at hand.
Parsing Expression Grammar (PEG) parser generator for Rust
Pros of rust-peg
- More mature and established project with a longer history
- Supports left-recursive grammars, allowing for more expressive parsing rules
- Generates faster parsers for certain grammar types
Cons of rust-peg
- Less actively maintained, with fewer recent updates
- More complex syntax for defining grammars
- Limited documentation and examples compared to Pest
Code Comparison
Pest grammar example:
number = { ASCII_DIGIT+ }
operation = { "+" | "-" | "*" | "/" }
expression = { number ~ (operation ~ number)* }
rust-peg grammar example:
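// rust-peg macro syntax; the grammar name `arithmetic` is illustrative
peg::parser! {
    grammar arithmetic() for str {
        rule number() -> u32
            = n:$(['0'..='9']+) {? n.parse().or(Err("u32")) }
        rule operation()
            = ['+' | '-' | '*' | '/']
        // One or more numbers separated by operators; only the numbers are collected
        pub rule expression() -> Vec<u32>
            = number() ++ operation()
    }
}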
Both Pest and rust-peg are parsing expression grammar (PEG) libraries for Rust, offering different approaches to grammar definition and parsing. Pest focuses on simplicity and ease of use, while rust-peg provides more advanced features at the cost of complexity. The choice between them depends on the specific requirements of your parsing project and personal preferences.
An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.
Pros of regex
- More efficient for simple pattern matching tasks
- Widely recognized syntax, familiar to developers from other languages
- Extensive documentation and community support
Cons of regex
- Limited expressiveness for complex parsing tasks
- Can become difficult to read and maintain for intricate patterns
- Less flexibility in handling nested structures or context-sensitive grammars
Code Comparison
regex:
use regex::Regex;
let re = Regex::new(r"^\d{4}-\d{2}-\d{2}$").unwrap();
let is_date = re.is_match("2023-05-15");
pest:
use pest::Parser;
use pest_derive::Parser;
#[derive(Parser)]
#[grammar = "date.pest"]
struct DateParser;
let pairs = DateParser::parse(Rule::date, "2023-05-15").unwrap();
Summary
regex is better suited for simple pattern matching tasks and benefits from widespread familiarity. pest excels in more complex parsing scenarios, offering greater expressiveness and maintainability for intricate grammars. The choice between the two depends on the specific requirements of your project, with regex being more appropriate for straightforward pattern matching and pest for more sophisticated parsing needs.
README
pest. The Elegant Parser
pest is a general purpose parser written in Rust with a focus on accessibility, correctness, and performance. It uses parsing expression grammars (or PEG) as input, which are similar in spirit to regular expressions, but which offer the enhanced expressivity needed to parse complex languages.
Getting started
The recommended way to start parsing with pest is to read the official book.
Other helpful resources:
- API reference on docs.rs
- play with grammars and share them on our fiddle
- find previous common questions answered or ask questions on GitHub Discussions
- leave feedback, ask questions, or greet us on Gitter or Discord
Example
The following is an example of a grammar for a list of alphanumeric identifiers in which no identifier starts with a digit:
alpha = { 'a'..'z' | 'A'..'Z' }
digit = { '0'..'9' }
ident = { !digit ~ (alpha | digit)+ }
ident_list = _{ ident ~ (" " ~ ident)* }
          // ^
          // ident_list rule is silent, which means it produces no tokens
Grammars are saved in separate .pest files which are never mixed with procedural code. This results in an always up-to-date formalization of a language that is easy to read and maintain.
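To make the meaning of the example grammar concrete, here is a hand-written recognizer that mirrors it using only the standard library. It is a sketch for comparison, not what pest generates: the function names are invented, and unlike the grammar above it assumes the whole input must be consumed.

```rust
/// Hand-written equivalent of `ident = { !digit ~ (alpha | digit)+ }`.
/// Returns the number of bytes matched at the start of `input`, if any.
fn match_ident(input: &str) -> Option<usize> {
    let bytes = input.as_bytes();
    // `!digit`: negative lookahead, consumes nothing.
    if bytes.first().map_or(false, |b| b.is_ascii_digit()) {
        return None;
    }
    // `(alpha | digit)+`: one or more ASCII letters or digits.
    let len = bytes
        .iter()
        .take_while(|b| b.is_ascii_alphanumeric())
        .count();
    if len == 0 { None } else { Some(len) }
}

/// Hand-written equivalent of `ident_list = _{ ident ~ (" " ~ ident)* }`,
/// with the added assumption that the entire input must match.
fn parse_ident_list(input: &str) -> Option<Vec<&str>> {
    let mut idents = Vec::new();
    let mut pos = 0;
    loop {
        let len = match_ident(&input[pos..])?;
        idents.push(&input[pos..pos + len]);
        pos += len;
        if pos == input.len() {
            return Some(idents);
        }
        // Exactly one space must separate identifiers.
        if input.as_bytes()[pos] != b' ' {
            return None;
        }
        pos += 1;
    }
}

fn main() {
    println!("{:?}", parse_ident_list("a1 b2"));
    println!("{:?}", parse_ident_list("123"));
}
```

Even for a grammar this small, the procedural version is several times longer than the four grammar rules and hides the language definition inside control flow.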
Meaningful error reporting
Based on the grammar definition, the parser also includes automatic error reporting. For the example above, the input "123" will result in:
thread 'main' panicked at ' --> 1:1
|
1 | 123
| ^---
|
= unexpected digit', src/main.rs:12
while "ab *" will result in:
thread 'main' panicked at ' --> 1:1
|
1 | ab *
| ^---
|
= expected ident', src/main.rs:12
These error messages can be obtained from their default Display implementation, e.g. panic!("{}", parser_result.unwrap_err()) or println!("{}", e).
Pairs API
The grammar can be used to derive a Parser implementation automatically. Parsing returns an iterator of nested token pairs:
use pest_derive::Parser;
use pest::Parser;
#[derive(Parser)]
#[grammar = "ident.pest"]
struct IdentParser;
fn main() {
    let pairs = IdentParser::parse(Rule::ident_list, "a1 b2").unwrap_or_else(|e| panic!("{}", e));
    // Because ident_list is silent, the iterator will contain idents
    for pair in pairs {
        // A pair is a combination of the rule which matched and a span of input
        println!("Rule: {:?}", pair.as_rule());
        println!("Span: {:?}", pair.as_span());
        println!("Text: {}", pair.as_str());
        // A pair can be converted to an iterator of the tokens which make it up:
        for inner_pair in pair.into_inner() {
            match inner_pair.as_rule() {
                Rule::alpha => println!("Letter: {}", inner_pair.as_str()),
                Rule::digit => println!("Digit: {}", inner_pair.as_str()),
                _ => unreachable!()
            };
        }
    }
}
This produces the following output:
Rule: ident
Span: Span { start: 0, end: 2 }
Text: a1
Letter: a
Digit: 1
Rule: ident
Span: Span { start: 3, end: 5 }
Text: b2
Letter: b
Digit: 2
Defining multiple parsers in a single file
The current automatic Parser derivation will produce the Rule enum, which would have name conflicts if one tried to define multiple such structs that automatically derive Parser. One possible way around it is to put each parser struct in a separate namespace:
mod a {
    use pest_derive::Parser;
    #[derive(Parser)]
    #[grammar = "a.pest"]
    pub struct ParserA;
}
mod b {
    use pest_derive::Parser;
    #[derive(Parser)]
    #[grammar = "b.pest"]
    pub struct ParserB;
}
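The reason this works is that each module gets its own Rule type, so callers refer to a::Rule and b::Rule explicitly. The sketch below shows the idea with hand-written stand-ins for the generated enums; the variants `ident` and `number` and the `describe_*` helpers are hypothetical, since the real variants depend on the contents of a.pest and b.pest.

```rust
// Stand-ins for what the derive would generate inside each module.
mod a {
    #[allow(non_camel_case_types)]
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Rule {
        ident, // hypothetical rule from a.pest
    }
}

mod b {
    #[allow(non_camel_case_types)]
    #[derive(Debug, Clone, Copy, PartialEq, Eq)]
    pub enum Rule {
        number, // hypothetical rule from b.pest
    }
}

// Both enums are called `Rule`, but the module path keeps them apart.
fn describe_a(rule: a::Rule) -> &'static str {
    match rule {
        a::Rule::ident => "a: ident",
    }
}

fn describe_b(rule: b::Rule) -> &'static str {
    match rule {
        b::Rule::number => "b: number",
    }
}

fn main() {
    println!("{}", describe_a(a::Rule::ident));
    println!("{}", describe_b(b::Rule::number));
}
```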
Other features
- Precedence climbing
- Input handling
- Custom errors
- Runs on stable Rust
Projects using pest
You can find more projects and ecosystem tools in the awesome-pest repo.
- pest_meta (bootstrapped)
- AshPaper
- brain
- cicada
- comrak
- elastic-rs
- graphql-parser
- handlebars-rust
- hexdino
- Huia
- insta
- jql
- json5-rs
- mt940
- Myoxine
- py_literal
- rouler
- RuSh
- rs_pbrt
- stache
- tera
- ui_gen
- ukhasnet-parser
- ZoKrates
- Vector
- AutoCorrect
- yaml-peg
- qubit
- caith (a dice roller crate)
- Melody
- json5-nodes
- prisma
Minimum Supported Rust Version (MSRV)
This library should always compile with default features on Rust 1.61.0.
no_std support
The pest and pest_derive crates can be built without the Rust standard library and target embedded environments. To do so, you need to disable their default features. In your Cargo.toml, you can specify it as follows:
[dependencies]
# ...
pest = { version = "2", default-features = false }
pest_derive = { version = "2", default-features = false }
If you want to build these crates in the pest repository's workspace, you can pass the --no-default-features flag to cargo and specify these crates using the --package (-p) flag. For example:
$ cargo build --target thumbv7em-none-eabihf --no-default-features -p pest
$ cargo bootstrap
$ cargo build --target thumbv7em-none-eabihf --no-default-features -p pest_derive
Special thanks
A special round of applause goes to Prof. Marius Minea for his guidance and to all pest contributors, some of whom are none other than my friends.