parser

A Ruby parser.

Quick Overview

The whitequark/parser repository is a Ruby parser and AST (Abstract Syntax Tree) generator. It provides a robust toolset for parsing Ruby code, generating and manipulating ASTs, and performing source rewriting tasks. This library is particularly useful for developers working on static analysis tools, code transformation utilities, or Ruby-based development environments.

Pros

High-quality, well-maintained parser with support for multiple Ruby versions
Comprehensive AST representation with detailed node information
Includes source rewriting capabilities for code transformation tasks
Extensive documentation and examples for easy integration

Cons

Learning curve for understanding AST structure and manipulation
Performance may be slower compared to native C implementations
Supports Ruby syntax only up to 3.3; for Ruby 3.4 and later the README recommends Prism::Translation::Parser instead
Pulls in a runtime dependency (the ast gem), which some projects may prefer to avoid

Code Examples

Parsing Ruby code and inspecting the AST:

require 'parser/current'

ast = Parser::CurrentRuby.parse("puts 'Hello, world!'")
# Node#to_s renders the tree as an S-expression
puts ast
# (send nil :puts
#   (str "Hello, world!"))

Traversing the AST using a custom processor:

require 'parser/current'

class MyProcessor < Parser::AST::Processor
  def on_send(node)
    puts "Found method call: #{node.children[1]}"
    super
  end
end

ast = Parser::CurrentRuby.parse("puts 'Hello'; print 'World'")
processor = MyProcessor.new
processor.process(ast)
# Output:
# Found method call: puts
# Found method call: print

Source rewriting example:

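require 'parser/current'
require 'unparser'

# A single-quoted heredoc keeps #{name} literal instead of interpolating it here
code = <<~'RUBY'
  def greet(name)
    puts "Hello, #{name}!"
  end
RUBY
ast = Parser::CurrentRuby.parse(code)

# A (def) node has three children: method name, arguments, body
name, args, body = ast.children

# Build a new `puts "Goodbye!"` call and append it to the original body
goodbye = Parser::AST::Node.new(:send, [nil, :puts, Parser::AST::Node.new(:str, ["Goodbye!"])])
new_body = Parser::AST::Node.new(:begin, [body, goodbye])
new_ast = ast.updated(nil, [name, args, new_body])

puts Unparser.unparse(new_ast)
# Prints the regenerated greet method with both puts calls in its body.
# Exact formatting (for example, parenthesized arguments) depends on the
# unparser version.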

Getting Started

To use the whitequark/parser in your Ruby project:

Add the gem to your Gemfile:
```
gem 'parser'
```
Install the gem:
```
bundle install
```
Require the parser in your Ruby file:
```
require 'parser/current'
```

Parse Ruby code:

ast = Parser::CurrentRuby.parse("your_ruby_code_here")

Now you can work with the generated AST for various parsing and analysis tasks.

Competitor Comparisons

ruby

22,562

The Ruby Programming Language

Pros of ruby

Official Ruby implementation, providing the most authentic and complete Ruby experience
Extensive documentation and community support
Includes the full Ruby standard library and core functionality

Cons of ruby

Larger codebase, potentially more complex for contributors
Slower development cycle due to rigorous testing and compatibility requirements
May be overkill for projects only needing parsing functionality

Code Comparison

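ruby (illustrative pseudocode only; MRI's actual parser is generated from the parse.y grammar and is not written in Ruby):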

def parse_program
  result = parse_compstmt
  if !@lexer.eof?
    raise CompileError, "Parse error: unexpected #{@lexer.token}"
  end
  result
end

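parser:

require 'parser/current'

def parse_program(source_buffer)
  # The default builder produces the AST nodes documented by the gem
  builder = Parser::Builders::Default.new
  parser = Parser::CurrentRuby.new(builder)
  parser.parse(source_buffer)
end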

Summary

While ruby provides the complete Ruby implementation, parser focuses specifically on parsing Ruby code. ruby offers a more comprehensive solution but may be excessive for projects only requiring parsing capabilities. parser, on the other hand, provides a lightweight alternative for parsing tasks but lacks the full Ruby runtime environment. The choice between the two depends on the specific needs of the project and the desired level of integration with the Ruby ecosystem.

rbs

2,035

Type Signature for Ruby

Pros of RBS

Official Ruby project, ensuring long-term support and alignment with Ruby's development
Focused on type definitions and signatures, providing a standardized way to describe Ruby types
Designed for static type checking and analysis tools, enhancing code quality and tooling support

Cons of RBS

Limited to type definitions, not a full-featured parser like Parser
Newer project with potentially less community adoption and tooling integration
May require additional setup and learning curve for teams already using Parser

Code Comparison

RBS example:

class User
  attr_reader name: String
  attr_reader age: Integer

  def initialize: (name: String, age: Integer) -> void
end

Parser example:

require 'parser/current'

ast = Parser::CurrentRuby.parse("class User; end")
p ast.type            # :class
p ast.children.first  # s(:const, nil, :User)

Summary

RBS focuses on type definitions and signatures for Ruby, while Parser provides a full Ruby parser with AST generation. RBS is part of the official Ruby project, offering standardized type descriptions for static analysis. Parser, being more established, offers broader parsing capabilities and is widely used in various Ruby tools and libraries. The choice between them depends on specific project needs, with RBS being more suitable for type-related tasks and Parser for general Ruby code analysis and manipulation.

README

Parser

Parser is a production-ready Ruby parser written in pure Ruby. It recognizes as much or more code than Ripper, Melbourne, JRubyParser or ruby_parser, and is vastly more convenient to use.

You can also use unparser to produce equivalent source code from Parser's ASTs.

Sponsored by Evil Martians. MacRuby and RubyMotion support sponsored by CodeClimate.

Warning: The parser gem is only compatible with the syntax of Ruby 3.3 and lower. For Ruby 3.4 and later, please use Prism::Translation::Parser instead. Starting in Ruby 3.4, Prism is the parser used in Ruby itself and can produce an AST that is identical to the output of the parser gem. If you only need to parse Ruby 3.3 (or greater) and don't require compatibility with the parser gem AST, also consider using the native Prism AST. See this GitHub issue for more details. For a guide on how to use parser for older versions and prism for newer ones, please see this guide.

Installation

$ gem install parser

Usage

Load Parser (see the backwards compatibility section below for explanation of emit_* calls):

require 'parser/current'
# opt-in to most recent AST format:
Parser::Builders::Default.emit_lambda              = true
Parser::Builders::Default.emit_procarg0            = true
Parser::Builders::Default.emit_encoding            = true
Parser::Builders::Default.emit_index               = true
Parser::Builders::Default.emit_arg_inside_procarg0 = true
Parser::Builders::Default.emit_forward_arg         = true
Parser::Builders::Default.emit_kwargs              = true
Parser::Builders::Default.emit_match_pattern       = true

Parse a chunk of code:

p Parser::CurrentRuby.parse("2 + 2")
# (send
#   (int 2) :+
#   (int 2))

Access the AST's source map:

p Parser::CurrentRuby.parse("2 + 2").loc
# #<Parser::Source::Map::Send:0x007fe5a1ac2388
#   @dot=nil,
#   @begin=nil,
#   @end=nil,
#   @selector=#<Source::Range (string) 2...3>,
#   @expression=#<Source::Range (string) 0...5>>

p Parser::CurrentRuby.parse("2 + 2").loc.selector.source
# "+"

Traverse the AST: see the documentation for gem ast.

Parse a chunk of code and display all diagnostics:

parser = Parser::CurrentRuby.new
parser.diagnostics.consumer = lambda do |diag|
  puts diag.render
end

buffer = Parser::Source::Buffer.new('(string)', source: "foo *bar")

p parser.parse(buffer)
# (string):1:5: warning: `*' interpreted as argument prefix
# foo *bar
#     ^
# (send nil :foo
#   (splat
#     (send nil :bar)))

If you reuse the same parser object for multiple #parse runs, you need to #reset it.

You can also use the ruby-parse utility (it's bundled with the gem) to play with Parser:

$ ruby-parse -L -e "2+2"
(send
  (int 2) :+
  (int 2))
2+2
 ~ selector
~~~ expression
(int 2)
2+2
~ expression
(int 2)
2+2

$ ruby-parse -E -e "2+2"
2+2
^ tINTEGER 2                                    expr_end     [0 <= cond] [0 <= cmdarg]
2+2
 ^ tPLUS "+"                                    expr_beg     [0 <= cond] [0 <= cmdarg]
2+2
  ^ tINTEGER 2                                  expr_end     [0 <= cond] [0 <= cmdarg]
2+2
  ^ false "$eof"                                expr_end     [0 <= cond] [0 <= cmdarg]
(send
  (int 2) :+
  (int 2))

Features

Precise source location reporting.
Documented AST format which is convenient to work with.
A simple interface and a powerful, tweakable one.
Parses 1.8, 1.9, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7, 3.0, 3.1, 3.2, and 3.3 syntax with backwards-compatible AST formats.
Parses MacRuby and RubyMotion syntax extensions.
Rewriting support.
Parsing error recovery.
Improved clang-like diagnostic messages with location information.
Written in pure Ruby, runs on MRI >=2.0.0, JRuby and Rubinius (and historically, all versions of Ruby since 1.8)
Only one runtime dependency: the ast gem.
Insane Ruby lexer rewritten from scratch in Ragel.
100% test coverage for Bison grammars (except error recovery).
Readable, commented source code.

Documentation

Documentation for Parser is available online.

Node names

Several Parser nodes seem to be confusing enough to warrant a dedicated README section.

(block)

The (block) node passes a Ruby block, that is, a closure, to a method call represented by its first child, a (send), (super) or (zsuper) node. To demonstrate:

$ ruby-parse -e 'foo { |x| x + 2 }'
(block
  (send nil :foo)
  (args
    (arg :x))
  (send
    (lvar :x) :+
    (int 2)))

(begin) and (kwbegin)

TL;DR: Unless you perform rewriting, treat (begin) and (kwbegin) as the same node type.

Both (begin) and (kwbegin) nodes represent compound statements, that is, several expressions which are executed sequentially, with the value of the last one being the value of the entire compound statement. They may take several forms in the source code:

foo; bar: without delimiters
(foo; bar): parenthesized
begin foo; bar; end: grouped with begin keyword
def x; foo; bar; end: grouped inside a method definition

and so on.

$ ruby-parse -e '(foo; bar)'
(begin
  (send nil :foo)
  (send nil :bar))
$ ruby-parse -e 'def x; foo; bar end'
(def :x
  (args)
  (begin
    (send nil :foo)
    (send nil :bar)))

Note that, despite its name, the kwbegin node has only a tangential relation to the begin keyword. Normally, Parser AST is semantic, that is, if two constructs look different but behave identically, they get parsed to the same node. However, there exists a peculiar construct called post-loop in Ruby:

begin
  body
end while condition

This specific syntactic construct, that is, keyword begin..end block followed by a postfix while, behaves very unlike other similar constructs, e.g. (body) while condition. While the body itself is wrapped into a while-post node, Parser also supports rewriting, and in that context it is important to not accidentally convert one kind of loop into another.

$ ruby-parse -e 'begin foo end while cond'
(while-post
  (send nil :cond)
  (kwbegin
    (send nil :foo)))
$ ruby-parse -e 'foo while cond'
(while
  (send nil :cond)
  (send nil :foo))
$ ruby-parse -e '(foo) while cond'
(while
  (send nil :cond)
  (begin
    (send nil :foo)))

(Parser also needs the (kwbegin) node type internally, and it is highly problematic to map it back to (begin).)

Backwards compatibility

Parser does not use semantic versioning. Parser versions are structured as x.y.z.t, where x.y.z indicates the most recent supported Ruby release (support for every Ruby release that is chronologically earlier is implied), and t is a monotonically increasing number.
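
To make the scheme concrete, the following sketch splits a version string into its two parts. The helper name and the sample version string are illustrative assumptions, not part of the gem.

```ruby
# Hypothetical helper: splits an x.y.z.t version string into the most recent
# supported Ruby release (x.y.z) and the gem's own increasing counter (t).
def split_parser_version(version)
  *ruby_release, counter = version.split('.').map { |part| Integer(part) }
  { ruby_release: ruby_release.join('.'), counter: counter }
end

info = split_parser_version('3.3.0.5') # example version string
puts "Most recent supported Ruby release: #{info[:ruby_release]}"
puts "Gem release counter: #{info[:counter]}"
```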

The public API of Parser as well as the AST format (as listed in the documentation) are considered stable forever, although support for old Ruby versions may be removed at some point.

Sometimes it is necessary to modify the format of AST nodes that are already being emitted in a way that would break existing applications. To avoid such breakage, applications must opt-in to these modifications; without explicit opt-in, Parser will continue to emit the old AST node format. The most recent set of opt-ins is specified in the usage section of this README.

Compatibility with Ruby MRI

Unfortunately, Ruby MRI often changes syntax in patchlevel versions. This has happened, at least, for every release since 1.9; for example, commits c5013452 and 04bb9d6b were backported all the way from HEAD to 1.9. Moreover, there is no simple way to track these changes.

This policy makes it all but impossible to make Parser precisely compatible with the Ruby MRI parser. Indeed, as of September 2014, it would be necessary to maintain and update ten different parsers together with their lexer quirks in order to be able to emulate any given released Ruby MRI version.

As a result, Parser chooses a different path: the parser/rubyXY parsers recognize the syntax of the latest minor version of Ruby MRI X.Y at the time of the gem release.

Compatibility with MacRuby and RubyMotion

Parser implements the MacRuby 0.12 and RubyMotion mid-2015 parsers precisely. However, the lexers of these have been forked off Ruby MRI and independently maintained for some time, and because of that, Parser may accept some code that these upstream implementations are unable to parse.

Known issues

Adding support for the following Ruby MRI features in Parser would needlessly complicate it, and as they all are very specific and rarely occurring corner cases, this is not done.

Parser has been extensively tested; in particular, it parses almost the entire RubyGems corpus. For every issue, a breakdown of affected gems is offered.

Void value expressions

Ruby MRI prohibits so-called "void value expressions". For a description of what a void value expression is, see this gist and this Parser issue.

It is unknown whether any gems are affected by this issue.

Syntax check of block exits

Similar to the "void value expression" checks, Ruby MRI also checks for correct usage of break, next and redo: starting from 3.3.0, using one of them outside of a {break,next,redo}-able context is a syntax error. The parser gem simply doesn't run this type of check.

It is unknown whether any gems are affected by this issue.

Invalid characters inside comments and literals

Ruby MRI permits arbitrary non-7-bit byte sequences to appear in comments, as well as in string or symbol literals in form of escape sequences, regardless of source encoding. Parser requires all source code, including the expanded escape sequences, to consist of valid byte sequences in the source encoding that are convertible to UTF-8.

As of 2013-07-25, there are about 180 affected gems.

\u escape in 1.8 mode

Ruby MRI 1.8 permits a bare \u escape sequence in a string and treats it like u. Ruby MRI 1.9 and later treat \u as a prefix for a Unicode escape sequence and do not allow it to appear bare. Parser follows 1.9+ behavior.

As of 2013-07-25, affected gems are: activerdf, activerdf_net7, fastreader, gkellog-reddy.

Dollar-dash

(This one is so obscure I couldn't even think of a saner name for this issue.) Pre-2.1 Ruby allows a global variable named $-. Ruby 2.1 and later treat it as a syntax error. Parser follows 2.1 behavior.

No known code is affected by this issue.

EOF characters after embedded documents before 2.7

Code like "=begin\n""=end\0" is invalid for all versions of Ruby before 2.7. Ruby 2.7 and later parse it normally. Parser follows 2.7 behavior.

It is unknown whether any gems are affected by this issue.

Contributors

Catherine whitequark
Markus Schirp (mbj)
Yorick Peterse (yorickpeterse)
Magnus Holm (judofyr)
Bozhidar Batsov (bbatsov)

Acknowledgements

The lexer testsuite is derived from ruby_parser.

The Bison parser rules are derived from Ruby MRI parse.y.

Contributing

Make sure you have Ragel ~> 6.7 installed
Fork it
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
Create new Pull Request
