kkos/oniguruma

regular expression library

Quick Overview

Oniguruma is a regular expression library written in C. It supports a wide range of character encodings and provides features such as named capture groups and look-around assertions. It is embedded in many projects; Ruby 1.9 adopted it as its regular expression engine, and later Ruby versions use the Onigmo fork.

Pros

  • Supports multiple character encodings (UTF-8, UTF-16, UTF-32, etc.)
  • Offers advanced regular expression features (named captures, look-around assertions)
  • High performance and efficient memory usage
  • Well-maintained and actively developed

Cons

  • Steeper learning curve compared to simpler regex libraries
  • Documentation can be sparse or unclear in some areas
  • May be overkill for simple regex needs
  • C API can be challenging for developers not familiar with C programming

Code Examples

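  1. Basic pattern matching:
#include <stdio.h>
#include <string.h>
#include <oniguruma.h>

int main(void)
{
    const char *pattern = "world";
    const char *str = "Hello, world!";
    OnigRegex regex;
    OnigRegion *region;

    /* The pattern is passed as a [start, end) byte range taken from one buffer. */
    if (onig_new(&regex, (UChar*)pattern, (UChar*)(pattern + strlen(pattern)), ONIG_OPTION_DEFAULT, ONIG_ENCODING_UTF8, ONIG_SYNTAX_DEFAULT, NULL) != ONIG_NORMAL) {
        return 1;
    }
    region = onig_region_new();

    /* onig_search returns the match offset (>= 0), ONIG_MISMATCH, or another negative error code. */
    if (onig_search(regex, (UChar*)str, (UChar*)(str + strlen(str)), (UChar*)str, (UChar*)(str + strlen(str)), region, ONIG_OPTION_NONE) >= 0) {
        printf("Match found!\n");
    }

    onig_region_free(region, 1);
    onig_free(regex);
    return 0;
}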
  2. Using named capture groups:
#include <stdio.h>
#include <string.h>
#include <oniguruma.h>

int main(void)
{
    const char *pattern = "(?<name>\\w+ \\w+) \\((?<age>\\d+) years old\\)";
    const char *str = "John Doe (30 years old)";
    const UChar *name = (const UChar*)"name";
    const UChar *age = (const UChar*)"age";
    OnigRegex regex;
    OnigRegion *region;

    if (onig_new(&regex, (UChar*)pattern, (UChar*)(pattern + strlen(pattern)), ONIG_OPTION_DEFAULT, ONIG_ENCODING_UTF8, ONIG_SYNTAX_DEFAULT, NULL) != ONIG_NORMAL) {
        return 1;
    }
    region = onig_region_new();

    if (onig_search(regex, (UChar*)str, (UChar*)(str + strlen(str)), (UChar*)str, (UChar*)(str + strlen(str)), region, ONIG_OPTION_NONE) >= 0) {
        /* onig_name_to_backref_number maps a group name to its group number for this match. */
        int name_index = onig_name_to_backref_number(regex, name, name + 4, region);
        int age_index = onig_name_to_backref_number(regex, age, age + 3, region);

        /* region->beg/end hold byte offsets into str for each group. */
        printf("Name: %.*s\n", region->end[name_index] - region->beg[name_index], str + region->beg[name_index]);
        printf("Age: %.*s\n", region->end[age_index] - region->beg[age_index], str + region->beg[age_index]);
    }

    onig_region_free(region, 1);
    onig_free(regex);
    return 0;
}
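  3. Using look-around assertions:
#include <stdio.h>
#include <string.h>
#include <oniguruma.h>

int main(void)
{
    const char *pattern = "(?=.*[a-z])(?=.*\\d).{8,}";
    const char *str = "password123";
    OnigRegex regex;
    OnigRegion *region = onig_region_new();

    /* Each look-ahead checks one requirement without consuming any input. */
    onig_new(&regex, (UChar*)pattern, (UChar*)(pattern + strlen(pattern)), ONIG_OPTION_DEFAULT, ONIG_ENCODING_UTF8, ONIG_SYNTAX_DEFAULT, NULL);
    if (onig_search(regex, (UChar*)str, (UChar*)(str + strlen(str)), (UChar*)str, (UChar*)(str + strlen(str)), region, ONIG_OPTION_NONE) >= 0) {
        printf("Match found!\n");
    }
    onig_region_free(region, 1);
    onig_free(regex);
    return 0;
}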

Competitor Comparisons

RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.

Pros of RE2

  • Faster performance for large-scale text processing
  • Guaranteed linear time complexity, preventing catastrophic backtracking
  • Better memory efficiency, especially for large inputs

Cons of RE2

  • Limited support for advanced regex features (e.g., backreferences, lookaround assertions)
  • Less flexible syntax compared to PCRE-style engines
  • May require code changes when migrating from other regex libraries

Code Comparison

Oniguruma:

regex_t* reg;
OnigErrorInfo einfo;
int r = onig_new(&reg, pattern, pattern + strlen(pattern),
                 ONIG_OPTION_DEFAULT, ONIG_ENCODING_UTF8,
                 ONIG_SYNTAX_DEFAULT, &einfo);

RE2:

RE2 re(pattern);
if (!re.ok()) {
    // Handle error
}

Key Differences

  • Oniguruma supports features that RE2 deliberately omits, such as backreferences and look-around assertions, and it handles many encodings beyond UTF-8 and Latin-1
  • RE2 focuses on performance and safety, sacrificing some advanced regex functionality
  • Oniguruma is commonly used in scripting languages (e.g., Ruby), while RE2 is often employed in large-scale applications
  • RE2 provides a simpler API, making it easier to use for basic regex operations
  • Oniguruma's flexibility makes it suitable for a wider range of text processing tasks

An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.

Pros of regex

  • Written in Rust, offering memory safety and thread safety
  • Designed for high performance, with guaranteed linear-time matching on all inputs
  • Extensive documentation and examples for ease of use

Cons of regex

  • Limited to Rust ecosystem, not as widely portable as Oniguruma
  • Lacks look-around assertions and backreferences, which Oniguruma supports

Code Comparison

Oniguruma (C):

regex_t* reg;
OnigErrorInfo einfo;
int r = onig_new(&reg, pattern, pattern + strlen(pattern),
                 ONIG_OPTION_DEFAULT, ONIG_ENCODING_UTF8,
                 ONIG_SYNTAX_DEFAULT, &einfo);

regex (Rust):

use regex::Regex;

let re = Regex::new(r"pattern").unwrap();
let matches = re.is_match("test string");

Summary

Oniguruma is a C library for regular expressions with wide language support and extensive Unicode features. It's highly portable and used in many projects across different programming languages.

regex is a Rust-specific regular expression engine focused on safety and performance within the Rust ecosystem. It provides a more modern and safe API but is limited to Rust applications.

The choice between these libraries depends on the specific project requirements, target language, and desired features.

ripgrep recursively searches directories for a regex pattern while respecting your gitignore

Pros of ripgrep

  • Significantly faster performance for searching large codebases
  • User-friendly command-line interface with intuitive options
  • Built-in support for various file types and automatic encoding detection

Cons of ripgrep

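  • Default regex engine does not support look-around assertions or backreferences, which Oniguruma does (ripgrep needs its optional PCRE2 mode for these)
  • Designed as a command-line tool rather than as a standalone library for integration into other projects
  • Searches files only; it does not expose the per-pattern encoding and syntax options that Oniguruma offers as a library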

Code Comparison

Oniguruma (C):

regex_t* reg;
OnigErrorInfo einfo;
int r = onig_new(&reg, pattern, pattern + strlen(pattern),
                 ONIG_OPTION_DEFAULT, ONIG_ENCODING_UTF8,
                 ONIG_SYNTAX_DEFAULT, &einfo);

ripgrep (Rust):

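use grep_printer::StandardBuilder;
use grep_regex::RegexMatcher;
use grep_searcher::Searcher;

// Build a matcher, a printer that writes to stdout, and search one file.
let matcher = RegexMatcher::new(pattern)?;
let mut printer = StandardBuilder::new().build_no_color(std::io::stdout());
Searcher::new().search_path(&matcher, path, printer.sink(&matcher))?;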

While Oniguruma provides a powerful regular expression engine as a library, ripgrep is designed as a command-line tool for fast searching. Oniguruma offers more flexibility for integration into other projects, while ripgrep excels in performance and ease of use for developers searching through codebases.

The standard library of the D programming language

Pros of Phobos

  • Comprehensive standard library for D programming language
  • Extensive range of modules covering various programming needs
  • Active development and community support

Cons of Phobos

  • Larger codebase and potentially steeper learning curve
  • Specific to D language, limiting its use in other environments

Code Comparison

Phobos (D language):

import std.regex;
auto pattern = regex("\\d+");
auto text = "123 abc 456";
auto matches = matchAll(text, pattern);

Oniguruma (C language):

#include <string.h>
#include <oniguruma.h>
const char *pattern = "\\d+";
const char *text = "123 abc 456";
regex_t* reg;
OnigRegion* region;
/* Pattern and subject are passed as [start, end) byte ranges. */
onig_new(&reg, (UChar*)pattern, (UChar*)pattern + strlen(pattern), ONIG_OPTION_DEFAULT, ONIG_ENCODING_UTF8, ONIG_SYNTAX_DEFAULT, NULL);
region = onig_region_new();
onig_search(reg, (UChar*)text, (UChar*)text + strlen(text), (UChar*)text, (UChar*)text + strlen(text), region, ONIG_OPTION_NONE);

Summary

Phobos is a full-featured standard library for D, offering a wide range of functionalities beyond regular expressions. Oniguruma is a specialized regular expression library with multi-language support. Phobos provides a more high-level and integrated approach for D developers, while Oniguruma offers a lower-level, portable solution for regular expression handling across different programming languages.

High-performance regular expression matching library

Pros of Hyperscan

  • High-performance regex matching optimized for Intel architectures
  • Supports simultaneous matching of large pattern sets
  • Offers both streaming and block mode scanning

Cons of Hyperscan

  • Limited to x86 platforms, less portable than Oniguruma
  • More complex API and setup compared to Oniguruma
  • Lacks some advanced regex features found in Oniguruma

Code Comparison

Oniguruma:

regex_t* reg;
OnigRegion* region;
OnigErrorInfo einfo;
onig_new(&reg, pattern, pattern + strlen(pattern), ONIG_OPTION_DEFAULT, ONIG_ENCODING_UTF8, ONIG_SYNTAX_DEFAULT, &einfo);
region = onig_region_new();
onig_search(reg, str, str + strlen(str), str, str + strlen(str), region, ONIG_OPTION_NONE);

Hyperscan:

hs_database_t *database;
hs_compile_error_t *compile_err;
hs_compile(pattern, HS_FLAG_DOTALL, HS_MODE_BLOCK, NULL, &database, &compile_err);
hs_scratch_t *scratch = NULL;
hs_alloc_scratch(database, &scratch);
hs_scan(database, str, strlen(str), 0, scratch, event_handler, NULL);

Both libraries offer regex matching capabilities, but Hyperscan focuses on high-performance scanning for large pattern sets, while Oniguruma provides a more traditional regex engine with broader language support. Hyperscan's API is more complex, reflecting its specialized use cases, while Oniguruma offers a simpler interface for general-purpose regex operations.

README

Oniguruma

The only open source software attacked on Google search in Japan. (Issue #234)

https://github.com/kkos/oniguruma

Oniguruma is a modern and flexible regular expressions library. It encompasses features from different regular expression implementations that traditionally exist in different languages.

Character encoding can be specified per regular expression object.

Supported character encodings:

ASCII, UTF-8, UTF-16BE, UTF-16LE, UTF-32BE, UTF-32LE, EUC-JP, EUC-TW, EUC-KR, EUC-CN, Shift_JIS, Big5, GB18030, KOI8-R, CP1251, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-10, ISO-8859-11, ISO-8859-13, ISO-8859-14, ISO-8859-15, ISO-8859-16

  • GB18030: contributed by KUBO Takehiro
  • CP1251: contributed by Byte
  • doc/SYNTAX.md: contributed by seanofw

Notice (from 6.9.6)

When using the configure script: if you enabled the POSIX API in an earlier version (it is disabled by default in 6.9.5) and you need application binary compatibility with the POSIX API, specify "--enable-binary-compatible-posix-api=yes" instead of "--enable-posix-api=yes". Starting in 6.9.6, "--enable-posix-api=yes" provides only source-level compatibility with the POSIX API of 6.9.5 and earlier. (Issue #210)

Master branch

  • Update Unicode version 16.0
  • Add new operator (*SKIP)

Version 6.9.9

  • Update Unicode version 15.1.0
  • NEW API: ONIG_OPTION_MATCH_WHOLE_STRING
  • Fixed: (?I) option was not enabled for character classes (Issue #264).
  • Changed specification to check for incorrect POSIX bracket (Issue #253).
  • Changed [[:punct:]] in Unicode encodings to be compatible with POSIX definition. (Issue #268)
  • Fixed: ONIG_OPTION_FIND_LONGEST behavior

Version 6.9.8

  • Update Unicode version 14.0.0
  • Whole options
    • (?C) : ONIG_OPTION_DONT_CAPTURE_GROUP
    • (?I) : ONIG_OPTION_IGNORECASE_IS_ASCII
    • (?L) : ONIG_OPTION_FIND_LONGEST
  • Fixed some problems found by OSS-Fuzz

Version 6.9.7

  • NEW API: ONIG_OPTION_CALLBACK_EACH_MATCH
  • NEW API: ONIG_OPTION_IGNORECASE_IS_ASCII
  • NEW API: ONIG_SYNTAX_PYTHON
  • Fixed some problems found by OSS-Fuzz

Version 6.9.6

  • NEW: configure option --enable-binary-compatible-posix-api=[yes/no]
  • NEW API: Limiting the maximum number of calls of subexp-call
  • NEW API: ONIG_OPTION_NOT_BEGIN_STRING / NOT_END_STRING / NOT_BEGIN_POSITION
  • Fixed behavior of ONIG_OPTION_NOTBOL / NOTEOL
  • Fixed many problems found by OSS-Fuzz
  • Fixed many problems found by Coverity
  • Fixed CVE-2020-26159 (This turned out not to be a problem later. #221)
  • Under cygwin and mingw, generate and install the libonig.def file (Issue #220)

License

BSD license.

Install

Case 1: Linux distribution packages

  • Fedora: dnf install oniguruma-devel
  • RHEL/CentOS: yum install oniguruma
  • Debian/Ubuntu: apt install libonig5
  • Arch: pacman -S oniguruma
  • openSUSE: zypper install oniguruma

Case 2: Manual compilation on Linux, Unix, and Cygwin platform

  1. autoreconf -vfi (only needed if the configure script is not found)

  2. ./configure

  3. make

  4. make install

  • uninstall

    make uninstall

  • configuration check

    onig-config --cflags
    onig-config --libs
    onig-config --prefix
    onig-config --exec-prefix

Case 3: Windows 64/32bit platform (Visual Studio)

  • build library

    .\make_win.bat

    onig_s.lib: static link library
    onig.dll: dynamic link library

  • make test programs

    .\make_win.bat all-test

Alternatively, you can build and install oniguruma using vcpkg dependency manager:

  1. git clone https://github.com/Microsoft/vcpkg.git
  2. cd vcpkg
  3. ./bootstrap-vcpkg.bat
  4. ./vcpkg integrate install
  5. ./vcpkg install oniguruma

The oniguruma port in vcpkg is kept up to date by Microsoft team members and community contributors. If the version is out of date, please create an issue or pull request on the vcpkg repository.

Regular Expressions

See doc/RE (English) or doc/RE.ja (Japanese).

Usage

Include oniguruma.h in your program to use the Oniguruma API. See doc/API for the API reference.

If you want to disable UChar type (== unsigned char) definition in oniguruma.h, define ONIG_ESCAPE_UCHAR_COLLISION and then include oniguruma.h.

If you want to disable regex_t type definition in oniguruma.h, define ONIG_ESCAPE_REGEX_T_COLLISION and then include oniguruma.h.

Example of a compile/link command line on Unix or Cygwin (assuming prefix == /usr/local):

cc sample.c -L/usr/local/lib -lonig

To use the static link library (onig_s.lib) on Win32, add the option -DONIG_EXTERN=extern to the C compiler command line.

Sample Programs

File                    Description
sample/callout.c        example of callouts
sample/count.c          example of built-in callout *COUNT
sample/echo.c           example of user defined callouts of name
sample/encode.c         example of some encodings
sample/listcap.c        example of the capture history
sample/names.c          example of the named group callback
sample/posix.c          POSIX API sample
sample/regset.c         example of using RegSet API
sample/scan.c           example of using onig_scan()
sample/simple.c         example of the minimum (Oniguruma API)
sample/sql.c            example of the variable meta characters
sample/user_property.c  example of user defined Unicode property

Test Programs

File                    Description
sample/syntax.c         Perl, Java and ASIS syntax test.
sample/crnl.c           --enable-crnl-as-line-terminator test

Source Files

File                    Description
oniguruma.h             Oniguruma API header file (public)
onig-config.in          configuration check program template
regenc.h                character encodings framework header file
regint.h                internal definitions
regparse.h              internal definitions for regparse.c and regcomp.c
regcomp.c               compiling and optimization functions
regenc.c                character encodings framework
regerror.c              error message function
regext.c                extended API functions (deluxe version API)
regexec.c               search and match functions
regparse.c              parsing functions.
regsyntax.c             pattern syntax functions and built-in syntax definitions
regtrav.c               capture history tree data traverse functions
regversion.c            version info function
st.h                    hash table functions header file
st.c                    hash table functions
oniggnu.h               GNU regex API header file (public)
reggnu.c                GNU regex API functions
onigposix.h             POSIX API header file (public)
regposerr.c             POSIX error message function
regposix.c              POSIX API functions
mktable.c               character type table generator
ascii.c                 ASCII encoding
euc_jp.c                EUC-JP encoding
euc_tw.c                EUC-TW encoding
euc_kr.c                EUC-KR, EUC-CN encoding
sjis.c                  Shift_JIS encoding
big5.c                  Big5 encoding
gb18030.c               GB18030 encoding
koi8.c                  KOI8 encoding
koi8_r.c                KOI8-R encoding
cp1251.c                CP1251 encoding
iso8859_1.c             ISO-8859-1 (Latin-1)
iso8859_2.c             ISO-8859-2 (Latin-2)
iso8859_3.c             ISO-8859-3 (Latin-3)
iso8859_4.c             ISO-8859-4 (Latin-4)
iso8859_5.c             ISO-8859-5 (Cyrillic)
iso8859_6.c             ISO-8859-6 (Arabic)
iso8859_7.c             ISO-8859-7 (Greek)
iso8859_8.c             ISO-8859-8 (Hebrew)
iso8859_9.c             ISO-8859-9 (Latin-5 or Turkish)
iso8859_10.c            ISO-8859-10 (Latin-6 or Nordic)
iso8859_11.c            ISO-8859-11 (Thai)
iso8859_13.c            ISO-8859-13 (Latin-7 or Baltic Rim)
iso8859_14.c            ISO-8859-14 (Latin-8 or Celtic)
iso8859_15.c            ISO-8859-15 (Latin-9 or West European with Euro)
iso8859_16.c            ISO-8859-16 (Latin-10)
utf8.c                  UTF-8 encoding
utf16_be.c              UTF-16BE encoding
utf16_le.c              UTF-16LE encoding
utf32_be.c              UTF-32BE encoding
utf32_le.c              UTF-32LE encoding
unicode.c               common codes of Unicode encoding
unicode_fold_data.c     Unicode folding data
windows/testc.c         Test program for Windows (VC++)