intel/hyperscan

High-performance regular expression matching library

Top Related Projects

  • RE2: a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.
  • regex: an implementation of regular expressions for Rust. It uses finite automata and guarantees linear time matching on all inputs.
  • ripgrep: recursively searches directories for a regex pattern while respecting your gitignore.
  • The Silver Searcher: a code-searching tool similar to ack, but faster.
  • aho-corasick: a fast implementation of Aho-Corasick in Rust.
  • CodeSearch: fast, indexed regexp search over large file trees.

Quick Overview

Hyperscan is a high-performance multiple regex matching library developed by Intel. It is designed for applications that need to scan large amounts of data quickly, such as network security tools, data loss prevention systems, and content filtering software. Hyperscan supports simultaneous matching of up to tens of thousands of regular expressions.

Pros

  • Extremely fast pattern matching, optimized for modern CPU architectures
  • Supports both streaming and block mode scanning
  • Provides both literal and regular expression pattern matching
  • Cross-platform compatibility (Linux, FreeBSD, macOS, and Windows)

Cons

  • No support for some PCRE constructs (e.g., backreferences, lookaround assertions)
  • Requires significant memory for pattern compilation, especially with large rule sets
  • Learning curve for optimal usage and integration
  • Not suited to tasks that need capture groups or substitution (e.g., general text processing), since matches are reported as offsets only

Code Examples

  1. Basic pattern matching:
#include <stdio.h>
#include <string.h>
#include <hs.h>

const char *pattern = "foo.*bar";
hs_database_t *database;
hs_compile_error_t *compile_err;

if (hs_compile(pattern, HS_FLAG_DOTALL, HS_MODE_BLOCK, NULL, &database, &compile_err) != HS_SUCCESS) {
    fprintf(stderr, "Failed to compile pattern: %s\n", compile_err->message);
    hs_free_compile_error(compile_err);
    return 1;
}
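  2. Scanning data:
/* Called once per match. Without HS_FLAG_SOM_LEFTMOST, `from` is always 0,
   so report the end offset `to`. Return non-zero to stop scanning. */
static int eventHandler(unsigned int id, unsigned long long from,
                        unsigned long long to, unsigned int flags, void *ctx) {
    printf("Match for pattern %u ending at offset %llu\n", id, to);
    return 0;
}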

const char *data = "This is a test string with fooxxxbar in it";
hs_scratch_t *scratch = NULL;

if (hs_alloc_scratch(database, &scratch) != HS_SUCCESS) {
    fprintf(stderr, "Failed to allocate scratch space\n");
    hs_free_database(database);
    return 1;
}

if (hs_scan(database, data, strlen(data), 0, scratch, eventHandler, NULL) != HS_SUCCESS) {
    fprintf(stderr, "Error scanning data\n");
}
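  3. Streaming mode:
/* Streaming needs a database compiled with HS_MODE_STREAM (the database in
   example 1 uses HS_MODE_BLOCK) and scratch allocated for that database. */
hs_stream_t *stream = NULL;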

if (hs_open_stream(database, 0, &stream) != HS_SUCCESS) {
    fprintf(stderr, "Failed to open stream\n");
    return 1;
}

const char *chunk1 = "This is the first";
const char *chunk2 = " chunk of data";

hs_scan_stream(stream, chunk1, strlen(chunk1), 0, scratch, eventHandler, NULL);
hs_scan_stream(stream, chunk2, strlen(chunk2), 0, scratch, eventHandler, NULL);
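hs_close_stream(stream, scratch, eventHandler, NULL);

/* Release resources once scanning is finished. */
hs_free_scratch(scratch);
hs_free_database(database);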
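
The callback contract used by eventHandler can be illustrated without the library. The following is a plain C sketch, not Hyperscan code: scan_literals, log_match, and struct match_log are hypothetical names, and the naive literal search only stands in for the real engine. It mirrors two behaviours of the examples above: each match is reported with a pattern id and an end offset, and a non-zero return from the callback stops the scan.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as eventHandler: return 0 to continue, non-zero to stop. */
typedef int (*on_match_fn)(unsigned int id, unsigned long long from,
                           unsigned long long to, unsigned int flags,
                           void *ctx);

/* Hypothetical stand-in for a multi-pattern block scan. It checks every
   literal at every position (naive, for illustration only) and reports
   matches in order of end offset. A NULL ids array means id 0 for all.
   Returns 0 if the whole buffer was scanned, 1 if the callback stopped it. */
int scan_literals(const char *const *patterns, const unsigned int *ids,
                  unsigned int count, const char *data, size_t len,
                  on_match_fn on_match, void *ctx) {
    assert(patterns != NULL && data != NULL && on_match != NULL);
    for (size_t end = 1; end <= len; end++) {
        for (unsigned int i = 0; i < count; i++) {
            size_t plen = strlen(patterns[i]);
            if (plen == 0 || plen > end) {
                continue;
            }
            if (memcmp(data + end - plen, patterns[i], plen) == 0) {
                unsigned int id = ids ? ids[i] : 0;
                /* `from` is passed as 0: start offsets are not tracked. */
                if (on_match(id, 0, end, 0, ctx) != 0) {
                    return 1;
                }
            }
        }
    }
    return 0;
}

/* Example context: records up to 8 matches and stops after `limit`. */
struct match_log {
    unsigned int ids[8];
    unsigned long long ends[8];
    unsigned int count;
    unsigned int limit;
};

int log_match(unsigned int id, unsigned long long from,
              unsigned long long to, unsigned int flags, void *ctx) {
    struct match_log *rec = ctx;
    (void)from;
    (void)flags;
    if (rec->count < 8) {
        rec->ids[rec->count] = id;
        rec->ends[rec->count] = to;
    }
    rec->count++;
    return rec->count >= rec->limit; /* non-zero ends the scan early */
}
```

With the patterns apple, maple, and Snapple registered under ids 10, 20, and 30, scanning "Snapple is a type of apple juice" reports (10, 7), (30, 7), and (10, 26): two patterns end at offset 7, and the callback is invoked for both.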

Getting Started

  1. Install Hyperscan (building from source requires CMake, Ragel, and the Boost headers):

    git clone https://github.com/intel/hyperscan.git
    cd hyperscan
    mkdir build && cd build
    cmake -DCMAKE_INSTALL_PREFIX=/usr/local ..
    make
    sudo make install
    
  2. Include Hyperscan in your project:

    #include <hs.h>
    
  3. Compile with Hyperscan:

    gcc -o your_program your_program.c $(pkg-config --cflags --libs libhs)
    
  4. Run your program:

    ./your_program
    

Competitor Comparisons

RE2

RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.

Pros of RE2

  • Guaranteed linear time and space complexity, making it suitable for processing untrusted input
  • Cross-platform support (Windows, macOS, Linux) and easy integration with C++ projects
  • Extensive documentation and well-maintained codebase

Cons of RE2

  • Limited support for advanced regex features like backreferences and lookaround assertions
  • May be slower than Hyperscan for certain types of patterns or large-scale matching tasks
  • Primarily designed for C++ usage, with limited official bindings for other languages

Code Comparison

RE2:

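// Requires <iostream>, <string>, and <re2/re2.h>.
RE2 pattern("(\\w+)");  // FindAndConsume fills `word` from the capture group
re2::StringPiece input("Hello, world!");
std::string word;
while (RE2::FindAndConsume(&input, pattern, &word)) {
    std::cout << word << std::endl;
}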

Hyperscan:

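hs_database_t *database;
hs_compile_error_t *compile_err;
/* Argument order is (expression, flags, mode, ...). */
hs_compile("\\w+", HS_FLAG_DOTALL, HS_MODE_BLOCK, NULL, &database, &compile_err);
hs_scratch_t *scratch = NULL;  /* must be NULL before the first hs_alloc_scratch call */
hs_alloc_scratch(database, &scratch);
hs_scan(database, "Hello, world!", 13, 0, scratch, event_handler, NULL);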

Both libraries offer efficient regex matching capabilities, but they cater to different use cases. RE2 focuses on safety and predictability, while Hyperscan emphasizes high-performance pattern matching for large-scale applications.

regex

An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.

Pros of regex

  • Written in Rust, offering memory safety and thread safety
  • Supports Unicode out of the box
  • Simpler API, easier to use for basic regex operations

Cons of regex

  • Generally slower performance compared to Hyperscan
  • Limited support for advanced pattern matching features
  • Not optimized for high-speed scanning of large data sets

Code Comparison

regex:

use regex::Regex;

let re = Regex::new(r"^\d{4}-\d{2}-\d{2}$").unwrap();
assert!(re.is_match("2014-01-01"));

Hyperscan:

#include <hs.h>

hs_database_t *database;
hs_compile_error_t *compile_err;
if (hs_compile("^\\d{4}-\\d{2}-\\d{2}$", 0, HS_MODE_BLOCK, NULL, &database, &compile_err) != HS_SUCCESS) {
    // Handle error
}

The regex example shows a simpler API for basic pattern matching, while the Hyperscan example needs more setup (a compiled database, scratch space, and a match callback) in exchange for potentially higher throughput in large-scale scanning. Hyperscan's multi-pattern and streaming modes are particularly useful in network security and content inspection applications, whereas regex offers a more user-friendly interface for common tasks and benefits from Rust's safety features.
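
The linear-time guarantee of automata-based engines can be sketched in C with a hand-built DFA for the date pattern used above. This is an illustration only, not how regex or Hyperscan compile patterns: is_iso_date is a hypothetical helper, it accepts ASCII digits only, and it requires the whole input to match. Each input byte causes exactly one state transition, so the running time is proportional to the input length regardless of content.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-built DFA for ^\d{4}-\d{2}-\d{2}$ (ASCII digits only).
   The state is the number of characters accepted so far; state 10 accepts.
   Returns 1 if the whole input matches, 0 otherwise. */
int is_iso_date(const char *s, size_t len) {
    assert(s != NULL);
    size_t state = 0;
    for (size_t i = 0; i < len; i++) {
        if (state == 10) {
            return 0; /* input continues past a complete date */
        }
        int expect_dash = (state == 4 || state == 7);
        int is_digit = (s[i] >= '0' && s[i] <= '9');
        int ok = expect_dash ? (s[i] == '-') : is_digit;
        if (!ok) {
            return 0; /* no valid transition: reject immediately */
        }
        state++;      /* exactly one transition per input byte */
    }
    return state == 10;
}
```

Because the loop never revisits a byte, there is no input that makes it backtrack, which is the property the automata-based libraries on this page rely on.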

ripgrep

ripgrep recursively searches directories for a regex pattern while respecting your gitignore

Pros of ripgrep

  • Faster and more efficient for general-purpose text searching
  • User-friendly command-line interface with intuitive options
  • Cross-platform support (Windows, macOS, Linux)

Cons of ripgrep

  • A command-line tool, not an embeddable library, so it offers no API for streaming or callback-driven multi-pattern matching
  • Not optimized for high-speed pattern matching in network traffic

Code comparison

ripgrep:

use regex::Regex;

fn main() {
    let re = Regex::new(r"pattern").unwrap();
    // Search logic here
}

Hyperscan:

#include <hs.h>

int main() {
    hs_database_t *database;
    hs_compile_error_t *compile_err;
    // Compile and scan logic here
}

Key differences

  • ripgrep is a command-line search tool, while Hyperscan is a regex matching library
  • Hyperscan focuses on high-performance pattern matching for network security applications
  • ripgrep is written in Rust, while Hyperscan is implemented in C and C++ and exposes a C API
  • Hyperscan supports advanced features like stream matching and multi-pattern scanning
  • ripgrep is more suitable for searching files and directories, while Hyperscan excels in real-time network traffic analysis

Use cases

ripgrep:

  • Code searching and refactoring
  • Log file analysis
  • General-purpose text searching in files and directories

Hyperscan:

  • Network intrusion detection systems
  • Deep packet inspection
  • Content filtering and malware detection

The Silver Searcher

A code-searching tool similar to ack, but faster.

Pros of The Silver Searcher

  • Simpler to use and install, with a focus on command-line searching
  • Faster for typical file searching tasks on a local filesystem
  • Supports ignore files and directory exclusions out of the box

Cons of The Silver Searcher

  • Limited to simple pattern matching, lacking advanced regex features
  • Not designed for high-performance streaming or network applications
  • Less suitable for integration into other software as a library

Code Comparison

The Silver Searcher:

void search_dir(ignores *ig, const char *base_path, const char *path, const int depth) {
    DIR *dir;
    struct dirent *dir_entry;
    char *dir_full_path = NULL;
    size_t path_length = 0;
    // ... (implementation details)
}

Hyperscan:

hs_error_t hs_compile(const char *expression,
                      unsigned int flags,
                      unsigned int mode,
                      const hs_platform_info_t *platform,
                      hs_database_t **db,
                      hs_compile_error_t **error) {
    // ... (implementation details)
}

The Silver Searcher is focused on file system traversal and simple pattern matching, while Hyperscan provides a more complex API for high-performance regular expression matching, suitable for integration into larger systems.

aho-corasick

A fast implementation of Aho-Corasick in Rust.

Pros of aho-corasick

  • Pure Rust implementation, making it easy to integrate into Rust projects
  • Simpler API and easier to use for basic pattern matching tasks
  • Lightweight and has fewer dependencies

Cons of aho-corasick

  • Matches literal strings only, whereas Hyperscan also handles regular expressions
  • May be slower than Hyperscan's SIMD-optimized engine on large pattern sets
  • Lacks advanced features like stream matching and vectorized processing

Code Comparison

aho-corasick:

use aho_corasick::AhoCorasick;

let patterns = &["apple", "maple", "Snapple"];
let ac = AhoCorasick::new(patterns).unwrap(); // aho-corasick 1.x returns a Result; drop unwrap() on 0.7
let matches: Vec<_> = ac.find_iter("Snapple is a type of apple juice").collect();

Hyperscan:

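#include <hs.h>

const char *patterns[] = {"apple", "maple", "Snapple"};
hs_database_t *database;
hs_compile_error_t *compile_err;
/* NULL flags and ids mean flags 0 and id 0 for every pattern; the error
   out-parameter must not be NULL. */
hs_compile_multi(patterns, NULL, NULL, 3, HS_MODE_BLOCK, NULL, &database, &compile_err);
hs_scratch_t *scratch = NULL;  /* must be NULL before the first hs_alloc_scratch call */
hs_alloc_scratch(database, &scratch);
/* eventHandler is a user-defined callback; match_event_handler is the name of
   the library's callback typedef and cannot be reused as a function name. */
hs_scan(database, "Snapple is a type of apple juice", 32, 0, scratch, eventHandler, NULL);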

Both libraries provide efficient pattern matching capabilities, but Hyperscan offers more advanced features and generally better performance, while aho-corasick is simpler to use and integrate into Rust projects.

CodeSearch

Fast, indexed regexp search over large file trees

Pros of CodeSearch

  • Simpler and more lightweight implementation
  • Designed specifically for code search, optimized for this use case
  • Easier to integrate into existing projects due to its focused functionality

Cons of CodeSearch

  • Less versatile compared to Hyperscan's general-purpose regex matching
  • Limited to indexing and searching code repositories
  • May not perform as well for large-scale, high-performance applications

Code Comparison

CodeSearch (Go):

func (ix *Index) PostingQuery(q query.Q) (*PostingQuery, error) {
    return ix.postingQuery(q, nil, false)
}

Hyperscan (C):

hs_error_t hs_compile(const char *expression,
                      unsigned int flags,
                      unsigned int mode,
                      const hs_platform_info_t *platform,
                      hs_database_t **db,
                      hs_compile_error_t **error)

CodeSearch is implemented in Go and focuses on indexing and searching code, while Hyperscan is written in C and C++ (with a C API) and provides a more general-purpose regex matching engine. The code snippets highlight the difference in complexity and scope between the two projects.

README

Hyperscan

Hyperscan is a high-performance multiple regex matching library. It follows the regular expression syntax of the commonly-used libpcre library, but is a standalone library with its own C API.

Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions and for the matching of regular expressions across streams of data.
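
Matching across streams of data means the matcher must carry state from one chunk to the next. The sketch below shows the idea in plain C for a single literal pattern, using a Knuth-Morris-Pratt automaton. It is not Hyperscan's algorithm, and stream_matcher, sm_open, and sm_scan are hypothetical names. The struct plays the role of per-stream state, and offsets count bytes from the start of the stream rather than from the current chunk.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SM_MAX_PATTERN 64

/* Per-stream state for one literal pattern. Everything needed to resume
   matching in the next chunk lives here. */
typedef struct {
    char pattern[SM_MAX_PATTERN];
    size_t fail[SM_MAX_PATTERN]; /* KMP failure table */
    size_t len;                  /* pattern length */
    size_t state;                /* pattern bytes matched so far */
    unsigned long long offset;   /* bytes consumed over the whole stream */
} stream_matcher;

/* Prepare a matcher for `pattern` and reset the stream position. */
void sm_open(stream_matcher *sm, const char *pattern) {
    size_t len = strlen(pattern);
    assert(len > 0 && len <= SM_MAX_PATTERN);
    memcpy(sm->pattern, pattern, len);
    sm->len = len;
    sm->state = 0;
    sm->offset = 0;
    sm->fail[0] = 0;
    for (size_t i = 1, k = 0; i < len; i++) {
        while (k > 0 && pattern[i] != pattern[k]) {
            k = sm->fail[k - 1];
        }
        if (pattern[i] == pattern[k]) {
            k++;
        }
        sm->fail[i] = k;
    }
}

/* Feed one chunk. Stores up to max_ends stream-wide end offsets in ends[]
   and returns the number of matches that ended inside this chunk. */
size_t sm_scan(stream_matcher *sm, const char *chunk, size_t n,
               unsigned long long *ends, size_t max_ends) {
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        while (sm->state > 0 && chunk[i] != sm->pattern[sm->state]) {
            sm->state = sm->fail[sm->state - 1];
        }
        if (chunk[i] == sm->pattern[sm->state]) {
            sm->state++;
        }
        sm->offset++;
        if (sm->state == sm->len) {
            if (found < max_ends) {
                ends[found] = sm->offset; /* offset just past the match */
            }
            found++;
            sm->state = sm->fail[sm->state - 1]; /* allow overlapping matches */
        }
    }
    return found;
}
```

Feeding "This is the first" and then " chunk of data" to a matcher opened for "first chunk" yields no match in the first call and one match ending at stream offset 23 in the second, while scanning either chunk on its own finds nothing.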

Hyperscan is typically used in a DPI library stack.

Documentation

Information on building the Hyperscan library and using its API is available in the Developer Reference Guide.

License

Hyperscan is licensed under the BSD License. See the LICENSE file in the project repository.

Versioning

The master branch on GitHub will always contain the most recent release of Hyperscan. Each version released to master goes through QA and testing before it is released; if you're a user, rather than a developer, this is the version you should be using.

Further development towards the next release takes place on the develop branch.

Get Involved

The official homepage for Hyperscan is at www.hyperscan.io.

If you have questions or comments, we encourage you to join the mailing list. Bugs can be filed by sending email to the list, or by creating an issue on GitHub.

If you wish to contact the Hyperscan team at Intel directly, without posting publicly to the mailing list, send email to hyperscan@intel.com.