write-a-C-interpreter

Write a simple interpreter of C. Inspired by c4 and largely based on it.

Quick Overview

Write-a-C-interpreter is an educational project that guides users through the process of building a simple C interpreter from scratch. It aims to help developers understand the inner workings of programming languages and compilers by implementing a basic C interpreter in C.

Pros

Provides a hands-on learning experience for understanding language interpretation and compilation
Offers step-by-step explanations and implementation details
Helps developers gain insights into low-level programming concepts
Serves as a foundation for further exploration of compiler design and language implementation

Cons

Limited to a subset of the C language, not a full-featured interpreter
May not be suitable for production use or real-world applications
Requires a solid understanding of C programming and computer science concepts
Documentation could be more comprehensive for beginners

Code Examples

Here are a few code snippets from the project:

Tokenization:

void next() {
    char *last_pos;
    int hash;

    while (token = *src) {
        ++src;

        if (token == '\n') {
            ++line;
        }
        else if (token == '#') {
            // skip macro, because we will not support it
            while (*src != 0 && *src != '\n') {
                src++;
            }
        }
        else if ((token >= 'a' && token <= 'z') || (token >= 'A' && token <= 'Z') || (token == '_')) {
            // parse identifier
            last_pos = src - 1;
            hash = token;

            while ((*src >= 'a' && *src <= 'z') || (*src >= 'A' && *src <= 'Z') || (*src >= '0' && *src <= '9') || (*src == '_')) {
                hash = hash * 147 + *src;
                src++;
            }

            // look for existing identifier, linear search
            current_id = symbols;
            while (current_id[Token]) {
                if (current_id[Hash] == hash && !memcmp((char *)current_id[Name], last_pos, src - last_pos)) {
                    //found one, return
                    token = current_id[Token];
                    return;
                }
                current_id = current_id + IdSize;
            }

            // store new ID
            current_id[Name] = (int)last_pos;
            current_id[Hash] = hash;
            token = current_id[Token] = Id;
            return;
        }
        else if (token >= '0' && token <= '9') {
            // parse number, three kinds: dec(123) hex(0x123) oct(017)
            token_val = token - '0';
            if (token_val > 0) {
                // dec, starts with [1-9]
                while (*src >= '0' && *src <= '9') {
                    token_val = token_val*10 + *src++ - '0';
                }
            } else {
                // starts with 0
                if (*src == 'x' || *src == 'X') {
                    //hex
                    token = *++src;
                    while ((token >= '0' && token <= '9') || (token >= 'a' && token <= 'f') || (token >= 'A' && token <= 'F')) {
                        token_val = token_val * 16 + (token & 15) + (token >= 'A' ? 9 : 0);
                        token = *++src;
                    }
                } else {
                    // oct
                    while (*src >= '0' && *src <= '7') {
                        token_val = token_val*8 + *src++ - '0';
                    }
                }
            }
            token = Num;
            return;
        }
        // ... (more token parsing)
    }
}

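This snippet shows the core of the tokenizer: it counts lines, skips '#' lines (macros are not supported), hashes each identifier and looks it up in the symbol table with a linear search, and parses decimal, hexadecimal, and octal integer literals.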
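The number-parsing branch can be tried in isolation. The following is a minimal sketch, not code from the project: parse_int_literal is a hypothetical helper that applies the same decimal/hex/octal rules to a plain string and, as an added assumption, reports where the literal ends.

```c
#include <stddef.h>

/* Hypothetical helper (not part of the project): parse a decimal, hex (0x..)
 * or octal (leading 0) literal starting at s. If end is not NULL, it receives
 * the position just past the literal. Assumes s[0] is a digit. */
long parse_int_literal(const char *s, const char **end) {
    long value = *s++ - '0';

    if (value > 0) {
        /* decimal: starts with [1-9] */
        while (*s >= '0' && *s <= '9') {
            value = value * 10 + (*s++ - '0');
        }
    } else if (*s == 'x' || *s == 'X') {
        /* hex: the low nibble is 0-9 for digits and 1-6 for a-f/A-F,
         * so letters need 9 added to reach 10-15 */
        char c = *++s;
        while ((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F')) {
            value = value * 16 + (c & 15) + (c >= 'A' ? 9 : 0);
            c = *++s;
        }
    } else {
        /* octal: digits 0-7 after the leading 0 */
        while (*s >= '0' && *s <= '7') {
            value = value * 8 + (*s++ - '0');
        }
    }

    if (end != NULL) {
        *end = s;
    }
    return value;
}
```

With this sketch, parse_int_literal("0x1F", NULL) returns 31 and parse_int_literal("017", NULL) returns 15.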

Expression parsing:

void expression(int level) {
    // parse expressions
    int *i

Competitor Comparisons

c4

C in four functions

Pros of c4

Extremely concise codebase (fewer than 1,000 lines)
Self-hosting capability
Minimalistic approach, easier to understand the core concepts

Cons of c4

Limited feature set compared to write-a-C-interpreter
Less detailed documentation and explanations
Fewer comments in the code, potentially harder for beginners to follow

Code Comparison

write-a-C-interpreter:

void expression(int level) {
    // ... (omitted for brevity)
    switch (token) {
        case '!':
            match('!');
            expression(Inc);   // evaluate the operand first
            // emit code for <expr> == 0
            *++text = PSH;
            *++text = IMM;
            *++text = 0;
            *++text = EQ;
            expr_type = INT;
            break;
        // ... (other cases)
    }
}

c4:

void expr(int lev) {
    int t, *d;
    switch (tk) {
        case '!': next(); expr(Inc); *++e = PSH; *++e = IMM; *++e = 0; *++e = EQ; ty = INT; break;
        // ... (other cases)
    }
}

Both versions implement !x as x == 0 by emitting PSH, IMM, 0, EQ after the operand. The c4 implementation is more compact, using shorter variable names and condensing the whole case into one line, while write-a-C-interpreter spreads the same logic over several lines with longer names and room for comments.
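To see why that instruction sequence computes a logical NOT, here is a minimal sketch of a stack machine that runs it. The opcode names IMM, PSH, and EQ mirror the snippets above; their numeric values, the HALT opcode, and the run and logical_not functions are assumptions made for illustration and are not taken from either project.

```c
/* Opcode names mirror the snippets above; values and HALT are hypothetical. */
enum { IMM, PSH, EQ, HALT };

/* Run a tiny instruction stream. ax is the accumulator and sp is a
 * downward-growing stack pointer. */
long run(const long *pc, long ax) {
    long stack[16];
    long *sp = stack + 16;

    for (;;) {
        long op = *pc++;
        if (op == IMM) {
            ax = *pc++;            /* load the immediate operand into ax */
        } else if (op == PSH) {
            *--sp = ax;            /* push ax onto the stack */
        } else if (op == EQ) {
            ax = (*sp++ == ax);    /* pop and compare with ax */
        } else {
            return ax;             /* HALT: the result is in ax */
        }
    }
}

/* Evaluate !x with the sequence emitted for '!': PSH, IMM 0, EQ.
 * x plays the role of the already-evaluated operand in ax. */
long logical_not(long x) {
    static const long code[] = { PSH, IMM, 0, EQ, HALT };
    return run(code, x);
}
```

The operand is pushed, 0 is loaded into the accumulator, and EQ compares the two, so logical_not(0) yields 1 and any non-zero input yields 0.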

acwj

A Compiler Writing Journey

Pros of acwj

More comprehensive coverage of compiler construction topics, including lexing, parsing, and code generation
Incremental approach with 65 steps, allowing for a gradual learning experience
Includes assembly code generation for a real architecture (x86-64)

Cons of acwj

Larger project scope may be overwhelming for beginners
Focuses on a subset of C, not a full C interpreter
Less emphasis on the interpretation aspect compared to write-a-C-interpreter

Code Comparison

write-a-C-interpreter:

void next() {
    char *last_pos;
    int hash;

    while (token = *src) {
        ++src;

acwj:

static int next(void) {
  int c;

  if (Putback) {		// Use the character put
    c = Putback;		// back if there is one
    Putback = 0;
    return c;
  }

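The two functions share a name but work at different levels. In write-a-C-interpreter, next() is the entire tokenizer: one function reads characters from src and recognizes every token type. In acwj, next() only returns the next input character, giving back a previously put-back character first if there is one, and token recognition is left to other functions built on top of it.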
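The putback idea can be sketched on its own. The example below is an illustration under stated assumptions, not acwj's code: it reads from an in-memory string rather than a file, and struct reader, reader_next, reader_putback, and scan_int are hypothetical names.

```c
/* Hypothetical character source over a string, with one character of putback. */
struct reader {
    const char *src;   /* next unread character */
    int putback;       /* 0 when empty, otherwise the character to return first */
};

/* Return the put-back character if there is one, else the next input character.
 * Returns 0 at end of input. */
int reader_next(struct reader *r) {
    if (r->putback) {
        int c = r->putback;
        r->putback = 0;
        return c;
    }
    return *r->src ? (unsigned char)*r->src++ : 0;
}

/* Remember one character so the next reader_next call returns it again. */
void reader_putback(struct reader *r, int c) {
    r->putback = c;
}

/* Scan a run of decimal digits. The first non-digit has to be read to know
 * the number ended, so it is put back for the next caller. */
int scan_int(struct reader *r) {
    int value = 0;
    int c = reader_next(r);

    while (c >= '0' && c <= '9') {
        value = value * 10 + (c - '0');
        c = reader_next(r);
    }
    reader_putback(r, c);
    return value;
}
```

Given a reader over "12+34", scan_int returns 12, the following reader_next returns '+', and a second scan_int returns 34. The putback slot is what lets scan_int look one character ahead without losing it.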

README

C interpreter that interprets itself.

How to Run the Code

The file xc.c is the original; xc-tutor.c is the version I built step by step for the tutorial.

gcc -o xc xc.c
./xc hello.c
./xc -s hello.c

./xc xc.c hello.c
./xc xc.c xc.c hello.c

About

This project is inspired by c4 and is largely based on it.

However, I rewrote all of it to make it more understandable and to help myself understand it.

Despite the complexity we see in books about compiler design, writing a compiler is not that hard. You don't need much theory, though it does help you understand the logic behind the code.

I have also written a series of articles about how this compiler is built; they are in the directory tutorial/en.

There is also a Chinese version on my blog.

Licence

The original code is licensed under GPL2, so this code uses the same licence.