lotabout/write-a-C-interpreter

Write a simple interpreter of C. Inspired by c4 and largely based on it.

Top Related Projects

C in four functions

A Compiler Writing Journey

Quick Overview

Write-a-C-interpreter is an educational project that guides users through the process of building a simple C interpreter from scratch. It aims to help developers understand the inner workings of programming languages and compilers by implementing a basic C interpreter in C.

Pros

  • Provides a hands-on learning experience for understanding language interpretation and compilation
  • Offers step-by-step explanations and implementation details
  • Helps developers gain insights into low-level programming concepts
  • Serves as a foundation for further exploration of compiler design and language implementation

Cons

  • Limited to a subset of the C language, not a full-featured interpreter
  • May not be suitable for production use or real-world applications
  • Requires a solid understanding of C programming and computer science concepts
  • Documentation could be more comprehensive for beginners

Code Examples

Here are a few examples of code snippets from the project:

  1. Tokenization:
void next() {
    char *last_pos;
    int hash;

    while (token = *src) {
        ++src;

        if (token == '\n') {
            ++line;
        }
        else if (token == '#') {
            // skip macro, because we will not support it
            while (*src != 0 && *src != '\n') {
                src++;
            }
        }
        else if ((token >= 'a' && token <= 'z') || (token >= 'A' && token <= 'Z') || (token == '_')) {
            // parse identifier
            last_pos = src - 1;
            hash = token;

            while ((*src >= 'a' && *src <= 'z') || (*src >= 'A' && *src <= 'Z') || (*src >= '0' && *src <= '9') || (*src == '_')) {
                hash = hash * 147 + *src;
                src++;
            }

            // look for existing identifier, linear search
            current_id = symbols;
            while (current_id[Token]) {
                if (current_id[Hash] == hash && !memcmp((char *)current_id[Name], last_pos, src - last_pos)) {
                    //found one, return
                    token = current_id[Token];
                    return;
                }
                current_id = current_id + IdSize;
            }

            // store new ID
            current_id[Name] = (int)last_pos;
            current_id[Hash] = hash;
            token = current_id[Token] = Id;
            return;
        }
        else if (token >= '0' && token <= '9') {
            // parse number, three kinds: dec(123) hex(0x123) oct(017)
            token_val = token - '0';
            if (token_val > 0) {
                // dec, starts with [1-9]
                while (*src >= '0' && *src <= '9') {
                    token_val = token_val*10 + *src++ - '0';
                }
            } else {
                // starts with 0
                if (*src == 'x' || *src == 'X') {
                    //hex
                    token = *++src;
                    while ((token >= '0' && token <= '9') || (token >= 'a' && token <= 'f') || (token >= 'A' && token <= 'F')) {
                        token_val = token_val * 16 + (token & 15) + (token >= 'A' ? 9 : 0);
                        token = *++src;
                    }
                } else {
                    // oct
                    while (*src >= '0' && *src <= '7') {
                        token_val = token_val*8 + *src++ - '0';
                    }
                }
            }
            token = Num;
            return;
        }
        // ... (more token parsing)
    }
}

This snippet shows the core of the lexer: it skips preprocessor lines, hashes each identifier and looks it up in the symbol table with a linear search, and parses decimal, hexadecimal, and octal integer literals.
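The number-parsing branch of next() can be tried in isolation. The following is a minimal sketch, not code from the project: parse_int_literal is a hypothetical helper that applies the same decimal/hex/octal rules to a plain string instead of the global src pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of the project): parses the integer literal
 * at the start of s and stores the position just past it in *end. */
int parse_int_literal(const char *s, const char **end) {
    assert(s != NULL && end != NULL);
    int value = 0;

    if (*s >= '1' && *s <= '9') {
        /* decimal: starts with [1-9] */
        while (*s >= '0' && *s <= '9') {
            value = value * 10 + (*s++ - '0');
        }
    } else if (*s == '0' && (s[1] == 'x' || s[1] == 'X')) {
        /* hexadecimal: skip the "0x" prefix */
        s += 2;
        while ((*s >= '0' && *s <= '9') || (*s >= 'a' && *s <= 'f') ||
               (*s >= 'A' && *s <= 'F')) {
            /* The low nibble is 0..9 for digits and 1..6 for a-f/A-F,
             * so letters need 9 added to land on 10..15. */
            value = value * 16 + (*s & 15) + (*s >= 'A' ? 9 : 0);
            s++;
        }
    } else if (*s == '0') {
        /* octal: a leading 0 followed by digits 0-7 */
        s++;
        while (*s >= '0' && *s <= '7') {
            value = value * 8 + (*s++ - '0');
        }
    }

    *end = s;
    return value;
}
```

Called on "0x1F", the helper consumes the prefix and both hex digits; called on "017", it treats the leading zero as the octal marker. Like the original, it does no overflow checking.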

  2. Expression parsing:
void expression(int level) {
    // parse expressions
    int *i

Competitor Comparisons

C in four functions

Pros of c4

  • Extremely concise codebase (less than 1000 lines)
  • Self-hosting capability
  • Minimalistic approach, easier to understand the core concepts

Cons of c4

  • Limited feature set compared to write-a-C-interpreter
  • Less detailed documentation and explanations
  • Fewer comments in the code, potentially harder for beginners to follow

Code Comparison

write-a-C-interpreter:

void expression(int level) {
    // ... (omitted for brevity)
    switch (token) {
        case '!':
            match('!');
            expression(Inc);
            *++text = PSH;
            *++text = IMM;
            *++text = 0;
            *++text = EQ;
            expr_type = INT;
            break;
        // ... (other cases)
    }
}

c4:

void expr(int lev) {
    int t, *d;
    switch (tk) {
        case '!': next(); expr(Inc); *++e = PSH; *++e = IMM; *++e = 0; *++e = EQ; ty = INT; break;
        // ... (other cases)
    }
}

The c4 implementation is more compact, using shorter variable names and condensing multiple lines into one. write-a-C-interpreter provides more verbose and readable code with additional comments and explanations.
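Both snippets emit the same instruction sequence for `!expr`: PSH, IMM, 0, EQ, which computes `expr == 0`. The sketch below runs that sequence on a tiny stack machine. It assumes c4-style semantics (an accumulator, a downward-growing stack, and EQ popping its left operand); the run and logical_not functions and the EXIT opcode are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode subset. IMM, PSH and EQ follow the snippets above;
 * EXIT exists only to stop this sketch's loop. */
enum { IMM, PSH, EQ, EXIT };

/* Executes code with ax as the initial accumulator and returns the final
 * accumulator. No stack bounds checking: this is a sketch. */
int run(const int *code, int ax) {
    assert(code != NULL);
    int stack[16];
    int *sp = stack + 16;              /* stack grows downward */
    const int *pc = code;

    for (;;) {
        int op = *pc++;
        if (op == IMM)      ax = *pc++;           /* load immediate into ax */
        else if (op == PSH) *--sp = ax;           /* push ax */
        else if (op == EQ)  ax = (*sp++ == ax);   /* pop and compare with ax */
        else                return ax;            /* EXIT or unknown opcode */
    }
}

/* The sequence emitted for `!expr`, once the value of expr is in ax. */
int logical_not(int x) {
    const int code[] = { PSH, IMM, 0, EQ, EXIT };
    return run(code, x);
}
```

Reusing EQ means the instruction set needs no dedicated opcode for logical negation.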

A Compiler Writing Journey

Pros of acwj

  • More comprehensive coverage of compiler construction topics, including lexing, parsing, and code generation
  • Incremental approach with 65 steps, allowing for a gradual learning experience
  • Includes assembly code generation for a real architecture (x86-64)

Cons of acwj

  • Larger project scope may be overwhelming for beginners
  • Focuses on a subset of C, not a full C interpreter
  • Less emphasis on the interpretation aspect compared to write-a-C-interpreter

Code Comparison

write-a-C-interpreter:

void next() {
    char *last_pos;
    int hash;

    while (token = *src) {
        ++src;
        // ... (token parsing, as in the tokenization example above)
    }
}

acwj:

static int next(void) {
  int c;

  if (Putback) {		// Use the character put
    c = Putback;		// back if there is one
    Putback = 0;
    return c;
  }
  // ... (rest of the function omitted)
}

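The two functions share a name but do different jobs. In write-a-C-interpreter, next() is the whole lexer: a single function consumes characters from src and recognizes every token type. In acwj, next() only returns the next input character, honoring a one-character putback buffer, and token recognition is left to separate scanning functions.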
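The putback idea in the acwj snippet can be sketched over an in-memory string. This is a hypothetical illustration rather than acwj's code: the Reader struct and the reader_next, reader_putback, and scan_number functions are names invented here, and the real project reads from a file instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory reader with a one-character putback slot. */
typedef struct {
    const char *src;   /* next unread character */
    int putback;       /* 0 when the slot is empty */
} Reader;

/* Returns the next character, preferring a put-back one; 0 at end of input. */
int reader_next(Reader *r) {
    assert(r != NULL);
    if (r->putback) {
        int c = r->putback;
        r->putback = 0;
        return c;
    }
    if (*r->src == '\0') return 0;
    return (unsigned char)*r->src++;
}

/* Stores one character to be returned by the next call to reader_next. */
void reader_putback(Reader *r, int c) {
    assert(r != NULL);
    r->putback = c;
}

/* Scans a decimal number. The loop only ends after reading one character
 * too many, so that character is put back for the next caller. */
int scan_number(Reader *r) {
    int value = 0;
    int c = reader_next(r);
    while (c >= '0' && c <= '9') {
        value = value * 10 + (c - '0');
        c = reader_next(r);
    }
    reader_putback(r, c);
    return value;
}
```

On the input "42+7", scan_number reads past the 2, sees the plus sign, and hands it back, so the operator is still available to whatever reads next.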

README

C interpreter that interprets itself.

How to Run the Code

xc.c is the original source; xc-tutor.c is the version I built step by step for the tutorial.

gcc -o xc xc.c
./xc hello.c
./xc -s hello.c

./xc xc.c hello.c
./xc xc.c xc.c hello.c

About

This project is inspired by c4 and is largely based on it.

However, I rewrote all of it to make it more understandable and to help myself understand it.

Despite the complexity found in books about compiler design, writing a compiler is not that hard. You don't need much theory, though it does help in understanding the logic behind the code.

I have also written a series of articles about how this compiler is built; they are in the directory tutorial/en.

There is also a Chinese version on my blog.

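  1. Building a C compiler step by step (0): Preface
  2. Building a C compiler step by step (1): Design
  3. Building a C compiler step by step (2): Virtual machine
  4. Building a C compiler step by step (3): Lexer
  5. Building a C compiler step by step (4): Recursive descent
  6. Building a C compiler step by step (5): Variable definitions
  7. Building a C compiler step by step (6): Function definitions
  8. Building a C compiler step by step (7): Statements
  9. Building a C compiler step by step (8): Expressions
  10. Building a C compiler step by step (9): Summary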

Licence

The original code is licensed under GPL2, so this code uses the same licence.