rui314/9cc

A Small C Compiler

Top Related Projects

  • Linux kernel source tree (178,031 stars)
  • The Go programming language (122,720 stars)
  • The Python programming language (62,176 stars)
  • Rust: Empowering everyone to build reliable and efficient software. (96,644 stars)

Quick Overview

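The rui314/9cc repository is a small C compiler written in C. According to its README, it is the successor to the author's 8cc compiler and aims for source code that is extremely easy to understand while still generating reasonably efficient x86-64 assembly. The project is no longer active; its successor is chibicc.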

Pros

  • Educational Value: The project is an excellent learning resource for those interested in understanding the inner workings of a compiler.
  • Simplicity: The compiler is designed to be simple and easy to understand, making it a great starting point for those new to compiler development.
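  • Modular Design: The compiler is split into distinct stages (parsing, semantic analysis, IR generation, register mapping, and x86-64 emission), so each part can be studied or modified on its own.
  • Clear Lineage: 9cc itself is no longer active, but the README points to its successor, chibicc, for readers who want to continue from here.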

Cons

  • Limited Functionality: The 9cc compiler only supports a subset of the C language, and may not be suitable for real-world production use.
  • Limited Optimization: The README describes mapping IR registers onto machine registers and aims for reasonably efficient assembly, but the generated code should not be expected to match that of a production compiler.
  • Limited Documentation: Documentation appears to consist mainly of the README and the source code itself.
  • Steep Learning Curve: Compiler development can be a complex topic, and the project may have a steep learning curve for those new to the field.

Code Examples

Here are a few simplified sketches of the compiler's stages. They illustrate the style of code involved and may not match the repository verbatim:

  1. Lexer:
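// Returns the kind of the next token and advances the global cursor p.
int next_token() {
    while (1) {
        // Skip whitespace between tokens.
        if (isspace((unsigned char)*p)) {
            p++;
            continue;
        }

        // A keyword matches only if it is not the prefix of a longer
        // identifier (e.g. "returnx").
        if (strncmp(p, "return", 6) == 0 && !is_alnum(p[6])) {
            p += 6;
            return RETURN;
        }

        if (strncmp(p, "if", 2) == 0 && !is_alnum(p[2])) {
            p += 2;
            return IF;
        }

        // ... more token types
    }
}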

This code sketches the lexer, which breaks the input source into a sequence of tokens. The README notes that 9cc tokenizes a whole file in a batch rather than concurrently with the parser.
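As a self-contained illustration of the same idea, here is a minimal tokenizer sketch. The `Token` struct, the `TK_*` kinds, and the `next_token(const char **)` signature are assumptions made for this example, not 9cc's actual definitions. It shows how a keyword can be recognized as an identifier-shaped word with a reserved spelling, which avoids matching `return` inside `returnx`.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical token kinds for this sketch (not 9cc's actual enum). */
typedef enum { TK_RETURN, TK_IF, TK_IDENT, TK_NUM, TK_PUNCT, TK_EOF } TokenKind;

typedef struct {
    TokenKind kind;
    const char *start; /* first character of the token in the input */
    int len;           /* length in bytes */
    long val;          /* numeric value, valid only for TK_NUM */
} Token;

static int is_ident_char(char c) {
    return isalnum((unsigned char)c) || c == '_';
}

/* Reads one token starting at *pp and advances *pp past it. */
Token next_token(const char **pp) {
    assert(pp != NULL && *pp != NULL);
    const char *p = *pp;
    while (isspace((unsigned char)*p))
        p++;

    Token tok = { TK_EOF, p, 0, 0 };
    if (*p == '\0') {
        /* End of input: keep TK_EOF with length 0. */
    } else if (isdigit((unsigned char)*p)) {
        char *end;
        tok.kind = TK_NUM;
        tok.val = strtol(p, &end, 10);
        tok.len = (int)(end - p);
    } else if (isalpha((unsigned char)*p) || *p == '_') {
        while (is_ident_char(p[tok.len]))
            tok.len++;
        /* A keyword is an identifier-shaped word with a reserved spelling. */
        if (tok.len == 6 && strncmp(p, "return", 6) == 0)
            tok.kind = TK_RETURN;
        else if (tok.len == 2 && strncmp(p, "if", 2) == 0)
            tok.kind = TK_IF;
        else
            tok.kind = TK_IDENT;
    } else {
        tok.kind = TK_PUNCT; /* any other single character */
        tok.len = 1;
    }
    *pp = p + tok.len;
    return tok;
}
```

A caller would point a `const char *` at the source text and call `next_token` in a loop until it returns `TK_EOF`. To tokenize a whole file in a batch, the results could simply be appended to an array before parsing starts.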

  2. Parser:
Node *stmt() {
    Node *node;

    if (consume(RETURN)) {
        node = new_node(ND_RETURN);
        node->lhs = expr();
        return node;
    }

    if (consume(IF)) {
        node = new_node(ND_IF);
        expect('(');
        node->lhs = expr();
        expect(')');
        node->rhs = stmt();
        if (consume(ELSE)) {
            node->third = stmt();
        }
        return node;
    }

    // ... more statement types
}

This code sketches the parser, a recursive descent routine that consumes the token sequence and builds an abstract syntax tree (AST).
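To make the recursive descent technique concrete, the following sketch parses arithmetic expressions into an AST and evaluates them. The grammar, the `Node` layout, and the `parse`/`eval` helpers are assumptions chosen for illustration; for brevity the sketch reads characters directly instead of a token list and uses `assert` where a real compiler would report a syntax error.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

typedef enum { ND_ADD, ND_SUB, ND_MUL, ND_DIV, ND_NUM } NodeKind;

typedef struct Node Node;
struct Node {
    NodeKind kind;
    Node *lhs, *rhs;
    long val; /* used only by ND_NUM */
};

static const char *cur; /* current read position (parser state) */

static Node *new_node(NodeKind kind, Node *lhs, Node *rhs, long val) {
    Node *n = calloc(1, sizeof(Node)); /* deliberately never freed */
    assert(n != NULL);
    n->kind = kind;
    n->lhs = lhs;
    n->rhs = rhs;
    n->val = val;
    return n;
}

/* Consumes the character c if it is next (after whitespace). */
static int consume(char c) {
    while (isspace((unsigned char)*cur))
        cur++;
    if (*cur != c)
        return 0;
    cur++;
    return 1;
}

static Node *expr(void);

/* primary = number | "(" expr ")" */
static Node *primary(void) {
    if (consume('(')) {
        Node *n = expr();
        int closed = consume(')');
        assert(closed); /* a real parser would report a syntax error */
        return n;
    }
    char *end;
    long v = strtol(cur, &end, 10);
    assert(end != cur); /* expected a number */
    cur = end;
    return new_node(ND_NUM, NULL, NULL, v);
}

/* mul = primary ("*" primary | "/" primary)* */
static Node *mul(void) {
    Node *n = primary();
    for (;;) {
        if (consume('*'))
            n = new_node(ND_MUL, n, primary(), 0);
        else if (consume('/'))
            n = new_node(ND_DIV, n, primary(), 0);
        else
            return n;
    }
}

/* expr = mul ("+" mul | "-" mul)* */
static Node *expr(void) {
    Node *n = mul();
    for (;;) {
        if (consume('+'))
            n = new_node(ND_ADD, n, mul(), 0);
        else if (consume('-'))
            n = new_node(ND_SUB, n, mul(), 0);
        else
            return n;
    }
}

Node *parse(const char *src) {
    cur = src;
    return expr();
}

/* Walks the tree to compute its value; handy for checking the parse. */
long eval(const Node *n) {
    switch (n->kind) {
    case ND_NUM: return n->val;
    case ND_ADD: return eval(n->lhs) + eval(n->rhs);
    case ND_SUB: return eval(n->lhs) - eval(n->rhs);
    case ND_MUL: return eval(n->lhs) * eval(n->rhs);
    case ND_DIV: return eval(n->lhs) / eval(n->rhs);
    }
    return 0;
}
```

Each grammar rule becomes one function, and operator precedence falls out of which function calls which: `expr` calls `mul`, so multiplication binds tighter than addition without any precedence table.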

  3. Code Generation:
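// Pushes the address of a local variable. Assumes each ND_LVAR node
// carries its byte offset from rbp (a hypothetical `offset` field).
static void gen_lval(Node *node) {
    printf("  mov rax, rbp\n");
    printf("  sub rax, %d\n", node->offset);
    printf("  push rax\n");
}

void gen(Node *node) {
    switch (node->kind) {
    case ND_NUM:
        printf("  push %d\n", node->val);
        return;
    case ND_LVAR:
        // Load the variable's value through its address.
        gen_lval(node);
        printf("  pop rax\n");
        printf("  mov rax, [rax]\n");
        printf("  push rax\n");
        return;
    case ND_ASSIGN:
        // The left-hand side must yield an address, not a value.
        gen_lval(node->lhs);
        gen(node->rhs);
        printf("  pop rdi\n");
        printf("  pop rax\n");
        printf("  mov [rax], rdi\n");
        printf("  push rdi\n");
        return;
    // ... more node types
    }
}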

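This code sketches code generation: it walks the AST and prints stack-machine-style x86-64 assembly. Note that, per the README, 9cc itself first lowers the AST to an intermediate representation and maps its registers to machine registers before emitting x86-64 instructions.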
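The stack-machine approach can be shown end to end with a small, self-contained sketch. The `Node` layout and the `gen`/`gen_program` helpers below are assumptions for illustration and only cover integer literals, addition, and subtraction. Output goes to a `FILE *` so it can be sent to stdout or captured.

```c
#include <assert.h>
#include <stdio.h>

typedef enum { ND_NUM, ND_ADD, ND_SUB } NodeKind;

typedef struct Node Node;
struct Node {
    NodeKind kind;
    const Node *lhs, *rhs;
    int val; /* used only by ND_NUM */
};

/* Emits code for node. Invariant: after the emitted code runs, the value
   of the expression sits on top of the machine stack. */
void gen(FILE *out, const Node *node) {
    if (node->kind == ND_NUM) {
        fprintf(out, "  push %d\n", node->val);
        return;
    }

    /* Evaluate both operands; their values end up on the stack. */
    gen(out, node->lhs);
    gen(out, node->rhs);
    fprintf(out, "  pop rdi\n"); /* right operand */
    fprintf(out, "  pop rax\n"); /* left operand */

    switch (node->kind) {
    case ND_ADD:
        fprintf(out, "  add rax, rdi\n");
        break;
    case ND_SUB:
        fprintf(out, "  sub rax, rdi\n");
        break;
    default:
        assert(!"unexpected node kind");
    }
    fprintf(out, "  push rax\n");
}

/* Wraps gen() in a complete main function, using Intel syntax for the
   GNU assembler. The expression's value becomes the exit status. */
void gen_program(FILE *out, const Node *root) {
    fprintf(out, ".intel_syntax noprefix\n");
    fprintf(out, ".globl main\n");
    fprintf(out, "main:\n");
    gen(out, root);
    fprintf(out, "  pop rax\n");
    fprintf(out, "  ret\n");
}
```

Calling `gen_program(stdout, root)` prints an assembly file that can be assembled and linked with a C toolchain on x86-64. The pushes and pops are wasteful, which is one motivation for the IR and register-mapping stages the README describes.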

Getting Started

To get started with the 9cc compiler, follow these steps:

  1. Clone the repository:
git clone https://github.com/rui314/9cc.git
  2. Change to the project directory:
cd 9cc
  3. Compile the compiler:
make
  4. Test the compiler:
./

Competitor Comparisons

Linux kernel source tree

Pros of Linux

  • Highly scalable and supports a wide range of hardware platforms, from embedded systems to supercomputers.
  • Extensive community support and a vast ecosystem of open-source software and tools.
  • Robust security features and a focus on stability and reliability.

Cons of Linux

  • Steeper learning curve compared to some other operating systems, especially for beginners.
  • Limited support for certain proprietary software and drivers, which can be a drawback for some users.
  • Fragmentation across different distributions and desktop environments, which can lead to compatibility issues.

Code Comparison

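Here's a rough comparison between a minimal C program of the kind a small compiler such as 9cc targets and kernel-style initialization code. The kernel snippet is a simplified illustration, not a verbatim excerpt from the kernel source: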

C program (input to a compiler such as 9cc)

#include <stdio.h>

int main() {
  printf("Hello, world!\n");
  return 0;
}

Linux Kernel

int __init kernel_init(void)
{
    int ret;

    kernel_init_freeable();
    system_state = SYSTEM_RUNNING;
    numa_default_policy();

    ret = kernel_apply_early_params();
    if (ret)
        return ret;

    do_basic_setup();
    return 0;
}

The kernel-style code is considerably more involved, reflecting the scale of an operating system, whereas the other side is just a minimal "Hello, world!" C program. The two projects solve very different problems, so the comparison is one of scale rather than of alternatives.

The Go programming language

Pros of Go

  • Go is a mature and widely-adopted programming language, with a large and active community.
  • The Go standard library provides a comprehensive set of tools and packages for a wide range of tasks.
  • Go is known for its simplicity, efficiency, and ease of use, making it a popular choice for system programming, network programming, and cloud-based applications.

Cons of Go

  • Go's type system is less expressive than that of some other languages, which can make certain types of programming more challenging.
  • The Go ecosystem may not have as many third-party libraries and tools as some other popular languages.
  • Go's concurrency model, while powerful, can be more complex to understand and use than some other approaches.

Code Comparison

Here's a simple example of a "Hello, World!" program in C (the language 9cc compiles) and in Go:

C:

#include <stdio.h>

int main() {
    printf("Hello, World!\n");
    return 0;
}

Go:

package main

import "fmt"

func main() {
    fmt.Println("Hello, World!")
}

The Go version is slightly more concise, but both programs print the same line.

The Python programming language

Pros of CPython

  • CPython is the official and most widely used implementation of the Python programming language.
  • It has a large and active community, with extensive documentation and a vast ecosystem of libraries and tools.
  • CPython is mature and heavily tested, although as a bytecode interpreter it is generally slower than natively compiled code.

Cons of CPython

  • CPython's codebase is far larger than 9cc's, so it takes considerably longer to read and understand end to end.
  • That size, and the number of subsystems involved, can make it harder for new contributors to get started.
  • CPython's focus on compatibility and feature-richness can sometimes come at the expense of simplicity.

Code Comparison

C (input to a compiler such as 9cc):

#include <stdio.h>

int main() {
  printf("Hello, world!\n");
  return 0;
}

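CPython (embedding the interpreter from C):

#define PY_SSIZE_T_CLEAN
#include <Python.h>

int
main(int argc, char **argv)
{
    /* Py_SetProgramName() is deprecated since Python 3.11; it is kept
       here to show the classic embedding sequence. */
    wchar_t *program = Py_DecodeLocale(argv[0], NULL);
    if (program == NULL) {
        fprintf(stderr, "Fatal error: cannot decode argv[0]\n");
        exit(1);
    }
    Py_SetProgramName(program);
    Py_Initialize();
    PyRun_SimpleString("print('Hello, world!')");
    if (Py_FinalizeEx() < 0) {
        exit(120);
    }
    PyMem_RawFree(program);
    return 0;
}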

Rust: Empowering everyone to build reliable and efficient software.

Pros of Rust

  • Rust is a full-featured, general-purpose programming language, while 9cc is a simple C compiler.
  • Rust has a strong focus on safety, concurrency, and performance, making it a popular choice for systems programming and low-level applications.
  • Rust has a large and active community, with a wealth of libraries and tools available.

Cons of Rust

  • Rust has a steeper learning curve compared to 9cc, which is a relatively simple and straightforward C compiler.
  • The Rust compiler can be slower and more resource-intensive than the 9cc compiler.
  • Rust's extensive feature set and complexity may be overkill for some use cases where a simpler solution like 9cc would suffice.

Code Comparison

Rust:

fn main() {
    println!("Hello, world!");
}

C (input to a compiler such as 9cc):

#include <stdio.h>

int main() {
    printf("Hello, world!\n");
    return 0;
}

README

9cc C compiler

Note: 9cc is no longer an active project, and the successor is chibicc.

9cc is a successor of my 8cc C compiler. In this new project, I'm trying to write code that can be understood extremely easily while creating a compiler that generates reasonably efficient assembly.

9cc has more stages than 8cc. Here is an overview of the internals:

  1. Compiles an input string to abstract syntax trees.
  2. Runs a semantic analyzer on the trees to add a type to each tree node.
  3. Converts the trees to intermediate code (IR), which to some degree resembles x86-64 instructions but has an infinite number of registers.
  4. Maps an infinite number of registers to a finite number of registers.
  5. Generates x86-64 instructions from the IR.
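Step 4 above, mapping unlimited virtual registers onto a fixed set, can be sketched as follows. This is a simplified illustration under stated assumptions, not 9cc's actual allocator: the `Ir` layout, `NUM_REGS`, and `alloc_regs` are hypothetical, each virtual register is assumed to be defined exactly once in straight-line code, and the sketch fails instead of spilling when it runs out of registers.

```c
#include <assert.h>

enum { NUM_REGS = 4, MAX_VREGS = 64 };

/* One IR instruction in a hypothetical three-address form:
   dst = lhs op rhs. Operands are virtual register numbers;
   -1 means "no operand" (e.g. loading an immediate). */
typedef struct {
    int dst, lhs, rhs;
} Ir;

/* Fills map[v] with the physical register chosen for virtual register v
   (or -1 if v is never defined). Returns 0 on success, or -1 if more than
   NUM_REGS values are live at once (a real allocator would spill). */
int alloc_regs(const Ir *code, int n, int map[MAX_VREGS]) {
    int last_use[MAX_VREGS];
    int owner[NUM_REGS]; /* virtual register held by each physical one */

    for (int v = 0; v < MAX_VREGS; v++) {
        last_use[v] = -1;
        map[v] = -1;
    }
    for (int r = 0; r < NUM_REGS; r++)
        owner[r] = -1;

    /* Pass 1: record the last instruction that reads each virtual register. */
    for (int i = 0; i < n; i++) {
        assert(code[i].dst < MAX_VREGS);
        assert(code[i].lhs < MAX_VREGS && code[i].rhs < MAX_VREGS);
        if (code[i].lhs >= 0)
            last_use[code[i].lhs] = i;
        if (code[i].rhs >= 0)
            last_use[code[i].rhs] = i;
    }

    /* Pass 2: walk forward, releasing dead values and placing new ones. */
    for (int i = 0; i < n; i++) {
        /* A value whose last read is this instruction (or earlier) is dead
           afterwards, so its register may be reused for dst. */
        for (int r = 0; r < NUM_REGS; r++)
            if (owner[r] >= 0 && last_use[owner[r]] <= i)
                owner[r] = -1;

        int v = code[i].dst;
        if (v < 0)
            continue; /* instruction defines nothing */

        int r = 0;
        while (r < NUM_REGS && owner[r] >= 0)
            r++;
        if (r == NUM_REGS)
            return -1; /* out of registers */
        owner[r] = v;
        map[v] = r;
    }
    return 0;
}
```

For an expression such as `(a + b) * (c + d)` lowered to seven virtual registers, this sketch needs only three physical registers, because each operand's register is released at its last use.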

There are a few important design choices that I made to keep the code as simple as I can get:

  • Like 8cc, no memory management is the memory management policy in 9cc. We allocate memory using malloc() but never call free(). I know that people find the policy odd, but this is actually a reasonable design choice for short-lived programs such as compilers. This policy greatly simplifies code and also eliminates use-after-free bugs entirely.

  • 9cc's parser is a hand-written recursive descent parser, so that the compiler doesn't have any black box such as lex/yacc.

  • I stick with plain old tools such as Make or shell script so that you don't need to learn about new stuff other than the compiler source code itself.

  • We use brute force if it makes code simpler. We don't try too hard to implement sophisticated data structures to make the compiler run faster. If the performance becomes a problem, we can fix it at that moment.

  • Entire contents are loaded into memory at once if it makes code simpler. We don't use character IO to read from an input file; instead, we read an entire file to a char array in a batch. Likewise, we tokenize a whole file in a batch rather than doing it concurrently with the parser.
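Two of the choices above, allocating without ever calling free() and loading a whole file into a char array, can be sketched together. The `read_file` helper below is a hypothetical illustration, not the repository's actual function. It reads a file into a NUL-terminated heap buffer that the caller is expected to keep for the life of the process.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Reads the whole file at path into a NUL-terminated heap buffer.
   Returns NULL if the file cannot be opened or read. Following the
   "never free" policy, nothing is released, not even on failure paths:
   the process is expected to exit soon anyway. */
char *read_file(const char *path) {
    assert(path != NULL);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL)
        return NULL;

    size_t cap = 4096, len = 0;
    char *buf = malloc(cap);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    for (;;) {
        /* Always leave one byte free for the terminating NUL. */
        len += fread(buf + len, 1, cap - len - 1, fp);
        if (len < cap - 1)
            break; /* short read: end of file or error */

        /* Buffer is full: double it and keep reading. */
        cap *= 2;
        char *grown = realloc(buf, cap);
        if (grown == NULL) {
            fclose(fp);
            return NULL;
        }
        buf = grown;
    }

    int failed = ferror(fp);
    fclose(fp);
    if (failed)
        return NULL;

    buf[len] = '\0';
    return buf;
}
```

With the entire input in one NUL-terminated array, a tokenizer can scan it with plain pointer arithmetic and look ahead freely, and tokens can point directly into the buffer because it is never freed or moved afterwards.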

Overall, 9cc is still in its very early stage. I hope to continue improving it to the point where 9cc can compile real-world C programs such as the Linux kernel. That is an ambitious goal, but I believe it's achievable, so stay tuned!