Convert Figma logo to code with AI

rui314 logochibicc

A small C compiler

9,556
869
9,556
101

Top Related Projects

6,127

A Small C Compiler

1,806

A Small C Compiler

Quick Overview

The chibicc project is a simple C compiler written in C. It is designed to be a learning resource for those interested in understanding the inner workings of a compiler. The compiler supports a subset of the C programming language and can generate x86-64 assembly code.

Pros

  • Educational Resource: The project is well-documented and provides a step-by-step guide for understanding the compilation process, making it a valuable learning tool for those interested in compiler design.
  • Simplicity: The codebase is relatively small and easy to understand, making it a good starting point for those new to compiler development.
  • Portability: The compiler can generate code for the x86-64 architecture, which is widely used in modern computers.
  • Active Development: The project is actively maintained, with regular updates and improvements.

Cons

  • Limited Language Support: The compiler only supports a subset of the C programming language, which may limit its usefulness for real-world projects.
  • Performance: As a learning-focused project, the compiler may not be optimized for performance, which could be a concern for larger or more complex programs.
  • Lack of Optimization: The generated code may not be as efficient as that produced by more mature and feature-rich compilers.
  • Lack of Tooling: The project does not provide a comprehensive set of tools, such as a debugger or a build system, which could make it more difficult to integrate into a larger development workflow.

Code Examples

Here are a few examples of how to use the chibicc compiler:

  1. Compiling a Simple C Program:
#include <stdio.h>

int main() {
    printf("Hello, World!\n");
    return 0;
}

To compile this program using chibicc, you would run the following command:

$ ./chibicc example.c -o example
$ ./example
Hello, World!
  1. Declaring and Using Variables:
int main() {
    int x = 42;
    int y = 24;
    int z = x + y;
    printf("x = %d, y = %d, z = %d\n", x, y, z);
    return 0;
}

This code declares three integer variables, performs a simple arithmetic operation, and then prints the values of the variables.

  1. Conditional Statements:
int main() {
    int x = 10;
    if (x > 0) {
        printf("x is positive\n");
    } else {
        printf("x is non-positive\n");
    }
    return 0;
}

This code demonstrates the use of an if-else statement to conditionally print a message based on the value of the x variable.

Getting Started

To get started with chibicc, follow these steps:

  1. Clone the repository:
git clone https://github.com/rui314/chibicc.git
  1. Change to the project directory:
cd chibicc
  1. Build the compiler:
make
  1. Compile a C program:
./chibicc example.c -o example
  1. Run the compiled program:
./example

For more detailed instructions and information on the project, please refer to the README.md file in the repository.

Competitor Comparisons

6,127

A Small C Compiler

Pros of 8cc

  • 8cc is a more mature and feature-rich C compiler, supporting a wider range of C language features and optimizations.
  • The codebase of 8cc is more extensive and well-documented, making it easier for contributors to understand and extend.
  • 8cc has a more robust test suite, ensuring better code quality and reliability.

Cons of 8cc

  • The learning curve for 8cc may be steeper, as it has a more complex codebase and feature set compared to chibicc.
  • 8cc may have a larger memory footprint and slower compilation times, especially for larger projects.

Code Comparison

Here's a brief comparison of the code for the main function in both projects:

chibicc:

int main(int argc, char **argv) {
  if (argc != 2) {
    error("Usage: %s <input-file>", argv[0]);
    return 1;
  }

  user_input = argv[1];
  token = tokenize();
  program();
  return 0;
}

8cc:

int main(int argc, char **argv) {
  if (argc != 2) {
    fprintf(stderr, "usage: %s <file>\n", argv[0]);
    return 1;
  }

  char *filename = argv[1];
  FILE *fp = fopen(filename, "r");
  if (!fp) {
    fprintf(stderr, "cannot open %s\n", filename);
    return 1;
  }

  parse(fp);
  fclose(fp);
  return 0;
}

The main differences are that 8cc has more error handling and file handling logic, while chibicc has a more concise implementation.

1,806

A Small C Compiler

Pros of 9cc

  • Simpler and more straightforward implementation
  • Easier to understand for beginners learning compiler construction
  • Focuses on core concepts without advanced optimizations

Cons of 9cc

  • Limited feature set compared to chibicc
  • Less comprehensive C language support
  • Not as actively maintained or updated

Code Comparison

9cc:

int main(int argc, char **argv) {
  if (argc != 2) {
    fprintf(stderr, "Usage: 9cc <code>\n");
    return 1;
  }

  tokenize(argv[1]);
  Node *node = expr();

  printf(".intel_syntax noprefix\n");
  printf(".global main\n");
  printf("main:\n");

  gen(node);

  printf("  pop rax\n");
  printf("  ret\n");
  return 0;
}

chibicc:

int main(int argc, char **argv) {
  if (argc != 2)
    error("%s: invalid number of arguments", argv[0]);

  Token *tok = tokenize(argv[1]);
  Obj *prog = parse(tok);

  codegen(prog);
  return 0;
}

The chibicc implementation is more modular and structured, with separate functions for parsing and code generation. 9cc combines these steps in the main function, making it more compact but potentially less flexible for larger projects.

Convert Figma logo designs to code with AI

Visual Copilot

Introducing Visual Copilot: A new AI model to turn Figma designs to high quality code using your components.

Try Visual Copilot

README

chibicc: A Small C Compiler

(The old master has moved to historical/old branch. This is a new one uploaded in September 2020.)

chibicc is yet another small C compiler that implements most C11 features. Even though it still probably falls into the "toy compilers" category just like other small compilers do, chibicc can compile several real-world programs, including Git, SQLite, libpng and chibicc itself, without making modifications to the compiled programs. Generated executables of these programs pass their corresponding test suites. So, chibicc actually supports a wide variety of C11 features and is able to compile hundreds of thousands of lines of real-world C code correctly.

chibicc is developed as the reference implementation for a book I'm currently writing about the C compiler and the low-level programming. The book covers the vast topic with an incremental approach; in the first chapter, readers will implement a "compiler" that accepts just a single number as a "language", which will then gain one feature at a time in each section of the book until the language that the compiler accepts matches what the C11 spec specifies. I took this incremental approach from the paper by Abdulaziz Ghuloum.

Each commit of this project corresponds to a section of the book. For this purpose, not only the final state of the project but each commit was carefully written with readability in mind. Readers should be able to learn how a C language feature can be implemented just by reading one or a few commits of this project. For example, this is how while, [], ?:, and thread-local variable are implemented. If you have plenty of spare time, it might be fun to read it from the first commit.

If you like this project, please consider purchasing a copy of the book when it becomes available! 😀 I publish the source code here to give people early access to it, because I was planing to do that anyway with a permissive open-source license after publishing the book. If I don't charge for the source code, it doesn't make much sense to me to keep it private. I hope to publish the book in 2021. You can sign up here to receive a notification when a free chapter is available online or the book is published.

I pronounce chibicc as chee bee cee cee. "chibi" means "mini" or "small" in Japanese. "cc" stands for C compiler.

Status

chibicc supports almost all mandatory features and most optional features of C11 as well as a few GCC language extensions.

Features that are often missing in a small compiler but supported by chibicc include (but not limited to):

  • Preprocessor
  • float, double and long double (x87 80-bit floating point numbers)
  • Bit-fields
  • alloca()
  • Variable-length arrays
  • Compound literals
  • Thread-local variables
  • Atomic variables
  • Common symbols
  • Designated initializers
  • L, u, U and u8 string literals
  • Functions that take or return structs as values, as specified by the x86-64 SystemV ABI

chibicc does not support complex numbers, K&R-style function prototypes and GCC-style inline assembly. Digraphs and trigraphs are intentionally left out.

chibicc outputs a simple but nice error message when it finds an error in source code.

There's no optimization pass. chibicc emits terrible code which is probably twice or more slower than GCC's output. I have a plan to add an optimization pass once the frontend is done.

I'm using Ubuntu 20.04 for x86-64 as a development platform. I made a few small changes so that chibicc works on Ubuntu 18.04, Fedora 32 and Gentoo 2.6, but portability is not my goal at this moment. It may or may not work on systems other than Ubuntu 20.04.

Internals

chibicc consists of the following stages:

  • Tokenize: A tokenizer takes a string as an input, breaks it into a list of tokens and returns them.

  • Preprocess: A preprocessor takes as an input a list of tokens and output a new list of macro-expanded tokens. It interprets preprocessor directives while expanding macros.

  • Parse: A recursive descendent parser constructs abstract syntax trees from the output of the preprocessor. It also adds a type to each AST node.

  • Codegen: A code generator emits an assembly text for given AST nodes.

Contributing

When I find a bug in this compiler, I go back to the original commit that introduced the bug and rewrite the commit history as if there were no such bug from the beginning. This is an unusual way of fixing bugs, but as a part of a book, it is important to keep every commit bug-free.

Thus, I do not take pull requests in this repo. You can send me a pull request if you find a bug, but it is very likely that I will read your patch and then apply that to my previous commits by rewriting history. I'll credit your name somewhere, but your changes will be rewritten by me before submitted to this repository.

Also, please assume that I will occasionally force-push my local repository to this public one to rewrite history. If you clone this project and make local commits on top of it, your changes will have to be rebased by hand when I force-push new commits.

Design principles

chibicc's core value is its simplicity and the reability of its source code. To achieve this goal, I was careful not to be too clever when writing code. Let me explain what that means.

Oftentimes, as you get used to the code base, you are tempted to improve the code using more abstractions and clever tricks. But that kind of improvements don't always improve readability for first-time readers and can actually hurts it. I tried to avoid the pitfall as much as possible. I wrote this code not for me but for first-time readers.

If you take a look at the source code, you'll find a couple of dumb-looking pieces of code. These are written intentionally that way (but at some places I might be actually missing something, though). Here is a few notable examples:

  • The recursive descendent parser contains many similar-looking functions for similar-looking generative grammar rules. You might be tempted to improve it to reduce the duplication using higher-order functions or macros, but I thought that that's too complicated. It's better to allow small duplications instead.

  • chibicc doesn't try too hard to save memory. An entire input source file is read to memory first before the tokenizer kicks in, for example.

  • Slow algorithms are fine if we know that n isn't too big. For example, we use a linked list as a set in the preprocessor, so the membership check takes O(n) where n is the size of the set. But that's fine because we know n is usually very small. And even if n can be very big, I stick with a simple slow algorithm until it is proved by benchmarks that that's a bottleneck.

  • Each AST node type uses only a few members of the Node struct members. Other unused Node members are just a waste of memory at runtime. We could save memory using unions, but I decided to simply put everything in the same struct instead. I believe the inefficiency is negligible. Even if it matters, we can always change the code to use unions at any time. I wanted to avoid premature optimization.

  • chibicc always allocates heap memory using calloc, which is a variant of malloc that clears memory with zero. calloc is slightly slower than malloc, but that should be neligible.

  • Last but not least, chibicc allocates memory using calloc but never calls free. Allocated heap memory is not freed until the process exits. I'm sure that this memory management policy (or lack thereof) looks very odd, but it makes sense for short-lived programs such as compilers. DMD, a compiler for the D programming language, uses the same memory management scheme for the same reason, for example [1].

About the Author

I'm Rui Ueyama. I'm the creator of 8cc, which is a hobby C compiler, and also the original creator of the current version of LLVM lld linker, which is a production-quality linker used by various operating systems and large-scale build systems.

References

[1] https://www.drdobbs.com/cpp/increasing-compiler-speed-by-over-75/240158941

DMD does memory allocation in a bit of a sneaky way. Since compilers are short-lived programs, and speed is of the essence, DMD just mallocs away, and never frees.