write-a-C-interpreter
Write a simple interpreter of C. Inspired by c4 and largely based on it.
Quick Overview
Write-a-C-interpreter is an educational project that guides users through the process of building a simple C interpreter from scratch. It aims to help developers understand the inner workings of programming languages and compilers by implementing a basic C interpreter in C.
Pros
- Provides a hands-on learning experience for understanding language interpretation and compilation
- Offers step-by-step explanations and implementation details
- Helps developers gain insights into low-level programming concepts
- Serves as a foundation for further exploration of compiler design and language implementation
Cons
- Limited to a subset of the C language, not a full-featured interpreter
- May not be suitable for production use or real-world applications
- Requires a solid understanding of C programming and computer science concepts
- Documentation could be more comprehensive for beginners
Code Examples
Here are a few examples of code snippets from the project:
- Tokenization:
void next() {
    char *last_pos;
    int hash;
    while (token = *src) {
        ++src;
        if (token == '\n') {
            ++line;
        }
        else if (token == '#') {
            // skip macro, because we will not support it
            while (*src != 0 && *src != '\n') {
                src++;
            }
        }
        else if ((token >= 'a' && token <= 'z') || (token >= 'A' && token <= 'Z') || (token == '_')) {
            // parse identifier
            last_pos = src - 1;
            hash = token;
            while ((*src >= 'a' && *src <= 'z') || (*src >= 'A' && *src <= 'Z') || (*src >= '0' && *src <= '9') || (*src == '_')) {
                hash = hash * 147 + *src;
                src++;
            }
            // look for existing identifier, linear search
            current_id = symbols;
            while (current_id[Token]) {
                if (current_id[Hash] == hash && !memcmp((char *)current_id[Name], last_pos, src - last_pos)) {
                    // found one, return
                    token = current_id[Token];
                    return;
                }
                current_id = current_id + IdSize;
            }
            // store new ID
            current_id[Name] = (int)last_pos;
            current_id[Hash] = hash;
            token = current_id[Token] = Id;
            return;
        }
        else if (token >= '0' && token <= '9') {
            // parse number, three kinds: dec(123) hex(0x123) oct(017)
            token_val = token - '0';
            if (token_val > 0) {
                // dec, starts with [1-9]
                while (*src >= '0' && *src <= '9') {
                    token_val = token_val*10 + *src++ - '0';
                }
            } else {
                // starts with 0
                if (*src == 'x' || *src == 'X') {
                    // hex
                    token = *++src;
                    while ((token >= '0' && token <= '9') || (token >= 'a' && token <= 'f') || (token >= 'A' && token <= 'F')) {
                        token_val = token_val * 16 + (token & 15) + (token >= 'A' ? 9 : 0);
                        token = *++src;
                    }
                } else {
                    // oct
                    while (*src >= '0' && *src <= '7') {
                        token_val = token_val*8 + *src++ - '0';
                    }
                }
            }
            token = Num;
            return;
        }
        // ... (more token parsing)
    }
}
This code snippet demonstrates the tokenization process, parsing identifiers and numbers from the input source code.
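The number branch of the tokenizer can be tried in isolation. The following is a minimal sketch, not code from the project: `parse_int_literal` and its `cursor` parameter are hypothetical names, and the function works on a plain string instead of the global `src`. It follows the same three cases (decimal, hex, octal) and the same `(c & 15) + (c >= 'A' ? 9 : 0)` trick for hex digits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: parse an integer literal starting at *cursor and
 * advance *cursor past it. Mirrors the dec/hex/oct branches of next(). */
static int parse_int_literal(const char **cursor) {
    const char *p = *cursor;
    int value;

    /* The caller must point at a digit, as next() does when it gets here. */
    assert(p != NULL && *p >= '0' && *p <= '9');
    value = *p++ - '0';

    if (value > 0) {
        /* decimal: starts with [1-9] */
        while (*p >= '0' && *p <= '9') {
            value = value * 10 + (*p++ - '0');
        }
    } else if (*p == 'x' || *p == 'X') {
        /* hex: skip the 'x', then accumulate base-16 digits */
        p++;
        while ((*p >= '0' && *p <= '9') || (*p >= 'a' && *p <= 'f') || (*p >= 'A' && *p <= 'F')) {
            /* The low nibble is 0-9 for digits and 1-6 for letters,
             * so letters need 9 added to reach 10-15. */
            value = value * 16 + (*p & 15) + (*p >= 'A' ? 9 : 0);
            p++;
        }
    } else {
        /* octal: a leading 0 followed by digits 0-7 */
        while (*p >= '0' && *p <= '7') {
            value = value * 8 + (*p++ - '0');
        }
    }

    *cursor = p;
    return value;
}
```

Passing a pointer to the cursor keeps the sketch free of globals; the project itself advances the shared `src` pointer and stores the result in `token_val`.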
- Expression parsing:
void expression(int level) {
    // parse expressions
    int *i
Competitor Comparisons
C in four functions
Pros of c4
- Extremely concise codebase (less than 1000 lines)
- Self-hosting capability
- Minimalistic approach, easier to understand the core concepts
Cons of c4
- Limited feature set compared to write-a-C-interpreter
- Less detailed documentation and explanations
- Fewer comments in the code, potentially harder for beginners to follow
Code Comparison
write-a-C-interpreter:
void expression(int level) {
    // ... (omitted for brevity)
    switch (token) {
        case '!':
            // logical not: evaluate the operand, then compare it with 0
            match('!');
            expression(Inc);
            *++text = PSH;
            *++text = IMM;
            *++text = 0;
            *++text = EQ;
            expr_type = INT;
            break;
        // ... (other cases)
    }
}
c4:
void expr(int lev) {
    int t, *d;
    switch (tk) {
        case '!': next(); expr(Inc); *++e = PSH; *++e = IMM; *++e = 0; *++e = EQ; ty = INT; break;
        // ... (other cases)
    }
}
The c4 implementation is more compact, using shorter variable names and condensing multiple lines into one. write-a-C-interpreter provides more verbose and readable code with additional comments and explanations.
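Both versions emit the same instruction sequence for `!x`: `PSH`, `IMM`, `0`, `EQ`. A small sketch of how a stack machine could execute that sequence may help. It assumes the instructions behave as in c4's virtual machine (`IMM` loads the next word into an accumulator, `PSH` pushes the accumulator, `EQ` pops a value and compares it with the accumulator). The names `run`, `logical_not` and `EXIT_OP` are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Opcode names follow the snippets above; EXIT_OP is a hypothetical
 * terminator added so the sketch can stop. */
enum { IMM, PSH, EQ, EXIT_OP };

#define STACK_SIZE 16

/* Run a tiny program with the accumulator preset to `ax`
 * (standing in for the already-evaluated operand). */
static int run(const int *pc, int ax) {
    int stack[STACK_SIZE];
    int *sp = stack + STACK_SIZE;   /* the stack grows downwards */

    for (;;) {
        int op = *pc++;
        if (op == IMM) {
            ax = *pc++;             /* load immediate into the accumulator */
        } else if (op == PSH) {
            assert(sp > stack);
            *--sp = ax;             /* push the accumulator */
        } else if (op == EQ) {
            assert(sp < stack + STACK_SIZE);
            ax = (*sp++ == ax);     /* pop and compare with the accumulator */
        } else {
            return ax;              /* EXIT_OP: the result is in the accumulator */
        }
    }
}

/* !x is computed as (x == 0): push x, load 0, compare. */
static int logical_not(int x) {
    static const int code[] = { PSH, IMM, 0, EQ, EXIT_OP };
    return run(code, x);
}
```

Under these assumptions, `logical_not(0)` yields 1 and any non-zero argument yields 0, which is why no dedicated "not" instruction is needed.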
A Compiler Writing Journey
Pros of acwj
- More comprehensive coverage of compiler construction topics, including lexing, parsing, and code generation
- Incremental approach with 65 steps, allowing for a gradual learning experience
- Includes assembly code generation for a real architecture (x86-64)
Cons of acwj
- Larger project scope may be overwhelming for beginners
- Focuses on a subset of C, not a full C interpreter
- Less emphasis on the interpretation aspect compared to write-a-C-interpreter
Code Comparison
write-a-C-interpreter:
void next() {
    char *last_pos;
    int hash;
    while (token = *src) {
        ++src;
acwj:
static int next(void) {
    int c;
    if (Putback) {      // Use the character put
        c = Putback;    // back if there is one
        Putback = 0;
        return c;
    }
The code snippets show different approaches to tokenization. write-a-C-interpreter uses a single function to handle all token types, while acwj separates concerns with multiple functions for different token types.
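The put-back idea in the acwj snippet gives the scanner one character of lookahead: when a scanning routine reads one character too many, it hands that character back so the next read sees it again. The sketch below illustrates the technique over an in-memory string. It is not acwj's code; `struct reader`, `reader_next`, `reader_putback` and `scan_int` are hypothetical names, and the end of input is reported as 0 for simplicity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reader state: the remaining input plus one put-back slot. */
struct reader {
    const char *src;
    int putback;    /* 0 means "nothing put back" */
};

/* Return the next character, preferring a put-back character if present.
 * Returns 0 at the end of the input. */
static int reader_next(struct reader *r) {
    int c;
    assert(r != NULL && r->src != NULL);
    if (r->putback) {
        c = r->putback;
        r->putback = 0;
        return c;
    }
    if (*r->src == '\0') {
        return 0;
    }
    return (unsigned char)*r->src++;
}

/* Hand one character back; the next reader_next() call returns it. */
static void reader_putback(struct reader *r, int c) {
    r->putback = c;
}

/* Scan a decimal integer. The loop stops on the first non-digit,
 * which belongs to the next token, so it is put back. */
static int scan_int(struct reader *r) {
    int c;
    int value = 0;
    while ((c = reader_next(r)) >= '0' && c <= '9') {
        value = value * 10 + (c - '0');
    }
    reader_putback(r, c);
    return value;
}
```

With the input `"12+3"`, `scan_int` returns 12 and the following `reader_next` call still sees the `+`. The pointer-based tokenizer in write-a-C-interpreter gets the same lookahead by peeking at `*src` before advancing.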
README
C interpreter that interprets itself.
How to Run the Code
The file xc.c is the original, and xc-tutor.c is the version I built step by step for the tutorial.
gcc -o xc xc.c
./xc hello.c
./xc -s hello.c
./xc xc.c hello.c
./xc xc.c xc.c hello.c
About
This project is inspired by c4 and is largely based on it.
However, I rewrote all of it to make it more understandable and to help myself understand it.
Despite the complexity described in books about compiler design, writing a compiler is not that hard. You don't need much theory, though it helps in understanding the logic behind the code.
I also wrote a series of articles about how this compiler is built; they are in the directory tutorial/en.
There is also a Chinese version on my blog.
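- A hands-on guide to building a C compiler (0): Preface
- A hands-on guide to building a C compiler (1): Design
- A hands-on guide to building a C compiler (2): Virtual machine
- A hands-on guide to building a C compiler (3): Lexer
- A hands-on guide to building a C compiler (4): Recursive descent
- A hands-on guide to building a C compiler (5): Variable definitions
- A hands-on guide to building a C compiler (6): Function definitions
- A hands-on guide to building a C compiler (7): Statements
- A hands-on guide to building a C compiler (8): Expressions
- A hands-on guide to building a C compiler (9): Summary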
Resources
Further Reading:
- Let's Build a Compiler: an excellent starting point for building a compiler.
Forks:
Licence
The original code is licensed under GPL2, so this code uses the same licence.