Top Related Projects
- c4: C in four functions
- write-a-C-interpreter: Write a simple interpreter of C. Inspired by c4 and largely based on it.
- lcc: The lcc retargetable ANSI C compiler
Quick Overview
The DoctorWkt/acwj repository is a step-by-step tutorial on building a compiler from scratch. It guides readers through writing a compiler for a subset of C, progressing from a simple expression calculator to a compiler that can compile itself. The project is divided into numbered parts (Part 0 through Part 64), each focusing on a different aspect of compiler construction.
Pros
- Detailed step-by-step explanation of compiler construction
- Practical approach with working code examples
- Covers a wide range of compiler topics, from lexical analysis to code generation
- Suitable for both beginners and intermediate programmers interested in compilers
Cons
- Focuses on a subset of C, not a full C compiler
- May not cover some advanced optimization techniques
- Requires a significant time investment to work through all parts
- Some parts may be challenging for absolute beginners in programming
Getting Started
To get started with the acwj project:
- Clone the repository:
    git clone https://github.com/DoctorWkt/acwj.git
- Navigate to the desired part. Each part lives in its own numbered directory, for example:
    cd acwj/01_Scanner
- Read the README file in each part's directory for instructions on building and running the code.
- Follow the tutorial in order, starting from Part 0 and progressing through the subsequent parts.
- Experiment with the code and try to implement the suggested exercises at the end of each part.
Note: This project is primarily an educational resource and not a code library, so there are no specific code examples or quick start instructions for using it as a library.
Competitor Comparisons
C in four functions
Pros of c4
- Extremely compact and concise implementation (less than 500 lines of code)
- Self-contained in a single C file, making it easy to understand and modify
- Demonstrates core compiler concepts with minimal complexity
Cons of c4
- Limited language support compared to acwj's more comprehensive approach
- Lacks detailed explanations and documentation found in acwj
- May be too simplified for learning advanced compiler techniques
Code Comparison
c4 (illustrative fragment of a character-level scanning loop, not verbatim from c4.c):
    int *id_name, id_type;
    int *id = id_name = malloc(sizeof(int) * (ID_SIZE + 1));
    while (tk = *p) {
      p++; Putchar(tk);
      if (tk == 'a' || tk == 'b' || tk == 'c' || tk == 'd' || tk == 'e' || tk == 'f') *id++ = tk;
      else if (tk >= '0' && tk <= '9') { ty = INT; *id++ = tk; }
    }
acwj:
    static struct ASTnode *primary(void) {
      struct ASTnode *n;
      int id;

      switch (Token.token) {
      case T_INTLIT:
        n = mkastleaf(A_INTLIT, Token.intvalue);
        scan(&Token);
        return (n);
      case T_IDENT:
        id = findglob(Text);
        if (id == -1)
          fatals("Unknown variable", Text);
        n = mkastleaf(A_IDENT, id);
        scan(&Token);
        return (n);
      default:
        fatald("Syntax error, token", Token.token);
      }
    }
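The acwj snippet above consumes tokens that a scanner has already produced (the `scan(&Token)` calls). As a rough idea of what such a scanner does, here is a minimal sketch in C. It is not taken from c4 or acwj: the names `struct token`, `next_token`, and the `TK_*` constants are hypothetical, and it only recognises decimal integer literals, identifiers, and single-character operators.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical token kinds for this sketch (not acwj's T_* values). */
enum token_kind { TK_EOF, TK_INTLIT, TK_IDENT, TK_OP };

struct token {
    enum token_kind kind;
    int intvalue;   /* valid when kind == TK_INTLIT */
    char text[32];  /* valid when kind == TK_IDENT  */
    int op;         /* valid when kind == TK_OP     */
};

/* Read one token starting at *p, advance *p past it, and return its kind. */
enum token_kind next_token(const char **p, struct token *t) {
    const char *s = *p;

    /* Skip leading whitespace. */
    while (isspace((unsigned char)*s))
        s++;

    if (*s == '\0') {
        t->kind = TK_EOF;
    } else if (isdigit((unsigned char)*s)) {
        /* Accumulate a decimal integer literal (no overflow check). */
        t->kind = TK_INTLIT;
        t->intvalue = 0;
        while (isdigit((unsigned char)*s))
            t->intvalue = t->intvalue * 10 + (*s++ - '0');
    } else if (isalpha((unsigned char)*s) || *s == '_') {
        /* Copy an identifier, silently truncating names that are too long. */
        size_t n = 0;
        t->kind = TK_IDENT;
        while (isalnum((unsigned char)*s) || *s == '_') {
            if (n < sizeof t->text - 1)
                t->text[n++] = *s;
            s++;
        }
        t->text[n] = '\0';
    } else {
        /* Anything else is treated as a one-character operator. */
        t->kind = TK_OP;
        t->op = *s++;
    }

    *p = s;
    return t->kind;
}
```

A parser in the style of `primary()` would call `next_token` once per token and switch on the returned kind. A real scanner would also need multi-character operators, keywords, and error reporting, which this sketch leaves out.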
Write a simple interpreter of C. Inspired by c4 and largely based on it.
Pros of write-a-C-interpreter
- Focuses specifically on building a C interpreter, providing a more targeted learning experience
- Includes a step-by-step tutorial in the README, making it easier for beginners to follow along
- Implements a complete C interpreter in a single file, offering a more compact and self-contained project
Cons of write-a-C-interpreter
- Less comprehensive coverage of compiler concepts compared to acwj
- Lacks the detailed explanations found in the write-up that accompanies each part of acwj
- May not provide as much insight into real-world compiler development practices
Code Comparison
write-a-C-interpreter:
    void expression(int level) {
      // ... (implementation details)
    }
acwj:
    struct ASTnode *binexpr(int ptp) {
      struct ASTnode *left, *right;
      int tokentype;
      // ... (implementation details)
    }
The snippets show how each project parses expressions. Both functions take a precedence argument (level in write-a-C-interpreter, ptp in acwj), but write-a-C-interpreter's expression() returns nothing, whereas acwj's binexpr() returns a pointer to an AST node built from left and right subtrees.
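To make the idea of a precedence-driven expression parser that returns an AST more concrete, here is a minimal, self-contained sketch in C. It is not code from either project: `struct node`, `parse_expr`, `prec`, and `eval_string` are hypothetical names, the input is a plain string rather than a token stream, and only unsigned integer literals with `+`, `-`, `*`, `/` are handled.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical AST node: op is '+', '-', '*' or '/' for an interior node,
   or 0 for an integer leaf holding value. */
struct node {
    int op;
    int value;
    struct node *left, *right;
};

static struct node *mknode(int op, int value, struct node *l, struct node *r) {
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        abort();
    n->op = op; n->value = value; n->left = l; n->right = r;
    return n;
}

static void free_tree(struct node *n) {
    if (n == NULL)
        return;
    free_tree(n->left);
    free_tree(n->right);
    free(n);
}

/* Binding strength of a binary operator; 0 means "not an operator". */
static int prec(int c) {
    switch (c) {
    case '+': case '-': return 1;
    case '*': case '/': return 2;
    default:            return 0;
    }
}

static void skip_spaces(const char **p) {
    while (isspace((unsigned char)**p))
        (*p)++;
}

/* Parse one unsigned integer literal into a leaf node. */
static struct node *parse_primary(const char **p) {
    int v = 0;
    skip_spaces(p);
    while (isdigit((unsigned char)**p))
        v = v * 10 + (*(*p)++ - '0');
    return mknode(0, v, NULL, NULL);
}

/* Parse an expression, consuming only operators that bind more tightly
   than prev_prec. Equal precedence stops the recursion, which makes the
   operators left-associative. */
static struct node *parse_expr(const char **p, int prev_prec) {
    struct node *left = parse_primary(p);
    for (;;) {
        skip_spaces(p);
        int op = **p;
        int pr = prec(op);
        if (pr == 0 || pr <= prev_prec)
            break;
        (*p)++;
        struct node *right = parse_expr(p, pr);
        left = mknode(op, 0, left, right);
    }
    return left;
}

/* Walk the tree and compute its value (no division-by-zero check). */
static int eval(const struct node *n) {
    if (n->op == 0)
        return n->value;
    int l = eval(n->left), r = eval(n->right);
    switch (n->op) {
    case '+': return l + r;
    case '-': return l - r;
    case '*': return l * r;
    case '/': return l / r;
    }
    assert(0 && "unknown operator");
    return 0;
}

/* Parse and evaluate a whole string, then release the tree. */
int eval_string(const char *s) {
    struct node *tree = parse_expr(&s, 0);
    int result = eval(tree);
    free_tree(tree);
    return result;
}
```

Calling `eval_string("2 + 3 * 4")` builds a tree in which the multiplication sits below the addition. Building a tree first, rather than emitting code while parsing, keeps parsing separate from whatever is done with the result; this sketch simply evaluates the tree.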
The lcc retargetable ANSI C compiler
Pros of lcc
- More mature and feature-complete C compiler
- Extensively documented with a companion book
- Wider platform support and portability
Cons of lcc
- Less beginner-friendly due to complexity
- Not actively maintained (last commit in 2015)
- Larger codebase, potentially harder to understand fully
Code Comparison
lcc (from lcc.c):
    int main(int argc, char *argv[]) {
      int i, j, nf;
      static int initted;

      progname = argv[0];
      if (initted++ == 0) {
        // ... initialization code ...
      }
      // ... more code ...
    }
acwj (from main.c):
    int main(int argc, char *argv[]) {
      int i;

      // Initialise our variables
      init_vars();

      // Scan and parse the input file
      scan(&Token);
      genpreamble();
      statements();
      genpostamble();
      fclose(Outfile);
      exit(0);
    }
Summary
lcc is a mature, feature-complete C compiler with broad platform support, while acwj is a step-by-step tutorial for building a compiler for a subset of C. lcc offers a complete solution but may be overwhelming for beginners, whereas acwj provides a gradual learning experience with a focus on educational value.
README
A Compiler Writing Journey
In this GitHub repository, I'm documenting my journey to write a self-compiling compiler for a subset of the C language. I'm also writing out the details so that, if you want to follow along, there will be an explanation of what I did and why, with some references back to the theory of compilers.
But not too much theory, I want this to be a practical journey.
Here are the steps I've taken so far:
- Part 0: Introduction to the Journey
- Part 1: Introduction to Lexical Scanning
- Part 2: Introduction to Parsing
- Part 3: Operator Precedence
- Part 4: An Actual Compiler
- Part 5: Statements
- Part 6: Variables
- Part 7: Comparison Operators
- Part 8: If Statements
- Part 9: While Loops
- Part 10: For Loops
- Part 11: Functions, part 1
- Part 12: Types, part 1
- Part 13: Functions, part 2
- Part 14: Generating ARM Assembly Code
- Part 15: Pointers, part 1
- Part 16: Declaring Global Variables Properly
- Part 17: Better Type Checking and Pointer Offsets
- Part 18: Lvalues and Rvalues Revisited
- Part 19: Arrays, part 1
- Part 20: Character and String Literals
- Part 21: More Operators
- Part 22: Design Ideas for Local Variables and Function Calls
- Part 23: Local Variables
- Part 24: Function Parameters
- Part 25: Function Calls and Arguments
- Part 26: Function Prototypes
- Part 27: Regression Testing and a Nice Surprise
- Part 28: Adding More Run-time Flags
- Part 29: A Bit of Refactoring
- Part 30: Designing Structs, Unions and Enums
- Part 31: Implementing Structs, Part 1
- Part 32: Accessing Members in a Struct
- Part 33: Implementing Unions and Member Access
- Part 34: Enums and Typedefs
- Part 35: The C Pre-Processor
- Part 36: break and continue
- Part 37: Switch Statements
- Part 38: Dangling Else and More
- Part 39: Variable Initialisation, part 1
- Part 40: Global Variable Initialisation
- Part 41: Local Variable Initialisation
- Part 42: Type Casting and NULL
- Part 43: Bugfixes and More Operators
- Part 44: Constant Folding
- Part 45: Global Variable Declarations, revisited
- Part 46: Void Function Parameters and Scanning Changes
- Part 47: A Subset of sizeof
- Part 48: A Subset of static
- Part 49: The Ternary Operator
- Part 50: Mopping Up, part 1
- Part 51: Arrays, part 2
- Part 52: Pointers, part 2
- Part 53: Mopping Up, part 2
- Part 54: Spilling Registers
- Part 55: Lazy Evaluation
- Part 56: Local Arrays
- Part 57: Mopping Up, part 3
- Part 58: Fixing Pointer Increments/Decrements
- Part 59: Why Doesn't It Work, part 1
- Part 60: Passing the Triple Test
- Part 61: What's Next?
- Part 62: Code Cleanup
- Part 63: A New Backend using QBE
- Part 64: A Backend for the 6809 CPU
There isn't a schedule or timeline for the future parts, so just keep checking back here to see if I've written any more.
Copyrights
I have borrowed some of the code, and lots of ideas, from the SubC compiler written by Nils M Holm. His code is in the public domain. I think that my code is substantially different enough that I can apply a different license to my code.
Unless otherwise noted,
- all source code and scripts are (c) Warren Toomey under the GPL3 license.
- all non-source code documents (e.g. English documents, image files) are (c) Warren Toomey under the Creative Commons BY-NC-SA 4.0 license.