DRAFT: Making a simple interpreter from scratch, part 0: Introduction
In this series, I will go through writing an interpreter for a programming language from scratch, using nothing but C and the standard library. I'm writing it because I think writing interpreters is super fun, and I think a lot of people find the topic kind of foreign. I hope this series will demystify interpreters and parsers for someone.
Plus, I needed an excuse to write more for my blog, and I wanted an excuse to write an interpreter :)
There will be a lot of code in this series. I don't expect you to read all of it. There's a lot of comments, so if you just skim the code and read the comments, you should get the gist of what's going on.
Idea
I like to have an idea of what I want to achieve when writing a language. While a lot of the machinery early on will be sort of general, I like having an overarching plan. I love working towards a goal and feeling like I'm actually making progress. So let's design our own small programming language!
One of the first steps is to figure out a name for the language. I decided on the name felt, because, well, it's what I came up with.
I want a few basic data types:
- Booleans,
true
andfalse
- Numbers,
1.523
- Strings,
"Hello World"
- Functions,
{ 10 + 3 }
- Arrays of other variables,
[ "hello"; 10; true ]
- Objects, i.e string -> value maps,
{ foo: 5; bar: "no" }
I want a few basic kinds of expressions:
- Literal expressions,
5
or"Hello"
- Variable expressions,
foo
- Assignment expressions,
name = "Albert"
- Function call expressions,
print "Hello, my number is:" 5.3
- Function call expressions without arguments:
print!
- Group expressions,
print "Hello, 5 + 3 =" (+ 5 3)
In general, I want this code to print 50 + 33 = 83
:
calculator = {
calculate: {
sum = + 50 33
print "50 + 33 =" sum
}
}
calculator.calculate!
I drew a syntax diagram (wiki) to sort of kind of formalize the syntax. You don't need to read through it, but it can be useful as a reference. You can find it here.
Getting Started
The structure of this project will be simple: a directory with a Makefile
and
a src
directory. Since the project is small, a simple hand-written Makefile
should suffice:
Makefile:
SRCS := $(wildcard src/*.c)
HDRS := $(wildcard src/*.h)
OBJS := $(patsubst src/%.c,obj/%.o,$(SRCS))
felt: $(OBJS)
$(CC) $(LDFLAGS) -o $@ $^
obj/%.o: src/%.c $(HDRS)
@mkdir -p $(@D)
$(CC) -Wall -Wextra -Wno-unused-parameter -Wpedantic $(CFLAGS) -o $@ -c $<
.PHONY: clean
clean:
rm -rf obj
rm -f felt
src/main.c:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char **argv) {
printf("Hello World!\n");
return EXIT_SUCCESS;
}
vidarr ~/dev/felt-lang $ make
cc -o obj/main.o -c src/main.c
cc -o felt obj/main.o
vidarr ~/dev/felt-lang $ ./felt
Hello World!
Utility code
Although we won't start writing the actual parser yet, we can start writing some general utility code we'll need.
I know for sure we'll be doing a lot of stuff where we have a buffer which needs to dynamically grow, so we can start by writing a simple dynamic buffer abstraction:
src/buffer.h:
struct felt_buffer {
unsigned char *buffer;
size_t size;
size_t len;
};
// Initialize a buffer, but don't alloc yet.
void felt_buffer_init(struct felt_buffer *buf);
// Free any memory allocated by the buffer.
void felt_buffer_free(struct felt_buffer *buf);
// Release the buffer, so that it's not managed by the felt_buffer anymore.
// The flelt_buffer will be reset and ready to be re-used.
unsigned char *felt_buffer_release(struct felt_buffer *buf);
// Make sure at least 'size' bytes are available.
// Returns -1 and sets errno on error.
int felt_buffer_alloc(struct felt_buffer *buf, size_t size);
// Copy 'size' bytes from 'src' to the end of the buffer.
// Returns -1 and sets errno on error.
int felt_buffer_push(struct felt_buffer *buf, void *src, size_t size);
// Copy 'size' bytes from the end of the buffer to 'dest'.
// Returns -1 and leaves 'dest' untouched if buf->len < size.
int felt_buffer_pop(struct felt_buffer *buf, void *dest, size_t size);
// Shrink the buffer to fit its contents.
// Returns -1 and sets errno on error.
int felt_buffer_shrink(struct felt_buffer *buf);
Those functions are implemented in src/buffer.c.
Part 1: How do we make sense of words?
The source for this part is available on gitlab.