DRAFT: Making a simple interpreter from scratch, part 0: Introduction

In this series, I will go through writing an interpreter for a programming language from scratch, using nothing but C and the standard library. I'm writing it because I think writing interpreters is super fun, and I think a lot of people find the topic kind of foreign. I hope this series will demystify interpreters and parsers for someone.

Plus, I needed an excuse to write more for my blog, and I wanted an excuse to write an interpreter :)

There will be a lot of code in this series. I don't expect you to read all of it. There's a lot of comments, so if you just skim the code and read the comments, you should get the gist of what's going on.

Idea

I like to have an idea of what I want to achieve when writing a language. While a lot of the machinery early on will be sort of general, I like having an overarching plan. I love working towards a goal and feeling like I'm actually making progress. So let's design our own small programming language!

One of the first steps is to figure out a name for the language. I decided on the name felt, because, well, it's what I came up with.

I want a few basic data types:

  1. Booleans, true and false
  2. Numbers, 1.523
  3. Strings, "Hello World"
  4. Functions, { 10 + 3 }
  5. Arrays of other variables, [ "hello"; 10; true ]
  6. Objects, i.e string -> value maps, { foo: 5; bar: "no" }

I want a few basic kinds of expressions:

  1. Literal expressions, 5 or "Hello"
  2. Variable expressions, foo
  3. Assignment expressions, name = "Albert"
  4. Function call expressions, print "Hello, my number is:" 5.3
  5. Function call expressions without arguments: print!
  6. Group expressions, print "Hello, 5 + 3 =" (+ 5 3)

In general, I want this code to print 50 + 33 = 83:

calculator = {
	calculate: {
		sum = + 50 33
		print "50 + 33 =" sum
	}
}

calculator.calculate!

I drew a syntax diagram (wiki) to sort of kind of formalize the syntax. You don't need to read through it, but it can be useful as a reference. You can find it here.

Getting Started

The structure of this project will be simple: a directory with a Makefile and a src directory. Since the project is small, a simple hand-written Makefile should suffice:

Makefile:

SRCS := $(wildcard src/*.c)
HDRS := $(wildcard src/*.h)
OBJS := $(patsubst src/%.c,obj/%.o,$(SRCS))

felt: $(OBJS)
	$(CC) $(LDFLAGS) -o $@ $^

obj/%.o: src/%.c $(HDRS)
	@mkdir -p $(@D)
	$(CC) -Wall -Wextra -Wno-unused-parameter -Wpedantic $(CFLAGS) -o $@ -c $<

.PHONY: clean
clean:
	rm -rf obj
	rm -f felt

src/main.c:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
	printf("Hello World!\n");
	return EXIT_SUCCESS;
}
vidarr ~/dev/felt-lang $ make
cc  -o obj/main.o -c src/main.c
cc  -o felt obj/main.o
vidarr ~/dev/felt-lang $ ./felt
Hello World!

Utility code

Although we won't start writing the actual parser yet, we can start writing some general utility code we'll need.

I know for sure we'll be doing a lot of stuff where we have a buffer which needs to dynamically grow, so we can start by writing a simple dynamic buffer abstraction:

src/buffer.h:

struct felt_buffer {
	unsigned char *buffer;
	size_t size;
	size_t len;
};

// Initialize a buffer, but don't alloc yet.
void felt_buffer_init(struct felt_buffer *buf);

// Free any memory allocated by the buffer.
void felt_buffer_free(struct felt_buffer *buf);

// Release the buffer, so that it's not managed by the felt_buffer anymore.
// The flelt_buffer will be reset and ready to be re-used.
unsigned char *felt_buffer_release(struct felt_buffer *buf);

// Make sure at least 'size' bytes are available.
// Returns -1 and sets errno on error.
int felt_buffer_alloc(struct felt_buffer *buf, size_t size);

// Copy 'size' bytes from 'src' to the end of the buffer.
// Returns -1 and sets errno on error.
int felt_buffer_push(struct felt_buffer *buf, void *src, size_t size);

// Copy 'size' bytes from the end of the buffer to 'dest'.
// Returns -1 and leaves 'dest' untouched if buf->len < size.
int felt_buffer_pop(struct felt_buffer *buf, void *dest, size_t size);

// Shrink the buffer to fit its contents.
// Returns -1 and sets errno on error.
int felt_buffer_shrink(struct felt_buffer *buf);

Those functions are implemented in src/buffer.c.


Part 1: How do we make sense of words?

The source for this part is available on gitlab.

Read More