C/C++: 70x faster file embeds using string literals

Date: 2020-08-03
Git: https://gitlab.com/mort96/blog/blob/published/content/00000-home/00013-fast-cpp-embeds.md
Tool: https://github.com/mortie/strliteral

It's really common to want to embed some static data into a binary. Game developers want to embed their shaders. Developers of graphical apps may want to embed sounds or icons. Developers of programming language interpreters may want to embed their language's standard library. I have many times built software whose GUI is in the form of a web app served from a built-in HTTP server, where I want to embed the HTML/JS/CSS into the binary.

Since neither C nor C++ currently has a built-in way to embed files, we use work-arounds. These usually fall into one of two categories: Either we use toolchain-specific features to generate object files with the data exposed as symbols, or we generate C code which we subsequently compile to object files. Since the toolchain-specific features are, well, toolchain-specific, people writing cross-platform software generally prefer code generation.

The most common tool I'm aware of to generate C code for embedding data is xxd, whose -i option will generate C code with an unsigned char array literal.

Given the following input text:

<html>
	<head>
		<title>Hello World</title>
	</head>
	<body>
		Hello World
	</body>
</html>
index.html

The command xxd -i index.html will produce this C code:

unsigned char index_html[] = {
  0x3c, 0x68, 0x74, 0x6d, 0x6c, 0x3e, 0x0a, 0x09, 0x3c, 0x68, 0x65, 0x61,
  0x64, 0x3e, 0x0a, 0x09, 0x09, 0x3c, 0x74, 0x69, 0x74, 0x6c, 0x65, 0x3e,
  0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f, 0x72, 0x6c, 0x64, 0x3c,
  0x2f, 0x74, 0x69, 0x74, 0x6c, 0x65, 0x3e, 0x0a, 0x09, 0x3c, 0x2f, 0x68,
  0x65, 0x61, 0x64, 0x3e, 0x0a, 0x09, 0x3c, 0x62, 0x6f, 0x64, 0x79, 0x3e,
  0x0a, 0x09, 0x09, 0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x20, 0x57, 0x6f, 0x72,
  0x6c, 0x64, 0x0a, 0x09, 0x3c, 0x2f, 0x62, 0x6f, 0x64, 0x79, 0x3e, 0x0a,
  0x3c, 0x2f, 0x68, 0x74, 0x6d, 0x6c, 0x3e, 0x0a
};
unsigned int index_html_len = 92;

This works fairly well. Any C or C++ compiler can compile that code and produce an object file with our static data, which we can link against to embed that data into our binary. All in a cross-platform and cross-toolchain way.

There's just one problem: It's slow. Really slow. On my laptop, embedding a megabyte this way takes 2 seconds using g++. Embedding one decent quality MP3 at 8.4MB takes 23 seconds, using 2.5 gigabytes of RAM.

bippety-boppety.mp3, an 8.4MB song

Whether or not we should embed files of that size into our binaries is a question I won't cover in this article, and the answer depends a lot on context. Regardless, processing data at roughly 400kB per second is objectively terrible. We can do so much better.

The main reason it's so slow is that parsing arbitrary C++ expressions is actually really complicated. Every single byte is a separate expression, parsed using a complex general expression parser, presumably separately allocated as its own node in the syntax tree. If only we could generate code which combines lots of bytes of data into one token...

I wrote a small tool, called strliteral, which outputs data as a string literal rather than a character array. The command strliteral index.html will produce this C code:

const unsigned char index_html[] =
	"<html>\n\t<head>\n\t\t<title>Hello World</title>\n\t</head>\n\t<body>\n\t\tHello"
	" World\n\t</body>\n</html>\n";
const unsigned int index_html_len = 92;

It should come as no surprise that this is many times faster to parse than the character array approach. Instead of invoking a full expression parser for each and every byte, most of the time will just be spent in a tight loop which reads bytes and appends them to an array. The grammar for a string literal is ridiculously simple compared to the grammar for an array literal.
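To make that point concrete, here's a minimal sketch (not strliteral's actual code, and the helper name is made up) of the kind of generator loop involved: printable ASCII passes through untouched, and everything else becomes a 3-digit octal escape.

```cpp
#include <cstdio>
#include <string>

// Convert raw bytes into the body of a C string literal. Printable ASCII
// passes through as-is; everything else becomes a 3-digit octal escape.
// (Hypothetical helper; strliteral's real escaping rules differ in details.)
std::string toLiteral(const unsigned char *data, size_t len) {
	std::string out;
	for (size_t i = 0; i < len; i++) {
		unsigned char c = data[i];
		if (c >= ' ' && c <= '~' && c != '"' && c != '\\' && c != '?') {
			out += (char)c; // one byte of output per printable input byte
		} else {
			char buf[8];
			snprintf(buf, sizeof(buf), "\\%03o", c); // e.g. '\n' -> "\012"
			out += buf;
		}
	}
	return out;
}
```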

Compared to xxd's 23 seconds and 2.5GB of RAM usage for my 8.4MB file, my strliteral tool produces code which g++ can compile in 0.6 seconds, using only 138 megs of RAM. That's almost a 40x speed-up, and an 18x reduction in RAM usage. It's processing data at a rate of 15MB per second, compared to xxd's 0.4MB per second. As a bonus, my tool generates 26MB of C code, compared to xxd's 52MB.

Here's how that song looks, encoded with strliteral:

const unsigned char bippety_boppety_mp3[] =
	"\377\373\340D\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
	"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
	"\242\240,\2253]5\234\316\020\234\375\246\072D\307\027\203R\030\307\221\314`\243B\370\013\301\220\256"
	"\235\036\243E\246\331\216\026\004\341\362uU&\255\030@,\227\021q]1\231L\304\010E\311\231\005W\231\210"
	"j-\"\374|\210II0\221\026\045\021}qC\206\t9<\320\013\246w\350\263EmH`#\262\037\252\304\272\340\355`7\217"
	"\343*\016\236\320\345oa\217\204\361~k\224\255|\301cy\371\375\034\366K'\236\037\271\204\371\275\rV\267"
	"\252\020\245\322~\233\350\222\343\347\204\332\340~\236-\355S.W\045\365\301=\\+\236\270F\312\246g\266"
	"CX2\376\265V\242T0\337I\031\343\347\320\336\322\020\016\020H\250\007]\031\201\235\025\300h\2628d\000"
	/* 249707 lines snipped */
	"\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252"
	"\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252\252"
	"\252TAG\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
	"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
	"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
	"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000Created with LMMS\000"
	"\000\000\000\000\000\000\000\000\000\000\000\000\377";
unsigned int bippety_boppety_mp3_len = 8779359;

The difference is even bigger when processing mostly-ASCII text rather than binary data. Since xxd produces the same 6 bytes of source code for every byte of input (0x, two hex digits, comma, space), the data itself doesn't matter. However, strliteral produces 4 bytes of source code (\, then three octal digits) for every "weird" character, but just one byte of source code for every "regular" ASCII character.

Graphs

I wrote some benchmarking code to compare various aspects of xxd and strliteral. All times are measured using an Intel Core i7-8705G CPU in a Dell XPS 15 9575. g++ and xxd are from the Ubuntu 20.04 repositories. strliteral is compiled with gcc -O3 -o strliteral strliteral.c using GCC 9.3.0. The benchmarking source code can be found here: https://github.com/mortie/strliteral/tree/master/benchmark

Here's a graph which shows exactly how the two tools compare, across a range of input sizes, given either text or random binary data:

The 70x number in the title comes from this graph. The 60ms spent compiling strliteral-generated code is 72x faster than the 4324ms spent compiling xxd-generated code. Comparing random binary data instead of text would show a lower - though still respectable - speed-up of 25x.

Though most of the time spent when embedding data with xxd comes from the compiler, the xxd tool itself is actually fairly slow too:

Those ~200 milliseconds xxd takes to generate code for a 2MB file aren't very significant compared to the 4.3 second compile time, but if strliteral were equally slow, 75% of the time would've been spent generating code as opposed to compiling code. Luckily, strliteral runs through 2MB of text in 11ms.

Looking at the xxd source code, the reason it's so slow seems to be that it prints every single byte using a call to fprintf:

while ((length < 0 || p < length) && (c = getc(fp)) != EOF)
  {
    if (fprintf(fpo, (hexx == hexxa) ? "%s0x%02x" : "%s0X%02X",
                (p % cols) ? ", " : &",\n  "[2*!p],  c) < 0)
      die(3);
    p++;
  }
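For comparison, the per-byte fprintf calls can be replaced with a small lookup table and bulk string building. This is a sketch of the idea (a hypothetical helper, not a patch to xxd):

```cpp
#include <string>

// Format bytes as "0xNN, 0xNN, ..." using a hex-digit lookup table and a
// single growing buffer, instead of one fprintf call per input byte.
std::string hexArrayBody(const unsigned char *data, size_t len) {
	static const char digits[] = "0123456789abcdef";
	std::string out;
	out.reserve(len * 6); // "0xNN, " is 6 bytes of output per input byte
	for (size_t i = 0; i < len; i++) {
		char buf[6] = {'0', 'x',
		               digits[data[i] >> 4], digits[data[i] & 0xf],
		               ',', ' '};
		out.append(buf, 6);
	}
	if (!out.empty())
		out.resize(out.size() - 2); // drop the trailing ", "
	return out;
}
```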

Finally, here's a graph over g++'s memory usage:

Caveats

Update: In the reddit discussion, someone pointed out that MSVC, Microsoft's compiler, has a fairly low limit on the maximum length of a string literal (the exact limit is fairly complicated). I had assumed that any modern compiler would just keep strings in a variable sized array. Maybe strliteral will eventually grow an MSVC-specific workaround, but until then, using a better compiler like Clang or GCC on Windows is an option.

Using string literals for arbitrary binary data is a bit more complicated than using an array with integer literals. Both xxd and strliteral might have trouble in certain edge cases, such as when cross-compiling if the host and target disagree on the number of bits in a byte. Using string literals adds an extra complication due to the distinction between the "source character set" and the "execution character set". The C11 spec (5.2.1p2) states:

In a character constant or string literal, members of the execution character set shall be represented by corresponding members of the source character set or by escape sequences consisting of the backslash \ followed by one or more characters.

If you run strliteral on a file which contains the byte 97, it will output the code const unsigned char data[] = "a";. If that C code is compiled with a "source character set" of ASCII and an "execution character set" of EBCDIC, my understanding of the standard text is that the ASCII "a" (byte 97) will be translated to the EBCDIC "a" (byte 129). Whether that's even a bug or not depends on whether the intention is to embed binary data or textual data, but it's probably not what people expect from a tool to embed files.

This should only ever become an issue if you're compiling with different source and execution charsets where at least one of them isn't based on ASCII. Compiling with a UTF-8 source charset and an EBCDIC execution charset will cause issues, but since all non-ASCII characters are printed as octal escape sequences, compiling with e.g. a UTF-8 source charset and a Latin-1 execution charset isn't an issue.

It seems extremely unlikely to me that someone will compile with a source charset and an execution charset which are both different and not based on ASCII, but I suppose it's something to keep in mind. If it does become an issue, the --always-escape option will cause strliteral to only generate octal escape sequences. That should work the same as xxd -i in all cases, just faster.

Some implementation notes

C is a weird language. For some reason, probably to better support systems where bytes are bigger than 8 bits, hex string escapes like "\x6c" can contain an arbitrary number of hex digits. "\xfffff" represents a string with one character whose numeric value is 1048575. That obviously won't work on machines with 8-bit bytes, but it could conceivably be useful on a machine with 24-bit bytes, so it's allowed. Luckily, octal escapes are at most 3 digits long, so while "\xf8ash" won't work, "\370ash" will.

C also has a concept of trigraphs (and digraphs), and trigraphs are expanded even within string literals. The string literals "??(" and "[" are identical (at least in C, and in C++ before C++17, which removed trigraphs). Currently, strliteral just treats ?, : and % as "special" characters which are escaped, which means no digraphs or trigraphs will ever appear in the generated source code. I decided it's not worth the effort to add more "clever" logic which e.g. escapes a ( only if the two preceding characters are question marks.

Read More

Hacking on Clang is surprisingly easy

Date: 2020-01-27
Git: https://gitlab.com/mort96/blog/blob/published/content/00000-home/00012-clang-compiler-hacking.md

I happen to think that the current lambda syntax for C++ is kind of verbose. I'm not the only one to have thought that, and there has already been a paper discussing a possible abbreviated lambda syntax (though it was rejected).

In this blog post, I will detail my attempt to implement a sort of simplest possible version of an abbreviated lambda syntax. Basically, this:

[](auto &&a, auto &&b) => a.id() < b.id();

should mean precisely:

[](auto &&a, auto &&b) { return a.id() < b.id(); };

I will leave a discussion about whether that change is worth it or not to the end. Most of this article will just assume that we want that new syntax, and discuss how to actually implement it in Clang.

If you want to read more discussion on the topic, I wrote a somewhat controversial post on Reddit discussing why I think it might be a good idea.

Here's the implementation I will go through in this post: https://github.com/mortie/llvm/commit/e4726dc9d9d966978714fc3d85c6e9c335a38ab8 - 28 additions, including comments and whitespace, across 3 files.

Getting the Clang code

This wasn't my first time compiling Clang, but it was my first time downloading the source code with the intent to change it.

LLVM has a nice page which details getting and building Clang, but the tl;dr is:

git clone https://github.com/llvm/llvm-project.git
cd llvm-project && mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=`pwd`/inst -DLLVM_ENABLE_PROJECTS=clang -DCMAKE_BUILD_TYPE=Release ../llvm
make -j 8
make install

A few points to note:

  • The build will take a long time. Clang is big.
  • I prefer -DCMAKE_BUILD_TYPE=Release because it's way faster to build. Linking Clang with debug symbols and everything takes ages and will OOM your machine.
  • This will install your built clang to inst (short for "install"). The clang binary itself will be in inst/bin/clang.

Now that we have a clang setup, we can have a look at how the project is laid out, and play with it.

Changing the Clang code

The feature I want to add is very simple: Basically, I want [] => 10 to mean the exact same thing as [] { return 10; }. In order to understand how one would achieve that, an extremely short introduction to how compilers work is necessary:

Our code is just a sequence of bytes, like [] => 10 + 20. In order for Clang to make sense of that, it will go through many steps. We can basically divide a compiler into two parts: the "front-end", which goes through many steps to build a thorough understanding of the code as a graph structure, and the "back-end" which goes through many steps to remove information, eventually ending up with a simple series of bytes again, but this time as machine code instead of ASCII.

We'll ignore the back-end for now. The front-end basically works like this:

  1. Split the stream of bytes into a stream of tokens. This step turns [] => 10 + 20 into something like (open-bracket) (close-bracket) (fat-arrow) (number: 10) (plus) (number: 20).
  2. Go through those tokens and construct a tree. This step turns the sequence of tokens into a tree: (lambda-expression (body (return-statement (add-expression (number 10) (number 20))))) (Yeah, this looks a lot like Lisp. There's a reason people say Lisp basically has no syntax; you're just writing out the syntax tree by hand.)
  3. Add semantic information, such as types.

The first phase is usually called lexical analysis, or tokenization, or scanning. The second phase is what we call parsing. The third phase is usually called semantic analysis or type checking.
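As a toy illustration of the first phase (this has nothing to do with Clang's actual lexer), splitting a string like "[] => 10 + 20" into tokens can look like this:

```cpp
#include <cctype>
#include <string>
#include <vector>

// A toy lexer: skips whitespace, groups digit runs into number tokens,
// recognizes the two-character "=>" token, and treats everything else as
// single-character punctuation.
std::vector<std::string> lex(const std::string &src) {
	std::vector<std::string> toks;
	for (size_t i = 0; i < src.size();) {
		if (isspace((unsigned char)src[i])) {
			i++;
		} else if (isdigit((unsigned char)src[i])) {
			size_t j = i;
			while (j < src.size() && isdigit((unsigned char)src[j]))
				j++;
			toks.push_back(src.substr(i, j - i)); // number token
			i = j;
		} else if (src.compare(i, 2, "=>") == 0) {
			toks.push_back("=>"); // fat-arrow token
			i += 2;
		} else {
			toks.push_back(std::string(1, src[i])); // punctuation
			i++;
		}
	}
	return toks;
}
```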

Well, the change I want to make involves adding a new token, the "fat arrow" token =>. That means we'll have to find out how the lexer (or tokenizer) is implemented; where it keeps its list of valid token types, and where it turns the input text into tokens. After some grepping, I found the file clang/include/clang/Basic/TokenKinds.def, which includes a bunch of token descriptions, such as PUNCTUATOR(arrow, "->"). This file seems to be a "supermacro"; a file which exists to be included by another file as a form of macro expansion.

I added PUNCTUATOR(fatarrow, "=>") right below the PUNCTUATOR(arrow, "->") line.

Now that we have defined our token, we need to get the lexer to actually generate it.

After some more grepping, I found clang/lib/Lex/Lexer.cpp, where the Lexer::LexTokenInternal function is what's actually looking at characters and deciding what tokens they represent. It has a case statement to deal with tokens which start with an = character:

case '=':
	Char = getCharAndSize(CurPtr, SizeTmp);
	if (Char == '=') {
		// If this is '====' and we're in a conflict marker, ignore it.
		if (CurPtr[1] == '=' && HandleEndOfConflictMarker(CurPtr-1))
			goto LexNextToken;

		Kind = tok::equalequal;
		CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);
	} else {
		Kind = tok::equal;
	}
	break;

Given that, the change to support my fatarrow token is really simple:

case '=':
	Char = getCharAndSize(CurPtr, SizeTmp);
	if (Char == '=') {
		// If this is '====' and we're in a conflict marker, ignore it.
		if (CurPtr[1] == '=' && HandleEndOfConflictMarker(CurPtr-1))
			goto LexNextToken;

		Kind = tok::equalequal;
		CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);

	// If the first character is a '=', and it's followed by a '>', it's a fat arrow
	} else if (Char == '>') {
		Kind = tok::fatarrow;
		CurPtr = ConsumeChar(CurPtr, SizeTmp, Result);

	} else {
		Kind = tok::equal;
	}
	break;

Now that we have a lexer which generates a tok::fatarrow any time it encounters a => in our code, we can start changing the parser to make use of it.

Since I want to change lambda parsing, the code which parses a lambda seems like a good place to start (duh). I found that in a file called clang/lib/Parse/ParseExprCXX.cpp, in the function ParseLambdaExpressionAfterIntroducer. Most of the function deals with things like the template parameter list and trailing return type, which I don't want to change, but the very end of the function contains this gem:

// Parse compound-statement.
if (!Tok.is(tok::l_brace)) {
	Diag(Tok, diag::err_expected_lambda_body);
	Actions.ActOnLambdaError(LambdaBeginLoc, getCurScope());
	return ExprError();
}

StmtResult Stmt(ParseCompoundStatementBody());
BodyScope.Exit();
TemplateParamScope.Exit();

if (!Stmt.isInvalid() && !TrailingReturnType.isInvalid())
	return Actions.ActOnLambdaExpr(LambdaBeginLoc, Stmt.get(), getCurScope());

Actions.ActOnLambdaError(LambdaBeginLoc, getCurScope());
return ExprError();

This code does the following:

  1. If the next token isn't an opening brace, error.
  2. Parse a compound statement body (i.e. consume a {, read statements until the }).
  3. After some housekeeping, act on the now fully parsed lambda expression.

In principle, what we want to do is to check if the next token is a => instead of a {; if it is, we want to parse an expression instead of a compound statement, and then somehow pretend that the expression is a return statement. Through some trial, error and careful copy/pasting, I came up with this block of code which I put right before the if (!Tok.is(tok::l_brace)):

// If this is an arrow lambda, we just need to parse an expression.
// We parse the expression, then put that expression in a return statement,
// and use that return statement as our body.
if (Tok.is(tok::fatarrow)) {
	SourceLocation ReturnLoc(ConsumeToken());

	ExprResult Expr(ParseExpression());
	if (Expr.isInvalid()) {
		Actions.ActOnLambdaError(LambdaBeginLoc, getCurScope());
		return ExprError();
	}

	StmtResult Stmt = Actions.ActOnReturnStmt(ReturnLoc, Expr.get(), getCurScope());

	BodyScope.Exit();
	TemplateParamScope.Exit();

	if (!Stmt.isInvalid() && !TrailingReturnType.isInvalid())
		return Actions.ActOnLambdaExpr(LambdaBeginLoc, Stmt.get(), getCurScope());

	Actions.ActOnLambdaError(LambdaBeginLoc, getCurScope());
	return ExprError();
}

// Otherwise, just parse a compound statement as usual.
if (!Tok.is(tok::l_brace)) ...

This is really basic; if the token is a => instead of a {, parse an expression, then put that expression into a return statement, and then use that return statement as our lambda's body.

And it works! Lambda expressions with fat arrows are now successfully parsed as if they were regular lambdas whose body is a single return statement:

Demonstration of our new feature

Was it worth it?

Implementing this feature into Clang was definitely worth it just to get more familiar with how the code base works. However, is the feature itself a good idea at all?

I think the best way to decide if a new syntax is better or not is to look at old code which could've made use of the new syntax, and decide if the new syntax makes a big difference. Now that I have a working compiler, I have therefore gone through some projects I'm working on and replaced all the single-expression lambdas with my fancy new arrow lambdas.


Before:

std::erase_if(active_chunks_, [](Chunk *chunk) { return !chunk->isActive(); });

After:

std::erase_if(active_chunks_, [](Chunk *chunk) => !chunk->isActive());

This code deletes a chunk from a game world if the chunk isn't currently active. In my opinion, the version with the arrow function is a bit clearer, but a better solution could be C++17's std::invoke. If I understand std::invoke correctly, if C++ was to adopt std::invoke for algorithms, this code could be written like this:

std::erase_if(active_chunks_, &Chunk::isInactive);

This looks nicer, but has the disadvantage that you need to add an extra method to the class. Having both isActive and its negation isInactive as member functions just because someone might want to use it as a predicate in an algorithm sounds unfortunate. I prefer lambdas' flexibility.


Before:

return map(begin(worldgens_), end(worldgens_), [](auto &ptr) { return ptr.get(); });

After:

return map(begin(worldgens_), end(worldgens_), [](auto &ptr) => ptr.get());

This code maps a vector of unique pointers to raw pointers. This is yet another case where I think the arrow syntax is slightly nicer than the C++11 alternative, but this time, we could actually use the member function invocation if I changed my map function to use std::invoke:

return map(begin(worldgens_), end(worldgens_), &std::unique_ptr<Swan::WorldGen::Factory>::get);

Well, this illustrates that invoking a member function doesn't really work with overly complex types. Imagine if the type was instead something more elaborate:

return map(begin(worldgens_), end(worldgens_),
	&std::unique_ptr<Swan::WorldGen<int>::Factory, Swan::WorldGen<int>::Factory::Deleter>::get);

This also happens to be unspecified behavior, because taking the address of a function in the standard library is generally not legal. From https://en.cppreference.com/w/cpp/language/extending_std:

The behavior of a C++ program is unspecified (possibly ill-formed) if it explicitly or implicitly attempts to form a pointer, reference (for free functions and static member functions) or pointer-to-member (for non-static member functions) to a standard library function or an instantiation of a standard library function template, unless it is designated an addressable function.


Before:

bool needRender = dirty || std::any_of(widgets.begin(), widgets.end(),
	[](auto &w) { return w.needRender(); });

After:

bool needRender = dirty || std::any_of(widgets.begin(), widgets.end(),
	[](auto &w) => w.needRender());

Again, the short lambda version looks a bit better to me. However, here again, we could replace the lambda with a member reference if algorithms were changed to use std::invoke:

bool needRender = dirty || std::any_of(widgets.begin(), widgets.end(),
	&WidgetContainer::needRender);

Overall, I see a short lambda syntax as a modest improvement. The biggest readability win mostly stems from the lack of that awkward ; }); at the end of an expression; foo([] => bar()) instead of foo([] { return bar(); });. It certainly breaks down a bit when the argument list is long; neither of these two lines is particularly short:

foo([](auto const &&a, auto const &&b, auto const &&c) => a + b + c);
foo([](auto const &&a, auto const &&b, auto const &&c) { return a + b + c; });

I think, considering the minimal cost of implementing this short lambda syntax, the modest improvements outweigh the added complexity. However, there's also an opportunity cost associated with my version of the short lambda syntax: it makes a better, future short lambda syntax either impossible or more challenging. For example, accepting my syntax would mean we couldn't really adopt a ratified version of P0573R2's short lambda syntax in the future, even if the issues with it were otherwise fixed.

Therefore, I will argue strongly that my syntax makes code easier to read, but I can't say anything about whether it should be standardized or not.

Aside: Corner cases

If we were to standardize this syntax, we would have to consider all kinds of corner cases, not just accept what clang with my changes happens to do. However, I'm still curious about what exactly clang happens to do with my changes.

How does this interact with the comma operator?

The comma operator in C++ (and most C-based languages) is kind of strange. For example, what does foo(a, b) do? We know it calls foo with the arguments a and b, but you could technically imagine parsing it as a call to foo with the single argument (a, b), where the comma is the comma operator.

My arrow syntax parses foo([] => 10, 20) as calling foo with one argument; a lambda with the body 10, 20 (where the comma operator means the 10 does nothing, and 20 is returned). I would probably want that to be changed, so that foo is called with two arguments; a lambda and an int.

This turns out to be fairly easy to fix, because Clang already has ways of dealing with expressions which can't include top-level commas. After all, there's precedent here; since Clang parses foo(10, 20) without interpreting the top-level comma as a comma operator, we can use the same infrastructure for arrow lambdas.

In clang/lib/Parse/ParseExpr.cpp, Clang defines a function ParseAssignmentExpression, which has this comment:

Parse an expr that doesn't include (top-level) commas.

Calling ParseAssignmentExpression is also the first thing the ParseExpression function does. It seems like it's just the general function for parsing an expression without a top-level comma operator, even though the name is somewhat misleading. This patch changes arrow lambdas to use ParseAssignmentExpression instead of ParseExpression: https://github.com/mortie/llvm/commit/c653318c0056d06a512dfce0799b66032edbed4c

How do immediately invoked lambdas work?

With C++11 lambdas, you can write an immediately invoked lambda in the obvious way; just do [] { return 10; }(), and that expression will return 10. With my arrow lambda syntax, it's not quite as obvious. Would [] => foo() be interpreted as immediately invoking the lambda [] => foo, or would it be interpreted as creating a lambda whose body is foo()?

In my opinion, the only sane way for arrow lambdas to work would be that [] => foo() creates a lambda with the body foo(), and that creating an immediately invoked lambda would require extra parens; ([] => foo())(). That's also how my implementation happens to work.

How does this interact with explicit return types, specifiers, etc?

Since literally all the code before the arrow/opening brace is shared between arrow lambdas and C++11 lambdas, everything should work exactly the same. That means that all of these statements should work:

auto l1 = []() -> long => 10;
auto l2 = [foo]() mutable => foo++;
auto l3 = [](auto a, auto b) noexcept(noexcept(a + b)) => a + b;

And so would any other combination of captures, template params, params, specifiers, attributes, constraints, etc, except that the body has to be a single expression.

Read More

C compiler quirks I have encountered

Date: 2018-07-26
Git: https://gitlab.com/mort96/blog/blob/published/content/00000-home/00011-c-compiler-quirks.md

In a previous blog post, I wrote about some weird features of C, the C preprocessor, and GNU extensions to C that I used in my testing library, Snow.

This post will be about some of the weird compiler and language quirks, limitations, and annoyances I've come across. I don't mean to bash compilers or the specification; most of these quirks have good technical or practical reasons.

Compilers lie about what version of the standard they support

There's a handy macro, called __STDC_VERSION__, which describes the version of the C standard your C implementation conforms to. We can use #if (__STDC_VERSION__ >= 201112L) to check whether our C implementation conforms to C11 or higher (C11 was published in December 2011, hence 2011 12). That's really useful if, say, you're a library author and have a macro which uses _Generic, but also have alternative ways of doing the same and want to warn people when they use the C11-only macro in an older compiler.

In theory, this should always work; any implementation of C which conforms to all of C11 will define __STDC_VERSION__ as 201112L, while any implementation which doesn't conform to C11, but conforms to some earlier version, will define __STDC_VERSION__ to be less than 201112L. Therefore, unless the _Generic feature gets removed in a future version of the standard, __STDC_VERSION__ >= 201112L means that we can safely use _Generic.

Sadly, the real world is not that clean. GCC 4.7 already let you enable C11 by passing in -std=c11, which would set __STDC_VERSION__ to 201112L, but the first release to actually implement all non-optional features of C11 was GCC 4.9. That means that if we just check the value of __STDC_VERSION__, users on GCC 4.7 and GCC 4.8 who use -std=c11 will see really confusing error messages instead of our nice error message. Annoyingly, GCC 4.7 and 4.8 happen to still be extremely widespread versions of GCC. (Relevant: GCC Wiki's C11Status page)

The solution still seems relatively simple; just don't use -std=c11. More recent compilers default to C11 anyways, and there's no widely used compiler that I know of which will default to setting __STDC_VERSION__ to C11 without actually supporting all of C11. That works well enough, but there's one problem: GCC 4.9 supports all of C11 just fine, but only if we give it -std=c11. GCC 4.9 also seems to be one of those annoyingly widespread versions of GCC, so we'd prefer to encourage users to set -std=c11 and make the macros which rely on _Generic work in GCC 4.9.

Again, the solution seems obvious enough, if a bit ugly: if the compiler is GCC, we only use _Generic if the GCC version is 4.9 or greater and __STDC_VERSION__ is C11. If the compiler is not GCC, we just trust it if it says it supports C11. This should in theory work perfectly:

#if (__STDC_VERSION__ >= 201112L)
# ifdef __GNUC__
#  if (__GNUC__ >= 5 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 9))
#   define IS_C11
#  endif
# else
#  define IS_C11
# endif
#endif

Our new IS_C11 macro should now always be defined when we can use _Generic, and never be defined when we can't, right?

Wrong. It turns out that in their quest to support code written for GCC, Clang also defines the __GNUC__, __GNUC_MINOR__, and __GNUC_PATCHLEVEL__ macros, specifically to fool code which checks for GCC into thinking Clang is GCC. However, it doesn't really go far enough: it defines the __GNUC_* macros to correspond to the version of Clang, not the version of GCC which Clang claims to imitate. Clang gained support for C11 in 3.6, but using our code, we would conclude that it doesn't support C11 because __GNUC__ is 3 and __GNUC_MINOR__ is 6. Update: it turns out that Clang always pretends to be GCC 4.2, but the same issue still applies; __GNUC__ is 4, and __GNUC_MINOR__ is 2, so it fails our version check. We can solve this by adding a special case for when __clang__ is defined:

#if (__STDC_VERSION__ >= 201112L)
# if defined(__GNUC__) && !defined(__clang__)
#  if (__GNUC__ >= 5 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 9))
#   define IS_C11
#  endif
# else
#  define IS_C11
# endif
#endif
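The decision logic in that preprocessor block can be restated as an ordinary function to make the branches easy to exercise (a sketch for illustration only; the real check of course has to happen in the preprocessor):

```cpp
// Mirrors the #if logic above: trust any compiler that claims C11, except
// one that defines __GNUC__ without __clang__ (i.e. real GCC), which must
// be version 4.9 or newer.
bool isC11(long stdcVersion, bool definesGnuc, bool definesClang,
           int gnucMajor, int gnucMinor) {
	if (stdcVersion < 201112L)
		return false;
	if (definesGnuc && !definesClang)
		return gnucMajor >= 5 || (gnucMajor == 4 && gnucMinor >= 9);
	return true;
}
```

For example, isC11(201112L, true, false, 4, 8) is false (GCC 4.8 isn't trusted), while isC11(201112L, true, true, 4, 2) is true (Clang posing as GCC 4.2 is trusted on its word).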

Now our code works with both Clang and with GCC, and should work with all other compilers which don't try to imitate GCC - but for every compiler which does imitate GCC, we would have to add a new special case. This is starting to smell a lot like user agent strings.

The Intel compiler is at least nice enough to define __GNUC__ and __GNUC_MINOR__ according to the version of GCC installed on the system; so even though our version check is completely irrelevant to the Intel compiler, at least it will only prevent an otherwise C11-compliant Intel compiler from using _Generic if the user has an older version of GCC installed.

User: Hi, I'm using the Intel compiler, and your library claims my compiler doesn't support C11, even though it does.

You: Upgrading GCC should solve the issue. What version of GCC do you have installed?

User: ...but I'm using the Intel compiler, not GCC.

You: Still, what version of GCC do you have?

User: 4.8, but I really don't see how that's relevant...

You: Try upgrading GCC to at least version 4.9.

(Relevant: Intel's Additional Predefined Macros page)

_Pragma in macro arguments

C has had pragma directives for a long time. It's a useful way to tell our compiler something implementation-specific; something which there's no way to say using only standard C. For example, using GCC, we could use a pragma directive to tell our compiler to ignore a warning for a couple of lines, without changing warning settings globally:

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wfloat-equal"
// my_float being 0 indicates a horrible failure case.
if (my_float == 0)
	abort();
#pragma GCC diagnostic pop

We might also want to define a macro which outputs the above code, so C99 introduced the _Pragma operator, which works like #pragma, but can be used in macros. Once this code goes through the preprocessor, it will do exactly the same as the above code:

#define abort_if_zero(x) \
	_Pragma("GCC diagnostic push") \
	_Pragma("GCC diagnostic ignored \"-Wfloat-equal\"") \
	if (x == 0) \
		abort(); \
	_Pragma("GCC diagnostic pop")

abort_if_zero(my_float);

Now, imagine that we want a macro to trace certain lines; a macro which takes a line of code, and prints that line of code while executing the line. This code looks completely reasonable, right?

#define trace(x) \
	fprintf(stderr, "TRACE: %s\n", #x); \
	x

trace(abort_if_zero(my_float));

However, if we run that code through GCC's preprocessor, we see this mess:

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wfloat-equal"
#pragma GCC diagnostic pop
fprintf(stderr, "TRACE: %s\n", "abort_if_zero(my_float)"); if (my_float == 0) abort();

The pragmas all got bunched up at the top! From what I've heard, this isn't against the C standard, because the standard is not entirely clear on what happens when you send _Pragma operators in as macro arguments, but it sure surprised me when I encountered it nonetheless.

For the Snow library, this means that there are certain warnings which I would have loved to only disable for a few lines, but which I have to disable for all code following the #include <snow/snow.h> line.

Side note: Clang's preprocessor does exactly what one would expect, and produces this output:

fprintf(stderr, "TRACE: %s\n", "abort_if_zero(my_float)");
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wfloat-equal"
 if (my_float == 0) abort();
#pragma GCC diagnostic pop

Line numbers in macro arguments

Until now, the quirks I've shown have been issues you could potentially encounter in decent, real-world code. If this quirk has caused issues for you, however, it might be a sign that you're slightly over-using macros.

All testing code in Snow happens within macro arguments. This allows for what I think is a really nice looking API, and allows all testing code to be disabled just by changing one macro definition. This is a small example of a Snow test suite:

#include <stdio.h>
#include <snow/snow.h>

describe(files, {
	it("writes to files", {
		FILE *f = fopen("testfile", "w");
		assertneq(f, NULL);
		defer(remove("testfile"));
		defer(fclose(f));

		char str[] = "hello there";
		asserteq(fwrite(str, 1, sizeof(str), f), sizeof(str));
	});
});

snow_main();

If that assertneq or asserteq fails, we would like and expect to see a line number. Unfortunately, after the code goes through the preprocessor, the entire nested macro expansion ends up on a single line. All line number information is lost. __LINE__ just returns the number of the last line of the macro expansion, which is 14 in this case. All __LINE__ expressions inside the block we pass to describe will return the same number. I have googled around a bunch for a solution to this issue, but none of the solutions I've looked at actually solve the issue. The only actual solution I can think of is to write my own preprocessor.

Some warnings can't be disabled with pragma

Like the above example, this is probably an issue you shouldn't have come across in production code.

First, some background. In Snow, both the code which is being tested and the test cases can be in the same file. This is to make it possible to test static functions and other functionality which isn't part of the component's public API. The idea is that at the bottom of the file, after all non-testing code, one should include <snow/snow.h> and write the test cases. In a non-testing build, all the testing code will be removed by the preprocessor, because the describe(...) macro expands to nothing unless SNOW_ENABLED is defined.

My personal philosophy is that your regular builds should not have -Werror, and that your testing builds should have warnings as strict as possible and be compiled with -Werror. Your users may be using a different compiler version from you, and that compiler might produce some warnings which you haven't fixed yet. Being a user of a rolling release distro, with a very recent version of GCC, I have way too often had to edit someone else's Makefile and remove -Werror just to make their code compile. Compiling the test suite with -Werror and regular builds without -Werror has none of the drawbacks of using -Werror for regular builds, and most or all of the advantages (at least if you don't accept contributions which break your test suite).

This all means that I want to be able to compile all files with at least -Wall -Wextra -Wpedantic -Werror, even if the code includes <snow/snow.h>. However, Snow contains code which produces warnings (and therefore errors) with those settings; among other things, it uses some GNU extensions which aren't actually part of the C standard.

I would like to let users of Snow compile their code with at least -Wall -Wextra -Wpedantic -Werror, but Snow has to disable at least -Wpedantic for all code after the inclusion of the library. In theory, that shouldn't be an issue, right? We just include #pragma GCC diagnostic ignored "-Wpedantic" somewhere.

Well, as it turns out, disabling -Wpedantic with a pragma doesn't disable all the warnings enabled by -Wpedantic; there are some warnings which are impossible to disable once they're enabled. One such warning is about using directives (like #ifdef) inside macro arguments. As I explained earlier, everything in Snow happens inside of macro arguments. That means that when compiling with -Wpedantic, this code produces a warning which it's impossible to disable without removing -Wpedantic from the compiler's arguments:

describe(some_component, {
#ifndef __MINGW32__
	it("does something which can't be tested on mingw", {
		/* ... */
	});
#endif
});

That's annoying, because it's perfectly legal in GNU's dialect of C. The only reason we can't do it is that it just so happens to be impossible to disable that particular warning with a pragma.

To be completely honest, this issue makes complete sense. I imagine the preprocessor stage, which is where macros are expanded, doesn't care much about pragmas. It feels unnecessary to implement pragma parsing for the preprocessor just in order to let people compile files with -Wpedantic but still selectively disable this particular warning. That doesn't make it less annoying though.

Funnily enough, I encountered this issue while writing Snow's test suite. My solution was to just define a macro called NO_MINGW which is empty if __MINGW32__ is defined, and expands to the contents of its arguments otherwise.


Some obscure C features you might not know about

Date: 2018-01-25
Git: https://gitlab.com/mort96/blog/blob/published/content/00000-home/00010-obscure-c-features.md

I have been working on Snow, a unit testing library for C. I wanted to see how close I could come to making a DSL (domain specific language) with its own syntax and features, using only the C preprocessor and more obscure C features and GNU extensions. I will not go into detail about how Snow works unless it's directly relevant, so I recommend taking a quick look at the readme on the GitHub page.

Sending blocks as arguments to macros

Let's start with the trick that's probably both the most useful in everyday code, and the least technically complicated.

Originally, I defined macros like describe, subdesc, and it similar to this:

#define describe(name, block) \
	void test_##name() { \
		/* some code, omitted for brevity */ \
		block \
		/* more code */ \
	}

The intended use would then be like this:

describe(something, {
	/* code */
});

The C preprocessor doesn't really understand the code; it only copies and pastes strings around. It splits the string between the opening ( and the closing ) by comma; that means, in this case, something would be sent in as the first argument, and { /* code */ } as the second argument (pretend /* code */ is actual code; the preprocessor actually strips out comments). The C preprocessor is smart enough to know that you might want to pass function calls to macros, and function calls contain commas, so parentheses will "guard" the commas they contain. describe(something, foo(10, 20)) would therefore pass something as the first argument, and foo(10, 20) as the second argument.

Now, we're not passing in function calls, but blocks. The preprocessor only considers parentheses; braces { } or brackets [ ] don't guard their contents. That means this call will fail:

describe(something, {
	int a, b;
	/* code */
});

The preprocessor will interpret something as the first argument, { int a as the second argument, and b; /* code */ } as the third argument, but describe only takes two arguments! The preprocessor will halt and show an error message.

So, how do we fix this? Not being able to write commas outside of parentheses in our blocks is quite the limitation. Not only does it prevent us from declaring multiple variables in one statement, it also messes with array declarations like int foo[] = { 10, 20, 30 };.

Well, the preprocessor supports variadic macros: macros which can take an unlimited number of arguments. The way they are implemented is that any extra arguments (indicated by ... in the macro definition) are made available through the __VA_ARGS__ identifier; __VA_ARGS__ is replaced with all the extra arguments separated by commas. So, what happens if we define the macro like this?

#define describe(name, ...) \
	void test_##name() { \
		/* some code, omitted for brevity */ \
		__VA_ARGS__ \
		/* more code */ \
	}

Let's call describe like we did above:

describe(something, {
	int a, b;
	/* code */
});

Now, the arguments will be interpreted the same way as before; something will be the first argument, { int a will be the second argument, and b; /* code */ } will be the third. However, __VA_ARGS__ will be replaced by the second and third arguments with a comma in between, and together they produce { int a, b; /* code */ }, just as we intended. The entire describe call will be expanded into this (with added newlines and indentation for clarity; the actual preprocessor would put it all on one line):

void test_something() {
	/* some code, omitted for brevity */
	{
		int a, b;
		/* code */
	}
	/* more code */
}

And just like that, we successfully passed a block of code, with unguarded commas, to a macro.

Credit for this solution goes to this stackoverflow answer.

Generic macros with _Generic

I wanted one set of macros, asserteq and assertneq, to handle most simple equality checks, instead of having to write asserteq_str for strings, asserteq_int for integers, etc. The C11 standard added the _Generic keyword, which sounds perfect for that; given a list of types and expressions, _Generic will choose the expression whose associated type is compatible with a controlling expression. For example, this code will print "I am an int":

_Generic(10,
	int: printf("I am an int\n"),
	char *: printf("I am a string\n")
);

By itself, _Generic isn't terribly useful, but it can be used to make faux-generic function-like macros. The cppreference.com page uses the example of a generic cbrt (cube root) macro:

#define cbrt(x) _Generic((x), \
	long double: cbrtl, \
	float: cbrtf, \
	default: cbrt)(x)

Calling cbrt on a long double will now call cbrtl, while calling cbrt on a double will call the regular cbrt function, etc. Note that _Generic is not part of the preprocessor; the preprocessor will just spit out the _Generic syntax with x replaced with the macro's argument, and it's the actual compiler's job to figure out what type the controlling expression is and choose the appropriate expression.

I have a bunch of asserteq functions for the various types; asserteq_ptr(void *a, void *b), asserteq_int(intmax_t a, intmax_t b), asserteq_str(const char *a, const char *b), etc. (In reality, the function signatures are a lot uglier, and they're prefixed with _snow_, but for the sake of this article, I'll pretend they look like void asserteq_<suffix>(<type> a, <type> b)).

At first glance, _Generic looks perfect for this use case; just define an asserteq macro like this:

#define asserteq(a, b) _Generic((b), \
	const char *: asserteq_str, \
	char *: asserteq_str, \
	void *: asserteq_ptr, \
	int: asserteq_int)(a, b)

It's sadly not that simple. _Generic will match only specific types; int matches only int, not long. void * matches void pointers, not any other form of pointer. There's no way to say "match every pointer type", for example.

However, there is a default clause, just like in switch statements. My first solution was to just pass anything not otherwise specified to asserteq_int, and use _Pragma (like #pragma, but can be used inside macros) to ignore the warnings:

#define asserteq(a, b) \
	do { \
		_Pragma("GCC diagnostic push") \
		_Pragma("GCC diagnostic ignored \"-Wint-conversion\"") \
		_Generic((b), \
			const char *: asserteq_str, \
			char *: asserteq_str, \
			default: asserteq_int)(a, b) \
		_Pragma("GCC diagnostic pop") \
	} while (0)

That solution worked, but it's not exactly nice. I assume it would eventually break, either due to compiler optimizations or due to weird systems where an intmax_t is smaller than a pointer or whatever. Luckily, the good people over in ##C on Freenode had an answer: subtracting a pointer from a pointer results in a ptrdiff_t! That means we can nest _Generics, and appropriately choose asserteq_int for any integer type, or asserteq_ptr for any pointer type:

#define asserteq(a, b) _Generic((b), \
	const char *: asserteq_str, \
	char *: asserteq_str, \
	default: _Generic((b) - (b), \
		ptrdiff_t: asserteq_ptr, \
		default: asserteq_int))(a, b)

Defer, label pointers, and goto *(void *)

I once saw a demonstration of Golang's defer statement, and fell in love. It immediately struck me as a much better way to clean up than relying solely on the try/catch stuff we've been used to ever since 1985. Naturally, I wanted to use that for tearing down test cases in Snow, but there's not exactly any obvious way to implement it in C.

For those unfamiliar with it, in Go, defer is basically a way to say, "run this expression once the function returns". It works like a stack; when the function returns, the most recently deferred expression will be executed first, and the first deferred expression will be executed last. The beautiful part is that even if the function returns early, either because some steps can be skipped, or because something failed, all the appropriate deferred expressions, and only the appropriate deferred expressions, will be executed. Replace "function" with "test case", and it sounds perfect for tearing down tests.

So, how would you implement that in C? Well, it turns out that GCC has two useful non-standard extensions (which are also supported by Clang by the way): local labels, and labels as values.

Local labels are basically regular labels which you can jump to with goto, but instead of being global to the entire function, they're only available in the block they're declared in. That's fairly straightforward. You declare that a label should be block scoped by just putting __label__ label_name; at the top of the block, and then you can use label_name: anywhere within the block to actually create the label. A goto label_name from anywhere within the block will then go to the label, as expected.

Labels as values is weirder. GCC adds a new unary && operator, which gets a pointer to a label as a void *. Moreover, if you save that pointer in a variable which is accessible outside the block, you can jump back into that block from outside of it, even though it's a local label. This will print "hello" in an infinite loop:

{
	void *somelabel;

	{
		__label__ lbl;
		lbl:
		somelabel = &&lbl;
		printf("hello\n");
	}

	goto *somelabel;
}

Yes, somelabel is a void *. Yes, we dereference somelabel to go to it. I don't know how that works, but the important part is that it does. Other than being dereferenceable, the void * we get from the unary && works exactly like any other void *, and can even be stored in an array. Knowing this, implementing defer isn't too hard; here's a simplified implementation of the it(description, block) macro (using the __VA_ARGS__ trick from before) which describes one test case, and the defer(expr) macro which can be used inside the it block:

#define it(description, ...) \
	do { \
		__label__ done_label; \
		void *defer_labels[32]; \
		int defer_count = 0; \
		int run_defer = 0; \
		__VA_ARGS__ \
		done_label: \
		run_defer = 1; \
		if (defer_count > 0) { \
			defer_count -= 1; \
			goto *defer_labels[defer_count]; \
		} \
	} while (0)

#define defer(expr) \
	do { \
		__label__ lbl; \
		lbl: \
		if (run_defer) { \
			expr; \
			/* Go to the previous defer, or the end of the `it` block */ \
			if (defer_count > 0) { \
				defer_count -= 1; \
				goto *defer_labels[defer_count]; \
			} else { \
				goto done_label; \
			} \
		} else { \
			defer_labels[defer_count] = &&lbl; \
			defer_count += 1; \
		} \
	} while (0)

That might not be the most understandable code you've ever seen, but let's break it down with an example.

it("whatever", {
	printf("Hello World\n");
	defer(printf("world\n"));
	defer(printf("hello "));
});

Running that through the preprocessor, we get this code:

do {
	__label__ done_label;
	void *defer_labels[32];
	int defer_count = 0;
	int run_defer = 0;

	{
		printf("Hello World\n");

		do {
			__label__ lbl;
			lbl:
			if (run_defer) {
				printf("world\n");

				/* Go to the previous defer, or the end of the `it` block */
				if (defer_count > 0) {
					defer_count -= 1;
					goto *defer_labels[defer_count];
				} else {
					goto done_label;
				}
			} else {
				defer_labels[defer_count] = &&lbl;
				defer_count += 1;
			}
		} while (0);

		do {
			__label__ lbl;
			lbl:
			if (run_defer) {
				printf("hello ");

				/* Go to the previous defer, or the end of the `it` block */
				if (defer_count > 0) {
					defer_count -= 1;
					goto *defer_labels[defer_count];
				} else {
					goto done_label;
				}
			} else {
				defer_labels[defer_count] = &&lbl;
				defer_count += 1;
			}
		} while (0);
	}

	done_label:
	run_defer = 1;
	if (defer_count > 0) {
		defer_count -= 1;
		goto *defer_labels[defer_count];
	}
} while (0);

That's still not extremely obvious on first sight, but it's at least more obvious than staring at the macro definitions. The first time through, run_defer is false, so both the defer blocks will just add their labels to the defer_labels array and increment defer_count. Then, just through normal execution (without any goto), we end up at the label called done_label, where we set run_defer to true. Because defer_count is 2, we decrement defer_count and jump to defer_labels[1], which is the last defer.

This time, because run_defer is true, we run the deferred expression printf("hello "), decrement defer_count again, and jump to defer_labels[0], which is the first defer.

The first defer runs its expression, printf("world\n"), but because defer_count is now 0, we jump back to done_label. defer_count is of course still 0, so we just exit the block.

The really nice thing about this system is that a failing assert can at any time just say goto done_label, and only the expressions which were deferred before the goto will be executed.

(Note: in the actual implementation in Snow, defer_labels is of course a dynamically allocated array which is realloc'd when necessary. It's also global to avoid an allocation and free for every single test case. I omitted that part because it's not that relevant, and would've made the example code unnecessarily complicated.)

Update: A bunch of people on Reddit and Hacker News have suggested ways to accomplish this. I ended up using the __attribute__((constructor)) function attribute, which makes a given function execute before the main function. Basically, each describe creates a function called test_##name, and a constructor function called _snow_constructor_##name whose only job is to add test_##name to a global list of functions. Here's the code: https://github.com/mortie/snow/blob/7ee25ebbf0edee519c6eb6d36b82d784b0fdcbfb/snow/snow.h#L393-L421

Automatically call all functions created by describe

The describe macro is meant to be used at the top level, outside of functions, because it creates functions. It's basically just this:

#define describe(name, ...) \
	void test_##name() { \
		__VA_ARGS__ \
	}

Calling describe(something, {}) will create a function called test_something. Currently, that function has to be called manually, because no other part of Snow knows what the function is named. If you have used the describe macro to define the functions test_foo, test_bar, and test_baz, the main function will look like this:

snow_main({
	test_foo();
	test_bar();
	test_baz();
})

I would have loved it if snow_main could just know what functions are declared by describe, and automatically call them. I will go over a couple of ways I tried, which turned out not to be possible, and then one way which would definitely work, but which is a little too crazy, even for me.

Static array of function pointers

What if, instead of just declaring functions with describe, we also appended them to an array of function pointers? What if snow.h contained code like this:

void (*described_functions[512])();

#define describe(name, ...) \
	void test_##name() { \
		__VA_ARGS__ \
	} \
	described_functions[__COUNTER__] = &test_##name

__COUNTER__ is a special macro which starts at 0, and is incremented by one every time it's referenced. That means that, assuming nothing else uses __COUNTER__, this solution would have worked, and would have been relatively clean, if only it were valid syntax. Sadly, in C you can't assign to an array element like that at the top level, only inside functions.

Appending to a macro

What if we had a macro which we appended test_##name(); to every time a function is declared by describe? It turns out that this is almost possible using some obscure GCC extensions. I found this solution on StackOverflow:

#define described_functions test_foo();

#pragma push_macro("described_functions")
#undef described_functions
#define described_functions _Pragma("pop_macro(\"described_functions\")") described_functions test_bar();

#pragma push_macro("described_functions")
#undef described_functions
#define described_functions _Pragma("pop_macro(\"described_functions\")") described_functions test_baz();

described_functions // expands to test_foo(); test_bar(); test_baz();

This is actually a way to append a string to a macro which works, at least in GCC. Snow could have used that... except for one problem: you of course can't use #define from within a macro, and we would have needed to do this from within the describe macro. I have searched far and wide for a way, even a weird GCC-specific possibly pragma-related way, to redefine a macro from within another macro, but I haven't found anything. Close, but no cigar.

The way which actually works

I mentioned that there is actually one way to do it. Before I show you, I need to cover dlopen and dlsym.

void *dlopen(const char *filename, int flags) opens a binary (usually a shared object... usually), and returns a handle. Giving dlopen NULL as the file name gives us a handle to the main program.

void *dlsym(void *handle, const char *symbol) returns a pointer to a symbol (for example a function) in the binary which handle refers to.

We can use dlopen and dlsym like this:

#include <stdio.h>
#include <dlfcn.h>

void foo() {
	printf("hello world\n");
}

int main() {
	void *h = dlopen(NULL, RTLD_LAZY);

	void *fptr = dlsym(h, "foo");
	void (*f)() = fptr;
	f();

	dlclose(h);
}

Compile that code with gcc -Wl,--export-dynamic -o something something.c -ldl (note that -ldl goes after the source file, since linkers resolve symbols left to right), and run ./something, and you'll see it print hello world to the terminal. That means we can actually call functions dynamically based on an arbitrary string at runtime. (The -Wl,--export-dynamic is necessary to tell the linker to export the symbols, such that they're available to us through dlsym.)

Being able to run functions based on a runtime C string, combined with our friend __COUNTER__, opens up some interesting possibilities. We could write a program like this:

#include <stdio.h>
#include <dlfcn.h>

/* Annoyingly, the concat_ and concat macros are necessary to
 * be able to use __COUNTER__ in an identifier name */
#define concat_(a, b) a ## b
#define concat(a, b) concat_(a, b)

#define describe(...) \
	void concat(test_, __COUNTER__)() { \
		__VA_ARGS__ \
	}

describe({
	printf("Hello from function 0\n");
})

describe({
	printf("Hi from function 1\n");
})

int main() {
	void *h = dlopen(NULL, RTLD_LAZY);
	char symbol[32] = { '\0' };

	for (int i = 0; i < __COUNTER__; ++i) {
		snprintf(symbol, 31, "test_%i", i);
		void *fptr = dlsym(h, symbol);
		void (*f)() = fptr;
		f();
	}

	dlclose(h);
}

Run that through the preprocessor, and we get:

void test_0() {
	{ printf("Hello from function 0\n"); }
}
void test_1() {
	{ printf("Hi from function 1\n"); }
}

int main() {
	void *h = dlopen(NULL, RTLD_LAZY);
	char symbol[32] = { '\0' };

	for (int i = 0; i < 2; ++i) {
		snprintf(symbol, 31, "test_%i", i);
		void *fptr = dlsym(h, symbol);
		void (*f)() = fptr;
		f();
	}

	dlclose(h);
}

That for loop in our main function will first call test_0(), then test_1().

I hope you understand why even though this technically works, it's not exactly something I want to include in Snow ;)


Replacing Apple TV

Date: 2015-12-18
Git: https://gitlab.com/mort96/blog/blob/published/content/00000-home/00009-replacing-apple-tv.md

For a long time now, my family and I have used an Apple TV as our media PC. Not one of those newfangled ones with third-party games and apps, but the older generation, those with a set of pre-installed "apps" which let you access certain silos of content, such as Netflix, YouTube, iTunes, etc.

The Apple TV worked well enough when accessing content from those sources. The Netflix client was good, the YouTube client kind of lackluster, and the iTunes client decent enough. There were various other "apps", but those went mostly unused. The main problem however, affecting basically everything on the platform, is that I live in Norway; as a result, most of the time, the somewhat new TV shows or movies we want to watch simply aren't available through those sources. Often, we needed to play video files obtained through other means. This left us with two options:

  1. Find a Mac, play the video file in VLC there, mirror the screen to the Apple TV. This gives us various degrees of choppy frame rate, but lets us play the video instantly after it's downloaded, and lets us use subtitles if we so desire.
  2. Spend around half an hour converting the video to mp4, and stream it to the TV with this tool. This gives smooth frame rate, but takes a while due to converting media. It also doesn't support subtitles.

One day, I decided I'd had enough. I found an old laptop, threw Linux on it, connected it to the TV, and started writing code.

Introducing MMPC

MMPC, Mort's Media PC, is the fruit of my endeavours. It's designed to be controlled from afar, with a web interface. It's also written in a modular fashion, and I'll go through what each module does.

Media Streaming

https://github.com/mortie/mmpc-media-streamer

The media streamer module is the most important module. When playing a movie or episode from a TV show, we generally have a torrent containing the media file. In the past, we would download the movie from the torrent file, and then find a way to play it on the TV when it's done. This module however lets us instead either paste in a torrent link, or upload a torrent file, and it'll stream that to a VLC window which opens on the big screen. VLC also comes with a web interface, so once you start playing a video, the browser loads the VLC web interface, and you can control video playback from there.

The control panel, letting you paste a magnet link, youtube link, etc:

Control Panel

VLC playback controls:

Playback Controls

Remote Desktop

https://github.com/mortie/mmpc-remote-desktop

Sometimes, you need more than streaming torrent files. Netflix, for example, is very useful, whenever it has the content we want to watch, and the same applies to various other websites. As the computer is running a full Linux distro instead of some locked down version of iOS, it's useful to have a way to control it directly. However, it's also annoying to have a wireless keyboard and mouse connected to it constantly and use that. Therefore, I decided it would be nice to be able to remote control it from the browser.

Implementing remote desktop in the browser sounded like it would be an interesting challenge. However, it turned out to be surprisingly easy. There's already a library out there called jsmpg which basically does everything I need. It has a client to stream an mpeg stream to a canvas element, and a server to stream to the client using websockets. The server also has an HTTP server, which you can stream video to and have it appear in all connected clients. FFmpeg can both record the screen and output to an HTTP server.

Once I had streaming video to the client working, the rest was just listening for various events on the client (mousemove, mousedown, etc.) and sending HTTP requests to an HTTP server, which then promptly runs an xdotool command, and voila, remote desktop.

Remote Desktop

Live Wallpapers

https://github.com/mortie/mmpc-wallpaper

One nice thing about the Apple TV is that it can be set to display random pictures from your photo gallery. However, those pictures have to be in your iCloud photo library, which is sort of problematic, considering I don't use Apple devices, and dislike that kind of platform lock-in for something as important as photos. I therefore moved everything from what we used to use for photos, a shared iCloud photo stream, over to a NAS, mounted that NAS on the media PC as a webdav volume with davfs, and wrote this module to pick a random picture every 5 seconds and set it as the wallpaper. If the picture it picked is in portrait orientation, it finds another portrait picture and puts them side by side using imagemagick before setting the result as the wallpaper.

Why not Plex/Kodi/whatever?

Media PC software already exists. However, both Plex and Kodi, to my knowledge, sort of expect you to have a media collection stored on hard drives somewhere. They excel at letting you browse and play that media, but we rarely find ourselves in a situation where that would be beneficial. Most of the time, we just want to watch a movie we haven't seen before and have had no reason to already have in a media library, and we generally decide what movie to watch shortly before watching it. Writing the software myself lets me tailor it specifically to our use case.

UPDATE: It has come to my attention that Kodi has some addons which let you stream torrent files. However, even with that, there are some things Kodi doesn't do:

  • Streaming from arbitrary video services - some services, like Netflix, have Kodi addons, but many streaming services don't. There's no plugin for Daisuki, for example. Just having a regular desktop with Google Chrome, and some links to websites on the desktop for easier access, solves this.
  • Having those unobstructed dynamic live wallpapers of pictures we've taken whenever video isn't playing is rather nice.
  • Being able to control from a laptop instead of a remote control is useful; remote controls get lost, laptops don't. Typing on a laptop keyboard is also a lot easier than with a remote control.

You could get many of the features I want from Kodi by installing, and maybe writing, lots of plugins, but I'm not convinced that would've been much easier than just writing the thousand lines of JavaScript this project required.
