From 05436f1359a3d16559ee4032c9fb5bc98ab7ded4 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 30 Apr 2015 21:41:30 -0500 Subject: Start working on BinaryEncoding.md Just throwing up an outline, still a WIP. --- BinaryEncoding.md | 71 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 71 insertions(+) create mode 100644 BinaryEncoding.md (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md new file mode 100644 index 0000000..bde1d5d --- /dev/null +++ b/BinaryEncoding.md @@ -0,0 +1,71 @@ +# Binary Encoding + +This document describes the binary encoding of the AST nodes defined in the [AST semantics](AstSemantics.md). +For the context of this document, see the [Binary format](V1.md#binary-format) section of the [v.1 doc](V1.md). + +*The current [polyfill](https://github.com/WebAssembly/polyfill) is a prototype and thus incomplete and not +representative of an actual WebAssembly binary format. However, there are a few key design choices in the +prototype format that significantly and demonstrably impact size and decode speed that are described below +as a starting point for defining a real standard binary format.* + +## Building blocks + +### Variable-length integers + +* LEB, except only for a uint32_t: explain + +### Serialized opcode + +* Currently, a single byte +* Reserve a bit to allow variable-length opcodes? No, just reserve a single 8-bit value: explain. + +### Constant pool + +* Table of variable length integers, fixed-width floats/doubles. +* Still want immediate-literals because many single-use constants. +* Use heuristic to pick. + +## Global structure + +* A module contains: + * a header followed by + * a table containing, for each section, its type, offset (within the module), and byte length, followed by + * a sequence of sections. +* A section contains: + * a header followed by + * the section contents (specific to the section type) +* A code section contains: + * the generic section header + * a table containing, for each function, it's signature, offset (within the seciton), sorted by offset, followed by + * a sequence of functions +* A function contains: + * a table containing, for each type, how many locals are indexed by the function body of that type + * the serialized AST + +## Serialized AST + +* Preorder + * Good for efficient single-pass validation+compilation: explain. + * Good for polyfill: explain. + * Allows type-segregated index spaces + * Basic scheme: parent opcode, immediates (variable length), children + * Variations: variable arity, ... +* Type-segregated index spaces + * Take advantage of type inherent in parent nodes. + * Keep index spaces small. + * Good for immediate folding: need tiny index space to make room for immediate. +* Immediate folding. + * 30-40% of bytes are get/setloc: explain why + * Massive size reduction by folding immediate into these ops + * Another 10% are literals, so another significant reduction + +## Example of serialized AST + +Show several example ASTs as SExprs, then serialization. + +## Macro layer + +Idea (not in prototype): can we have a macro layer on top of raw serialized AST? +The idea is to take advantage of program semantics to reduce redundancy that a +generic compression algorithm would miss because of differences in names (which a macro +can be parameterized over). -- cgit v1.2.3