From 05436f1359a3d16559ee4032c9fb5bc98ab7ded4 Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Thu, 30 Apr 2015 21:41:30 -0500 Subject: Start working on BinaryEncoding.md Just throwing up an outline, still a WIP. --- BinaryEncoding.md | 71 +++++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 71 insertions(+) create mode 100644 BinaryEncoding.md (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md new file mode 100644 index 0000000..bde1d5d --- /dev/null +++ b/BinaryEncoding.md @@ -0,0 +1,71 @@ +# Binary Encoding + +This document describes the binary encoding of the AST nodes defined in the [AST semantics](AstSemantics.md). +For the context of this document, see the [Binary format](V1.md#binary-format) section of the [v.1 doc](V1.md). + +*The current [polyfill](https://github.com/WebAssembly/polyfill) is a prototype and thus incomplete and not +representative of an actual WebAssembly binary format. However, there are a few key design choices in the +prototype format that significantly and demonstrably impact size and decode speed that are described below +as a starting point for defining a real standard binary format.* + +## Building blocks + +### Variable-length integers + +* LEB, except only for a uint32_t: explain + +### Serialized opcode + +* Currently, a single byte +* Reserve a bit to allow variable-length opcodes? No, just reserve a single 8-bit value: explain. + +### Constant pool + +* Table of variable length integers, fixed-width floats/doubles. +* Still want immediate-literals because many single-use constants. +* Use heuristic to pick. + +## Global structure + +* A module contains: + * a header followed by + * a table containing, for each section, its type, offset (within the module), and byte length, followed by + * a sequence of sections. +* A section contains: + * a header followed by + * the section contents (specific to the section type) +* A code section contains: + * the generic section header + * a table containing, for each function, it's signature, offset (within the seciton), sorted by offset, followed by + * a sequence of functions +* A function contains: + * a table containing, for each type, how many locals are indexed by the function body of that type + * the serialized AST + +## Serialized AST + +* Preorder + * Good for efficient single-pass validation+compilation: explain. + * Good for polyfill: explain. + * Allows type-segregated index spaces + * Basic scheme: parent opcode, immediates (variable length), children + * Variations: variable arity, ... +* Type-segregated index spaces + * Take advantage of type inherent in parent nodes. + * Keep index spaces small. + * Good for immediate folding: need tiny index space to make room for immediate. +* Immediate folding. + * 30-40% of bytes are get/setloc: explain why + * Massive size reduction by folding immediate into these ops + * Another 10% are literals, so another significant reduction + +## Example of serialized AST + +Show several example ASTs as SExprs, then serialization. + +## Macro layer + +Idea (not in prototype): can we have a macro layer on top of raw serialized AST? +The idea is to take advantage of program semantics to reduce redundancy that a +generic compression algorithm would miss because of differences in names (which a macro +can be parameterized over). -- cgit v1.2.3 From 7f16bc2903bc7c99be4d9258b1222eaddf06f61a Mon Sep 17 00:00:00 2001 From: Luke Wagner Date: Fri, 1 May 2015 18:38:17 -0500 Subject: Update BinaryEncoding.md --- BinaryEncoding.md | 89 ++++++++++++++++++++++++++++++++----------------------- 1 file changed, 52 insertions(+), 37 deletions(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index bde1d5d..77154b1 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -11,19 +11,40 @@ as a starting point for defining a real standard binary format.* ## Building blocks ### Variable-length integers - -* LEB, except only for a uint32_t: explain - -### Serialized opcode - -* Currently, a single byte -* Reserve a bit to allow variable-length opcodes? No, just reserve a single 8-bit value: explain. + * 31% size reduction before compression, 7% size reduction after compression. + * [LEB128](http://en.wikipedia.org/wiki/LEB128) except limited to uint32_t payloads. ### Constant pool - -* Table of variable length integers, fixed-width floats/doubles. -* Still want immediate-literals because many single-use constants. -* Use heuristic to pick. + * 7% size reduction before compression, 2% size reduction after compression. + * Use a module-level constant pool for i32, f32, f64 literals. + * Still want to allow literals as immediates if: + * there is only one use of the literal + * the variable-length-encoded literal pool index is bigger than the variable-length-encoded literal. + * otherwise, the constant pool is only 5.6% win uncompressed / 0% win compressed. + +### Context-dependent index spaces + * With a pre-order encoding, a node is always encountered in a *context*: statement, i32-producing + expression, f32-producing expression, etc. + * Different contexts can reuse the same indices since there is never an ambiguity. + * For example, I32::Add can have the same index as F32::Add. + * This avoids opcodes spilling over into requiring >1 bytes. + * When SIMD is added, new opcodes would mostly be in new type contexts, so still fit in a single byte. + * This also enables the next technique: + +### Fold immediates into the opcodes + * 26% size reduction before compression, 5% size reduction after compression. + * In the opcode byte: + * High bit determines whether it is a normal op or an op-with-immediate. + * If normal op: the remaining 7 bits are the opcode + * If op-with-immediate: the next 2 bits are the opcode, the remaining 5 are the immediate + * Current set of ops-with-immediate: Stmt::{SetLoc,SetGlo}, {I32,F32,F64}::{GetLoc,LitPool,LitImm} + * How more than 4? Context-dependent index spaces lets, e.g., I32WithImm overlap F32WithImm. + * Just GetLoc+SetLoc accounts for 30-40% of total bytes in a module, and each has an immediate. + * Isn't this high? No, it's actually lower than a normal register-based bytecode which has to + spend bytes on input/output operands for *every* instruction. + * 98% of functions have < 32 variables so 5 bits of immediate is a good fit. + * Literals (pool and immediate) account for ~10% of bytes. + * Sorting constant pool by use count allows 50% of LitPool ops to use a folded immediate. ## Global structure @@ -43,29 +64,23 @@ as a starting point for defining a real standard binary format.* * the serialized AST ## Serialized AST - -* Preorder - * Good for efficient single-pass validation+compilation: explain. - * Good for polyfill: explain. - * Allows type-segregated index spaces - * Basic scheme: parent opcode, immediates (variable length), children - * Variations: variable arity, ... -* Type-segregated index spaces - * Take advantage of type inherent in parent nodes. - * Keep index spaces small. - * Good for immediate folding: need tiny index space to make room for immediate. -* Immediate folding. - * 30-40% of bytes are get/setloc: explain why - * Massive size reduction by folding immediate into these ops - * Another 10% are literals, so another significant reduction - -## Example of serialized AST - -Show several example ASTs as SExprs, then serialization. - -## Macro layer - -Idea (not in prototype): can we have a macro layer on top of raw serialized AST? -The idea is to take advantage of program semantics to reduce redundancy that a -generic compression algorithm would miss because of differences in names (which a macro -can be parameterized over). +* Use a preorder encoding of the AST + * Efficient single-pass validation+compilation and polyfill + * Allows context-dependent index spaces (described above) +* The data of a node (if there is any), is written immediately after the opcode and before child nodes + * The opcode statically determines what follows, so no generic metadata is necessary. +* Examples + * Given a simple AST node: `struct I32Add { AstNode *left, *right; }` + * First write the opcode of `I32Add` (1 byte) + * Then recursively write the left and right nodes. + * Given a call AST node: `struct Call { uint32_t callee; vector args; }` + * First write the opcode of `Call` (1 byte) + * Then write the (variable-length) integer `Call::callee` (1-5 bytes) + * Then recursively write each arg node (arity is determined by looking up `callee` in table of signatures) + * Given an AST node with foldable immediate: `struct GetLoc { uint32_t index; }` + * If `GetLoc::index` < 32, write a single byte as described [above](BinaryEncoding.md#fold-immediates-into-opcodes). + * Otherwise, write the opcode of `GetLoc` and the variable length integer `GetLoc::index`. + +## Further ideas +* Macro layer on top of serialized AST (allow the logical equivalent of `#define ADD1(x) (x+1)`) + * A simple variant would be just having nullary macros (`#define ADDX1 (x+1)`) -- cgit v1.2.3