BinaryEncoding.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124

# Binary Encoding

This document describes the binary encoding of the AST nodes defined in the [AST semantics](AstSemantics.md). 
For the context of this document, see the [Binary format](V1.md#binary-format) section of the [v.1 doc](V1.md).

*The current [polyfill](https://github.com/WebAssembly/polyfill) is a prototype and thus incomplete and not 
representative of an actual WebAssembly binary format. However, there are a few key design choices in the 
prototype format that significantly and demonstrably impact size and decode speed that are described below 
as a starting point for defining a real standard binary format.*

## Building blocks

### Variable-length integers
 * 31% size reduction before compression, 7% size reduction after compression.  
 * [LEB128](http://en.wikipedia.org/wiki/LEB128) except limited to uint32_t payloads.

### Constant pool
 * 7% size reduction before compression, 2% size reduction after compression.
 * Use a module-level constant pool for i32, f32, f64 literals.
 * Still want to allow literals as immediates if:
   * there is only one use of the literal
   * the variable-length-encoded literal pool index is bigger than the variable-length-encoded literal.
   * otherwise, the constant pool is only 5.6% win uncompressed / 0% win compressed.

### Context-dependent index spaces
 * With a pre-order encoding, a node is always encountered in a *context*: statement, i32-producing
   expression, f32-producing expression, etc.
 * Different contexts can reuse the same indices since there is never an ambiguity.
   * For example, I32::Add can have the same index as F32::Add.
   * This avoids opcodes spilling over into requiring >1 bytes.
   * When SIMD is added, new opcodes would mostly be in new type contexts, so still fit in a single byte.
 * This also enables the next technique:

### Fold immediates into the opcodes
 * 26% size reduction before compression, 5% size reduction after compression.
 * In the opcode byte:
   * High bit determines whether it is a normal op or an op-with-immediate.
     * If normal op: the remaining 7 bits are the opcode
     * If op-with-immediate: the next 2 bits are the opcode, the remaining 5 are the immediate
 * Current set of ops-with-immediate: Stmt::{SetLoc,SetGlo}, {I32,F32,F64}::{GetLoc,LitPool,LitImm}
   * How more than 4?  Context-dependent index spaces lets, e.g., I32WithImm overlap F32WithImm.
 * Just GetLoc+SetLoc accounts for 30-40% of total bytes in a module, and each has an immediate.
   * Isn't this high?  No, it's actually lower than a normal register-based bytecode which has to
     spend bytes on input/output operands for *every* instruction.
   * 98% of functions have < 32 variables so 5 bits of immediate is a good fit.
 * Literals (pool and immediate) account for ~10% of bytes.
   * Sorting constant pool by use count allows 50% of LitPool ops to use a folded immediate.

## Global structure

* A module contains:
  * a header followed by
  * a table containing, for each section, its type, offset (within the module), and byte length, followed by
  * a sequence of sections.
* A section contains:
  * a header followed by
  * the section contents (specific to the section type)
* A code section contains:
  * the generic section header
  * a table containing, for each function, it's signature, offset (within the seciton), sorted by offset, followed by
  * a sequence of functions
* A function contains:
  * a table containing, for each type, how many locals are indexed by the function body of that type
  * the serialized AST

## Serialized AST
* Use a preorder encoding of the AST
  * Efficient single-pass validation+compilation and polyfill
  * Allows context-dependent index spaces (described above)
* The data of a node (if there is any), is written immediately after the opcode and before child nodes
  * The opcode statically determines what follows, so no generic metadata is necessary.
* Examples
  * Given a simple AST node: `struct I32Add { AstNode *left, *right; }`
    * First write the opcode of `I32Add` (1 byte)
    * Then recursively write the left and right nodes.
  * Given a call AST node: `struct Call { uint32_t callee; vector<AstNode*> args; }`
    * First write the opcode of `Call` (1 byte)
    * Then write the (variable-length) integer `Call::callee` (1-5 bytes)
    * Then recursively write each arg node (arity is determined by looking up `callee` in table of signatures)
  * Given an AST node with foldable immediate: `struct GetLoc { uint32_t index; }`
    * If `GetLoc::index` < 32, write a single byte as described [above](BinaryEncoding.md#fold-immediates-into-opcodes).
    * Otherwise, write the opcode of `GetLoc` and the variable length integer `GetLoc::index`.

## Backwards Compatibility

Restating the relevant [high-level goal](HighLevelGoals.md): "Design to maintain the versionless, 
feature-testing and backwards-compatible evolution story of the web; engines should not need 
multiple, versioned decoders".

As explained above, for size- and decode-efficiency, the binary format will serialize AST nodes,
their contents and children using dense integer indices and without any kind of embedded metadata 
or tagging. This raises the question of how to reconcile the efficient encoding with the 
backwards-compatibility goals.

Specifically, we'd like to avoid the situation where a future version of WebAssembly has features 
F1 and F2 and vendor V1 implements F1, assigning the next logical opcode indices to F1's new 
opcodes, and V2 implements F2, assigning the same next logical opcode indices to F2's new opcodes 
and now a single binary has ambiguous semantics if it tries to use either F1 or F2. This type of 
non-linear feature addition is commonplace in JS and Web APIs and is guarded against by 
having unique names for unique features (and associated [conventions](https://hsivonen.fi/vendor-prefixes)).

The current proposal is to maintain both the efficiency of indices in the [serialized AST](BinaryEncoding.md#serialized-ast) and the established
conflict-avoidance practices surrounding string names:
  * The WebAssembly spec doesn't define any global index spaces
    * So, as a general rule, no magic numbers in the spec (other than the literal [magic number](http://en.wikipedia.org/wiki/Magic_number_%28programming%29)).
  * Instead, a module defines its *own* local index spaces of opcodes by providing tables *of names*. 
    * So what the spec *would* define is a set of names and their associated semantics.
  * If the implementation encounters a name it doesn't implement, by default an error is thrown while loading.
    * However, a name *may* include a corresponding polyfill function (identified by index 
      into the function array) to be called if the name isn't natively implemented. (There are a lot
      more details to figure out here.)
  * To avoid (over time) large index-space declaration sections that are largely the same
    between modules, finalized versions of standards would define named baseline index spaces
    that modules could optionally use as a starting point to further refine.
    * For example, to use all of [v.1](V1.md) plus [SIMD](EssentialPostV1Features.md#fixed-width-simd)
      the declaration could be "v1" followed by the list of SIMD opcodes used.
    * This feature would also be most useful for people handwriting the [text format](V1.md#text-format).
    * However, such a version declaration does not establish a global "version" for the module
      or affect anything outside of the initialization of the index spaces; decoders would
      remain versionless and simply add cases for new *names* (as with current JS parsers).

## Further ideas
* Macro layer on top of serialized AST (allow the logical equivalent of `#define ADD1(x) (x+1)`)
  * A simple variant would be just having nullary macros (`#define ADDX1 (x+1)`)