diff options
| author | JF Bastien <jfb@chromium.org> | 2015-06-08 17:38:49 +0200 |
|---|---|---|
| committer | JF Bastien <jfb@chromium.org> | 2015-06-08 17:38:49 +0200 |
| commit | 9329862be20ddb6da38e28519871910c2680af1d (patch) | |
| tree | 816e0a891316271fdb0b82531abeae649f559d55 /BinaryEncoding.md | |
| parent | 0b269f854d28e189cff9057a6d2441325b678ecd (diff) | |
| download | nanowasm-design-9329862be20ddb6da38e28519871910c2680af1d.tar.gz | |
Binary format.
Diffstat (limited to 'BinaryEncoding.md')
| -rw-r--r-- | BinaryEncoding.md | 82 |
1 files changed, 53 insertions, 29 deletions
diff --git a/BinaryEncoding.md b/BinaryEncoding.md index e605ea5..ad0c80c 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -1,37 +1,61 @@ # Binary Encoding -This document describes the binary encoding of the AST nodes defined in the [AST semantics](AstSemantics.md). -For the context of this document, see the [Binary format](MVP.md#binary-format) section of the [MVP doc](MVP.md). - -The binary encoding is designed to allow fast startup, which includes reducing download size -and allow for quick decoding. Considering the matter of reducing download size, we can see -it as having three layers: - - * The **raw** binary encoding itself, natively decoded by the browser, and to be standardized in - the MVP of the spec. - * **Specific** compression to the binary encoding, that is unreasonable to expect a generic - compression algorithm like gzip to achieve. - * This is not meant to be standardized, at least not initially, as it can be done with a - downloaded decompressor that runs as web content on the client, and in particular - can be implemented in a polyfill. Not standardizing it leaves the binary - encoding as the only thing a WebAssembly implementation is required to implement, - which is nice. However, if the benefits are shown to be substantial, this will be - reconsidered in the future. - * We can do better than generic compression because we are aware of the AST structure - and other details: - * For example, macro compression that [deduplicates AST trees](https://github.com/WebAssembly/spec/issues/58#issuecomment-101863032) - can focus on AST nodes + their children, thus having `O(nodes)` entities to worry about, compared - to generic compression which in principle would need to look at `O(bytes*bytes)` entities. - Such macros would allow the logical equivalent of `#define ADD1(x) (x+1)`, i.e., - to be parametrized. Simpler macros (`#define ADDX1 (x+1)`) can implement useful +This document describes the binary encoding of the +[Abstract Syntax Tree](AstSemantics.md) nodes. + +The binary encoding is designed to allow fast startup, which includes reducing +download size and allow for quick decoding. + +Reducing download size, is achieved through three layers: + + * The **raw** binary encoding itself, natively decoded by the browser, and to + be standardized in the [MVP](MVP.md). + * **Specific** compression to the binary encoding, that is unreasonable to + expect a generic compression algorithm like gzip to achieve. + * This is not meant to be standardized, at least not initially, as it can be + done with a downloaded decompressor that runs as web content on the client, + and in particular can be implemented in a [polyfill](Polyfill.md). + * We can do better than generic compression because we are aware of the AST + structure and other details: + * For example, macro compression that + [deduplicates AST trees](https://github.com/WebAssembly/spec/issues/58#issuecomment-101863032) + can focus on AST nodes + their children, thus having `O(nodes)` entities + to worry about, compared to generic compression which in principle would + need to look at `O(bytes*bytes)` entities. Such macros would allow the + logical equivalent of `#define ADD1(x) (x+1)`, i.e., to be + parametrized. Simpler macros (`#define ADDX1 (x+1)`) can implement useful features like constant pools. - * Another example is reordering of functions and some internal nodes, which we know - does not change semantics, but + * Another example is reordering of functions and some internal nodes, which + we know does not change semantics, but [can improve general compression](http://www.rfk.id.au/blog/entry/cromulate-improve-compressibility/). - * **Generic** compression, such as gzip, already supported in browsers, or LZMA and other - compression algorithms, which might be standardized as well. + * **Generic** compression, such as gzip, already supported in browsers, or LZMA + and other compression algorithms, which might be standardized as well. + +Each of the three layers work to find compression opportunities to the best of +its abilities, without encroaching upon the subsequent layer's compression +opportunities. + +## Why a binary encoding instead of a text-only representation -Each of the three layers work to find compression opportunities to the best of its abilities, without encroaching upon the subsequent layer's compression opportunities. +Given that text is so compressible and it is well known that it is hard to beat +gzipped source, is there any win from having a binary format over a text format? +Yes: +* Large reductions in payload size can still significantly decrease the + compressed file size. + * Experimental results from a + [polyfill prototype](https://github.com/WebAssembly/polyfill) show the + gzipped binary format to be about 20-30% smaller than the corresponding + gzipped asm.js. +* A binary format that represents the names of variables and functions with raw + indices instead of strings is much faster to decode: array indexing + vs. dictionary lookup. + * Experimental results from a + [polyfill prototype](https://github.com/WebAssembly/polyfill) show that + decoding the binary format is about 23× faster than parsing the + corresponding asm.js source (using + [this demo](https://github.com/lukewagner/AngryBotsPacked), comparing + *just* parsing in SpiderMonkey (no validation, IR generation) to *just* + decoding in the polyfill (no asm.js code generation). ## Building blocks |
