diff options
| author | JF Bastien <github@jfbastien.com> | 2015-06-09 16:40:15 +0200 |
|---|---|---|
| committer | JF Bastien <github@jfbastien.com> | 2015-06-09 16:40:15 +0200 |
| commit | 4e47a65bea188578ee41484b02a6191e79e7f0e7 (patch) | |
| tree | d803862f9fb1dded910efbeec08268769841ec31 /BinaryEncoding.md | |
| parent | d068cf28619ab4cb5902798e9ee281691b8d8f43 (diff) | |
| parent | cb9e2b52f422bce27e689d7e9f025a58d2a8bc18 (diff) | |
| download | nanowasm-design-4e47a65bea188578ee41484b02a6191e79e7f0e7.tar.gz | |
Merge pull request #115 from WebAssembly/refactor-mvp
Refactor MVP
Diffstat (limited to 'BinaryEncoding.md')
| -rw-r--r-- | BinaryEncoding.md | 84 |
1 files changed, 54 insertions, 30 deletions
diff --git a/BinaryEncoding.md b/BinaryEncoding.md index e605ea5..bb9d339 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -1,37 +1,61 @@ # Binary Encoding -This document describes the binary encoding of the AST nodes defined in the [AST semantics](AstSemantics.md). -For the context of this document, see the [Binary format](MVP.md#binary-format) section of the [MVP doc](MVP.md). - -The binary encoding is designed to allow fast startup, which includes reducing download size -and allow for quick decoding. Considering the matter of reducing download size, we can see -it as having three layers: - - * The **raw** binary encoding itself, natively decoded by the browser, and to be standardized in - the MVP of the spec. - * **Specific** compression to the binary encoding, that is unreasonable to expect a generic - compression algorithm like gzip to achieve. - * This is not meant to be standardized, at least not initially, as it can be done with a - downloaded decompressor that runs as web content on the client, and in particular - can be implemented in a polyfill. Not standardizing it leaves the binary - encoding as the only thing a WebAssembly implementation is required to implement, - which is nice. However, if the benefits are shown to be substantial, this will be - reconsidered in the future. - * We can do better than generic compression because we are aware of the AST structure - and other details: - * For example, macro compression that [deduplicates AST trees](https://github.com/WebAssembly/spec/issues/58#issuecomment-101863032) - can focus on AST nodes + their children, thus having `O(nodes)` entities to worry about, compared - to generic compression which in principle would need to look at `O(bytes*bytes)` entities. - Such macros would allow the logical equivalent of `#define ADD1(x) (x+1)`, i.e., - to be parametrized. Simpler macros (`#define ADDX1 (x+1)`) can implement useful +This document describes the binary encoding of the +[Abstract Syntax Tree](AstSemantics.md) nodes. + +The binary encoding is designed to allow fast startup, which includes reducing +download size and allow for quick decoding. + +Reducing download size, is achieved through three layers: + + * The **raw** binary encoding itself, natively decoded by the browser, and to + be standardized in the [MVP](MVP.md). + * **Specific** compression to the binary encoding, that is unreasonable to + expect a generic compression algorithm like gzip to achieve. + * This is not meant to be standardized, at least not initially, as it can be + done with a downloaded decompressor that runs as web content on the client, + and in particular can be implemented in a [polyfill](Polyfill.md). + * We can do better than generic compression because we are aware of the AST + structure and other details: + * For example, macro compression that + [deduplicates AST trees](https://github.com/WebAssembly/spec/issues/58#issuecomment-101863032) + can focus on AST nodes + their children, thus having `O(nodes)` entities + to worry about, compared to generic compression which in principle would + need to look at `O(bytes*bytes)` entities. Such macros would allow the + logical equivalent of `#define ADD1(x) (x+1)`, i.e., to be + parametrized. Simpler macros (`#define ADDX1 (x+1)`) can implement useful features like constant pools. - * Another example is reordering of functions and some internal nodes, which we know - does not change semantics, but + * Another example is reordering of functions and some internal nodes, which + we know does not change semantics, but [can improve general compression](http://www.rfk.id.au/blog/entry/cromulate-improve-compressibility/). - * **Generic** compression, such as gzip, already supported in browsers, or LZMA and other - compression algorithms, which might be standardized as well. + * **Generic** compression, such as gzip, already supported in browsers, or LZMA + and other compression algorithms, which might be standardized as well. + +Each of the three layers work to find compression opportunities to the best of +its abilities, without encroaching upon the subsequent layer's compression +opportunities. + +## Why a binary encoding instead of a text-only representation -Each of the three layers work to find compression opportunities to the best of its abilities, without encroaching upon the subsequent layer's compression opportunities. +Given that text is so compressible and it is well known that it is hard to beat +gzipped source, is there any win from having a binary format over a text format? +Yes: +* Large reductions in payload size can still significantly decrease the + compressed file size. + * Experimental results from a + [polyfill prototype](https://github.com/WebAssembly/polyfill) show the + gzipped binary format to be about 20-30% smaller than the corresponding + gzipped asm.js. +* A binary format that represents the names of variables and functions with raw + indices instead of strings is much faster to decode: array indexing + vs. dictionary lookup. + * Experimental results from a + [polyfill prototype](https://github.com/WebAssembly/polyfill) show that + decoding the binary format is about 23× faster than parsing the + corresponding asm.js source (using + [this demo](https://github.com/lukewagner/AngryBotsPacked), comparing + *just* parsing in SpiderMonkey (no validation, IR generation) to *just* + decoding in the polyfill (no asm.js code generation). ## Building blocks @@ -101,7 +125,7 @@ conflict-avoidance practices surrounding string names: that modules could optionally use as a starting point to further refine. * For example, to use all of [the MVP](MVP.md) plus [SIMD](EssentialPostMVPFeatures.md#fixed-width-simd) the declaration could be "base" followed by the list of SIMD opcodes used. - * This feature would also be most useful for people handwriting the [text format](MVP.md#text-format). + * This feature would also be most useful for people handwriting the [text format](TextFormat.md). * However, such a version declaration does not establish a global "version" for the module or affect anything outside of the initialization of the index spaces; decoders would remain versionless and simply add cases for new *names* (as with current JS parsers). |
