From 9329862be20ddb6da38e28519871910c2680af1d Mon Sep 17 00:00:00 2001 From: JF Bastien Date: Mon, 8 Jun 2015 17:38:49 +0200 Subject: Binary format. --- BinaryEncoding.md | 82 +++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 53 insertions(+), 29 deletions(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index e605ea5..ad0c80c 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -1,37 +1,61 @@ # Binary Encoding -This document describes the binary encoding of the AST nodes defined in the [AST semantics](AstSemantics.md). -For the context of this document, see the [Binary format](MVP.md#binary-format) section of the [MVP doc](MVP.md). - -The binary encoding is designed to allow fast startup, which includes reducing download size -and allow for quick decoding. Considering the matter of reducing download size, we can see -it as having three layers: - - * The **raw** binary encoding itself, natively decoded by the browser, and to be standardized in - the MVP of the spec. - * **Specific** compression to the binary encoding, that is unreasonable to expect a generic - compression algorithm like gzip to achieve. - * This is not meant to be standardized, at least not initially, as it can be done with a - downloaded decompressor that runs as web content on the client, and in particular - can be implemented in a polyfill. Not standardizing it leaves the binary - encoding as the only thing a WebAssembly implementation is required to implement, - which is nice. However, if the benefits are shown to be substantial, this will be - reconsidered in the future. - * We can do better than generic compression because we are aware of the AST structure - and other details: - * For example, macro compression that [deduplicates AST trees](https://github.com/WebAssembly/spec/issues/58#issuecomment-101863032) - can focus on AST nodes + their children, thus having `O(nodes)` entities to worry about, compared - to generic compression which in principle would need to look at `O(bytes*bytes)` entities. - Such macros would allow the logical equivalent of `#define ADD1(x) (x+1)`, i.e., - to be parametrized. Simpler macros (`#define ADDX1 (x+1)`) can implement useful +This document describes the binary encoding of the +[Abstract Syntax Tree](AstSemantics.md) nodes. + +The binary encoding is designed to allow fast startup, which includes reducing +download size and allow for quick decoding. + +Reducing download size, is achieved through three layers: + + * The **raw** binary encoding itself, natively decoded by the browser, and to + be standardized in the [MVP](MVP.md). + * **Specific** compression to the binary encoding, that is unreasonable to + expect a generic compression algorithm like gzip to achieve. + * This is not meant to be standardized, at least not initially, as it can be + done with a downloaded decompressor that runs as web content on the client, + and in particular can be implemented in a [polyfill](Polyfill.md). + * We can do better than generic compression because we are aware of the AST + structure and other details: + * For example, macro compression that + [deduplicates AST trees](https://github.com/WebAssembly/spec/issues/58#issuecomment-101863032) + can focus on AST nodes + their children, thus having `O(nodes)` entities + to worry about, compared to generic compression which in principle would + need to look at `O(bytes*bytes)` entities. Such macros would allow the + logical equivalent of `#define ADD1(x) (x+1)`, i.e., to be + parametrized. Simpler macros (`#define ADDX1 (x+1)`) can implement useful features like constant pools. - * Another example is reordering of functions and some internal nodes, which we know - does not change semantics, but + * Another example is reordering of functions and some internal nodes, which + we know does not change semantics, but [can improve general compression](http://www.rfk.id.au/blog/entry/cromulate-improve-compressibility/). - * **Generic** compression, such as gzip, already supported in browsers, or LZMA and other - compression algorithms, which might be standardized as well. + * **Generic** compression, such as gzip, already supported in browsers, or LZMA + and other compression algorithms, which might be standardized as well. + +Each of the three layers work to find compression opportunities to the best of +its abilities, without encroaching upon the subsequent layer's compression +opportunities. + +## Why a binary encoding instead of a text-only representation -Each of the three layers work to find compression opportunities to the best of its abilities, without encroaching upon the subsequent layer's compression opportunities. +Given that text is so compressible and it is well known that it is hard to beat +gzipped source, is there any win from having a binary format over a text format? +Yes: +* Large reductions in payload size can still significantly decrease the + compressed file size. + * Experimental results from a + [polyfill prototype](https://github.com/WebAssembly/polyfill) show the + gzipped binary format to be about 20-30% smaller than the corresponding + gzipped asm.js. +* A binary format that represents the names of variables and functions with raw + indices instead of strings is much faster to decode: array indexing + vs. dictionary lookup. + * Experimental results from a + [polyfill prototype](https://github.com/WebAssembly/polyfill) show that + decoding the binary format is about 23× faster than parsing the + corresponding asm.js source (using + [this demo](https://github.com/lukewagner/AngryBotsPacked), comparing + *just* parsing in SpiderMonkey (no validation, IR generation) to *just* + decoding in the polyfill (no asm.js code generation). ## Building blocks -- cgit v1.2.3 From 8ab1c1eaaf25e0688d893a70708aba31188f76d1 Mon Sep 17 00:00:00 2001 From: JF Bastien Date: Mon, 8 Jun 2015 17:46:01 +0200 Subject: Split out text format into its own document. --- BinaryEncoding.md | 2 +- MVP.md | 21 +++------------------ TextFormat.md | 22 ++++++++++++++++++++++ 3 files changed, 26 insertions(+), 19 deletions(-) create mode 100644 TextFormat.md (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index ad0c80c..bb9d339 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -125,7 +125,7 @@ conflict-avoidance practices surrounding string names: that modules could optionally use as a starting point to further refine. * For example, to use all of [the MVP](MVP.md) plus [SIMD](EssentialPostMVPFeatures.md#fixed-width-simd) the declaration could be "base" followed by the list of SIMD opcodes used. - * This feature would also be most useful for people handwriting the [text format](MVP.md#text-format). + * This feature would also be most useful for people handwriting the [text format](TextFormat.md). * However, such a version declaration does not establish a global "version" for the module or affect anything outside of the initialization of the index spaces; decoders would remain versionless and simply add cases for new *names* (as with current JS parsers). diff --git a/MVP.md b/MVP.md index ec17703..e9971a0 100644 --- a/MVP.md +++ b/MVP.md @@ -74,24 +74,9 @@ separate docs with more precise descriptions of: both straightforward and causes no loss of information in either direction. ## Text format -* The purpose of this format is to support: - * View Source on a WebAssembly module, thus fitting into the Web (where every source can - be viewed) in a natural way. - * Presentation in browser devtools when source maps aren't present (which is necessarily the case with the MVP). - * Writing WebAssembly code directly for reasons including pedagogical, experimental, debugging, or - optimization. -* The text format is equivalent and isomorphic to the - [binary format](MVP.md#binary-format), see notes there. -* The text format will be standardized, but only for tooling purposes: - * Compilers will support this format for `.S` and inline assembly. - * Debuggers and profilers will present binary code using this textual format. - * Browsers will not parse the textual format on regular web content in order - to implement WebAssembly semantics. -* Given that the code representation is actually an AST, the syntax would contain nested - statements and expressions (instead of the linear list of instructions most assembly languages have). -* There is no requirement to use JS syntax; this format is not intended to be evaluated or translated - directly into JS. -* TODO: there is no real proposal yet + +The [text format](TextFormat.md) provides readability to developers, and is +isomorphic to the [binary format](BinaryEncoding.md). ## Heap * In the MVP, when a WebAssembly module is loaded, it creates a new heap. diff --git a/TextFormat.md b/TextFormat.md new file mode 100644 index 0000000..1ce2bb4 --- /dev/null +++ b/TextFormat.md @@ -0,0 +1,22 @@ +# Text Format + +The purpose of this text format is to support: +* View Source on a WebAssembly module, thus fitting into the Web (where every + source can be viewed) in a natural way. +* Presentation in browser development tools when source maps aren't present + (which is necessarily the case with the MVP). +* Writing WebAssembly code directly for reasons including pedagogical, + experimental, debugging, optimization, and testing of the spec itself. +* The text format is equivalent and isomorphic to the + [binary format](BinaryEncoding.md). +* The text format will be standardized, but only for tooling purposes: + * Compilers will support this format for `.S` and inline assembly. + * Debuggers and profilers will present binary code using this textual format. + * Browsers will not parse the textual format on regular web content in order + to implement WebAssembly semantics. +* Given that the code representation is actually an + [Abstract Syntax Tree](ASTSemantics.md), the syntax would contain nested + statements and expressions (instead of the linear list of instructions most + assembly languages have). +* There is no requirement to use JavaScript syntax; this format is not intended + to be evaluated or translated directly into JavaScript. -- cgit v1.2.3