aboutsummaryrefslogtreecommitdiff
path: root/BinaryEncoding.md
diff options
context:
space:
mode:
authorJF Bastien <jfb@chromium.org>2015-06-09 23:27:25 +0200
committerJF Bastien <jfb@chromium.org>2015-06-09 23:27:25 +0200
commitd1b0bf71f8f1942e2093f21578097dc9cba34266 (patch)
tree24c2a52820d7c9102ddcc2414219fb26a1b8edd9 /BinaryEncoding.md
parentd0e928ac6d57f75a139f99f6cbd73024661e76e3 (diff)
parente1a343c9a7a7f92bdda1b6306aabcfa7eddb1f30 (diff)
downloadnanowasm-design-d1b0bf71f8f1942e2093f21578097dc9cba34266.tar.gz
Merge.
Diffstat (limited to 'BinaryEncoding.md')
-rw-r--r--BinaryEncoding.md84
1 files changed, 53 insertions, 31 deletions
diff --git a/BinaryEncoding.md b/BinaryEncoding.md
index 589c1d3..d2f8fca 100644
--- a/BinaryEncoding.md
+++ b/BinaryEncoding.md
@@ -1,39 +1,61 @@
# Binary Encoding
This document describes the [portable](Portability.md) binary encoding of the
-AST nodes defined in the [AST semantics](AstSemantics.md). For the context of
-this document, see the [Binary format](MVP.md#binary-format) section of the
-[MVP doc](MVP.md).
-
-The binary encoding is designed to allow fast startup, which includes reducing download size
-and allow for quick decoding. Considering the matter of reducing download size, we can see
-it as having three layers:
-
- * The **raw** binary encoding itself, natively decoded by the browser, and to be standardized in
- the MVP of the spec.
- * **Specific** compression to the binary encoding, that is unreasonable to expect a generic
- compression algorithm like gzip to achieve.
- * This is not meant to be standardized, at least not initially, as it can be done with a
- downloaded decompressor that runs as web content on the client, and in particular
- can be implemented in a polyfill. Not standardizing it leaves the binary
- encoding as the only thing a WebAssembly implementation is required to implement,
- which is nice. However, if the benefits are shown to be substantial, this will be
- reconsidered in the future.
- * We can do better than generic compression because we are aware of the AST structure
- and other details:
- * For example, macro compression that [deduplicates AST trees](https://github.com/WebAssembly/spec/issues/58#issuecomment-101863032)
- can focus on AST nodes + their children, thus having `O(nodes)` entities to worry about, compared
- to generic compression which in principle would need to look at `O(bytes*bytes)` entities.
- Such macros would allow the logical equivalent of `#define ADD1(x) (x+1)`, i.e.,
- to be parametrized. Simpler macros (`#define ADDX1 (x+1)`) can implement useful
+[Abstract Syntax Tree](AstSemantics.md) nodes.
+
+The binary encoding is designed to allow fast startup, which includes reducing
+download size and allow for quick decoding.
+
+Reducing download size, is achieved through three layers:
+
+ * The **raw** binary encoding itself, natively decoded by the browser, and to
+ be standardized in the [MVP](MVP.md).
+ * **Specific** compression to the binary encoding, that is unreasonable to
+ expect a generic compression algorithm like gzip to achieve.
+ * This is not meant to be standardized, at least not initially, as it can be
+ done with a downloaded decompressor that runs as web content on the client,
+ and in particular can be implemented in a [polyfill](Polyfill.md).
+ * We can do better than generic compression because we are aware of the AST
+ structure and other details:
+ * For example, macro compression that
+ [deduplicates AST trees](https://github.com/WebAssembly/spec/issues/58#issuecomment-101863032)
+ can focus on AST nodes + their children, thus having `O(nodes)` entities
+ to worry about, compared to generic compression which in principle would
+ need to look at `O(bytes*bytes)` entities. Such macros would allow the
+ logical equivalent of `#define ADD1(x) (x+1)`, i.e., to be
+ parametrized. Simpler macros (`#define ADDX1 (x+1)`) can implement useful
features like constant pools.
- * Another example is reordering of functions and some internal nodes, which we know
- does not change semantics, but
+ * Another example is reordering of functions and some internal nodes, which
+ we know does not change semantics, but
[can improve general compression](http://www.rfk.id.au/blog/entry/cromulate-improve-compressibility/).
- * **Generic** compression, such as gzip, already supported in browsers, or LZMA and other
- compression algorithms, which might be standardized as well.
+ * **Generic** compression, such as gzip, already supported in browsers, or LZMA
+ and other compression algorithms, which might be standardized as well.
+
+Each of the three layers work to find compression opportunities to the best of
+its abilities, without encroaching upon the subsequent layer's compression
+opportunities.
+
+## Why a binary encoding instead of a text-only representation
-Each of the three layers work to find compression opportunities to the best of its abilities, without encroaching upon the subsequent layer's compression opportunities.
+Given that text is so compressible and it is well known that it is hard to beat
+gzipped source, is there any win from having a binary format over a text format?
+Yes:
+* Large reductions in payload size can still significantly decrease the
+ compressed file size.
+ * Experimental results from a
+ [polyfill prototype](https://github.com/WebAssembly/polyfill) show the
+ gzipped binary format to be about 20-30% smaller than the corresponding
+ gzipped asm.js.
+* A binary format that represents the names of variables and functions with raw
+ indices instead of strings is much faster to decode: array indexing
+ vs. dictionary lookup.
+ * Experimental results from a
+ [polyfill prototype](https://github.com/WebAssembly/polyfill) show that
+ decoding the binary format is about 23× faster than parsing the
+ corresponding asm.js source (using
+ [this demo](https://github.com/lukewagner/AngryBotsPacked), comparing
+ *just* parsing in SpiderMonkey (no validation, IR generation) to *just*
+ decoding in the polyfill (no asm.js code generation).
## Building blocks
@@ -103,7 +125,7 @@ conflict-avoidance practices surrounding string names:
that modules could optionally use as a starting point to further refine.
* For example, to use all of [the MVP](MVP.md) plus [SIMD](EssentialPostMVPFeatures.md#fixed-width-simd)
the declaration could be "base" followed by the list of SIMD opcodes used.
- * This feature would also be most useful for people handwriting the [text format](MVP.md#text-format).
+ * This feature would also be most useful for people handwriting the [text format](TextFormat.md).
* However, such a version declaration does not establish a global "version" for the module
or affect anything outside of the initialization of the index spaces; decoders would
remain versionless and simply add cases for new *names* (as with current JS parsers).