Binary format.

author: JF Bastien <jfb@chromium.org> 2015-06-08 17:38:49 +0200
committer: JF Bastien <jfb@chromium.org> 2015-06-08 17:38:49 +0200
commit: 9329862be20ddb6da38e28519871910c2680af1d (patch)
tree: 816e0a891316271fdb0b82531abeae649f559d55 /BinaryEncoding.md
parent: 0b269f854d28e189cff9057a6d2441325b678ecd (diff)
download: nanowasm-design-9329862be20ddb6da38e28519871910c2680af1d.tar.gz
1 files changed, 53 insertions, 29 deletions
diff --git a/BinaryEncoding.md b/BinaryEncoding.md
index e605ea5..ad0c80c 100644
--- a/BinaryEncoding.md
+++ b/BinaryEncoding.md
@@ -1,37 +1,61 @@
 # Binary Encoding
 
-This document describes the binary encoding of the AST nodes defined in the [AST semantics](AstSemantics.md). 
-For the context of this document, see the [Binary format](MVP.md#binary-format) section of the [MVP doc](MVP.md).
-
-The binary encoding is designed to allow fast startup, which includes reducing download size
-and allow for quick decoding. Considering the matter of reducing download size, we can see
-it as having three layers:
-
- * The **raw** binary encoding itself, natively decoded by the browser, and to be standardized in
-   the MVP of the spec.
- * **Specific** compression to the binary encoding, that is unreasonable to expect a generic
-   compression algorithm like gzip to achieve.
-   * This is not meant to be standardized, at least not initially, as it can be done with a
-     downloaded decompressor that runs as web content on the client, and in particular
-     can be implemented in a polyfill. Not standardizing it leaves the binary
-     encoding as the only thing a WebAssembly implementation is required to implement,
-     which is nice. However, if the benefits are shown to be substantial, this will be
-     reconsidered in the future.
-   * We can do better than generic compression because we are aware of the AST structure
-     and other details:
-     * For example, macro compression that [deduplicates AST trees](https://github.com/WebAssembly/spec/issues/58#issuecomment-101863032)
-       can focus on AST nodes + their children, thus having `O(nodes)` entities to worry about, compared
-       to generic compression which in principle would need to look at `O(bytes*bytes)` entities.
-       Such macros would allow the logical equivalent of `#define ADD1(x) (x+1)`, i.e.,
-       to be parametrized. Simpler macros (`#define ADDX1 (x+1)`) can implement useful
+This document describes the binary encoding of the
+[Abstract Syntax Tree](AstSemantics.md) nodes.
+
+The binary encoding is designed to allow fast startup, which includes reducing
+download size and allow for quick decoding.
+
+Reducing download size, is achieved through three layers:
+
+ * The **raw** binary encoding itself, natively decoded by the browser, and to
+   be standardized in the [MVP](MVP.md).
+ * **Specific** compression to the binary encoding, that is unreasonable to
+   expect a generic compression algorithm like gzip to achieve.
+   * This is not meant to be standardized, at least not initially, as it can be
+     done with a downloaded decompressor that runs as web content on the client,
+     and in particular can be implemented in a [polyfill](Polyfill.md).
+   * We can do better than generic compression because we are aware of the AST
+     structure and other details:
+     * For example, macro compression that
+       [deduplicates AST trees](https://github.com/WebAssembly/spec/issues/58#issuecomment-101863032)
+       can focus on AST nodes + their children, thus having `O(nodes)` entities
+       to worry about, compared to generic compression which in principle would
+       need to look at `O(bytes*bytes)` entities.  Such macros would allow the
+       logical equivalent of `#define ADD1(x) (x+1)`, i.e., to be
+       parametrized. Simpler macros (`#define ADDX1 (x+1)`) can implement useful
        features like constant pools.
-     * Another example is reordering of functions and some internal nodes, which we know
-       does not change semantics, but
+     * Another example is reordering of functions and some internal nodes, which
+       we know does not change semantics, but
        [can improve general compression](http://www.rfk.id.au/blog/entry/cromulate-improve-compressibility/).
- * **Generic** compression, such as gzip, already supported in browsers, or LZMA and other
-   compression algorithms, which might be standardized as well.
+ * **Generic** compression, such as gzip, already supported in browsers, or LZMA
+   and other compression algorithms, which might be standardized as well.
+
+Each of the three layers work to find compression opportunities to the best of
+its abilities, without encroaching upon the subsequent layer's compression
+opportunities.
+
+## Why a binary encoding instead of a text-only representation
 
-Each of the three layers work to find compression opportunities to the best of its abilities, without encroaching upon the subsequent layer's compression opportunities.
+Given that text is so compressible and it is well known that it is hard to beat
+gzipped source, is there any win from having a binary format over a text format?
+Yes:
+* Large reductions in payload size can still significantly decrease the
+  compressed file size.
+  * Experimental results from a
+    [polyfill prototype](https://github.com/WebAssembly/polyfill) show the
+    gzipped binary format to be about 20-30% smaller than the corresponding
+    gzipped asm.js.
+* A binary format that represents the names of variables and functions with raw
+  indices instead of strings is much faster to decode: array indexing
+  vs. dictionary lookup.
+   * Experimental results from a
+     [polyfill prototype](https://github.com/WebAssembly/polyfill) show that
+     decoding the binary format is about 23× faster than parsing the
+     corresponding asm.js source (using
+     [this demo](https://github.com/lukewagner/AngryBotsPacked), comparing
+     *just* parsing in SpiderMonkey (no validation, IR generation) to *just*
+     decoding in the polyfill (no asm.js code generation).
 
 ## Building blocks
author	JF Bastien <jfb@chromium.org>	2015-06-08 17:38:49 +0200
committer	JF Bastien <jfb@chromium.org>	2015-06-08 17:38:49 +0200
commit	9329862be20ddb6da38e28519871910c2680af1d (patch)
tree	816e0a891316271fdb0b82531abeae649f559d55 /BinaryEncoding.md
parent	0b269f854d28e189cff9057a6d2441325b678ecd (diff)
download	nanowasm-design-9329862be20ddb6da38e28519871910c2680af1d.tar.gz