Start working on BinaryEncoding.md

Just throwing up an outline, still a WIP.
author: Luke Wagner <mail@lukewagner.name> 2015-04-30 21:41:30 -0500
committer: Luke Wagner <mail@lukewagner.name> 2015-05-01 16:13:26 -0500
commit: 05436f1359a3d16559ee4032c9fb5bc98ab7ded4 (patch)
tree: 4691aeb825e936f01b840c8b2a4a98837b5e4604 /BinaryEncoding.md
parent: c2ba24ee166134595b05c50188b33de82bcba645 (diff)
download: nanowasm-design-05436f1359a3d16559ee4032c9fb5bc98ab7ded4.tar.gz
1 files changed, 71 insertions, 0 deletions
diff --git a/BinaryEncoding.md b/BinaryEncoding.md
new file mode 100644
index 0000000..bde1d5d
--- /dev/null
+++ b/BinaryEncoding.md
@@ -0,0 +1,71 @@
+# Binary Encoding
+
+This document describes the binary encoding of the AST nodes defined in the [AST semantics](AstSemantics.md). 
+For the context of this document, see the [Binary format](V1.md#binary-format) section of the [v.1 doc](V1.md).
+
+*The current [polyfill](https://github.com/WebAssembly/polyfill) is a prototype and thus incomplete and not 
+representative of an actual WebAssembly binary format. However, there are a few key design choices in the 
+prototype format that significantly and demonstrably impact size and decode speed that are described below 
+as a starting point for defining a real standard binary format.*
+
+## Building blocks
+
+### Variable-length integers
+
+* LEB, except only for a uint32_t: explain
+
+### Serialized opcode
+
+* Currently, a single byte
+* Reserve a bit to allow variable-length opcodes?  No, just reserve a single 8-bit value: explain.
+
+### Constant pool
+
+* Table of variable length integers, fixed-width floats/doubles.
+* Still want immediate-literals because many single-use constants.
+* Use heuristic to pick.
+
+## Global structure
+
+* A module contains:
+  * a header followed by
+  * a table containing, for each section, its type, offset (within the module), and byte length, followed by
+  * a sequence of sections.
+* A section contains:
+  * a header followed by
+  * the section contents (specific to the section type)
+* A code section contains:
+  * the generic section header
+  * a table containing, for each function, it's signature, offset (within the seciton), sorted by offset, followed by
+  * a sequence of functions
+* A function contains:
+  * a table containing, for each type, how many locals are indexed by the function body of that type
+  * the serialized AST
+
+## Serialized AST
+
+* Preorder
+  * Good for efficient single-pass validation+compilation: explain.
+  * Good for polyfill: explain.
+  * Allows type-segregated index spaces
+  * Basic scheme: parent opcode, immediates (variable length), children
+  * Variations: variable arity, ...
+* Type-segregated index spaces
+  * Take advantage of type inherent in parent nodes.
+  * Keep index spaces small.
+  * Good for immediate folding: need tiny index space to make room for immediate.
+* Immediate folding.
+  * 30-40% of bytes are get/setloc: explain why
+  * Massive size reduction by folding immediate into these ops
+  * Another 10% are literals, so another significant reduction
+
+## Example of serialized AST
+
+Show several example ASTs as SExprs, then serialization.
+
+## Macro layer
+
+Idea (not in prototype): can we have a macro layer on top of raw serialized AST?
+The idea is to take advantage of program semantics to reduce redundancy that a
+generic compression algorithm would miss because of differences in names (which a macro
+can be parameterized over).
author	Luke Wagner <mail@lukewagner.name>	2015-04-30 21:41:30 -0500
committer	Luke Wagner <mail@lukewagner.name>	2015-05-01 16:13:26 -0500
commit	05436f1359a3d16559ee4032c9fb5bc98ab7ded4 (patch)
tree	4691aeb825e936f01b840c8b2a4a98837b5e4604 /BinaryEncoding.md
parent	c2ba24ee166134595b05c50188b33de82bcba645 (diff)
download	nanowasm-design-05436f1359a3d16559ee4032c9fb5bc98ab7ded4.tar.gz