aboutsummaryrefslogtreecommitdiff
path: root/BinaryEncoding.md
diff options
context:
space:
mode:
authorLuke Wagner <mail@lukewagner.name>2015-04-30 21:41:30 -0500
committerLuke Wagner <mail@lukewagner.name>2015-05-01 16:13:26 -0500
commit05436f1359a3d16559ee4032c9fb5bc98ab7ded4 (patch)
tree4691aeb825e936f01b840c8b2a4a98837b5e4604 /BinaryEncoding.md
parentc2ba24ee166134595b05c50188b33de82bcba645 (diff)
downloadnanowasm-design-05436f1359a3d16559ee4032c9fb5bc98ab7ded4.tar.gz
Start working on BinaryEncoding.md
Just throwing up an outline, still a WIP.
Diffstat (limited to 'BinaryEncoding.md')
-rw-r--r--BinaryEncoding.md71
1 files changed, 71 insertions, 0 deletions
diff --git a/BinaryEncoding.md b/BinaryEncoding.md
new file mode 100644
index 0000000..bde1d5d
--- /dev/null
+++ b/BinaryEncoding.md
@@ -0,0 +1,71 @@
+# Binary Encoding
+
+This document describes the binary encoding of the AST nodes defined in the [AST semantics](AstSemantics.md).
+For the context of this document, see the [Binary format](V1.md#binary-format) section of the [v.1 doc](V1.md).
+
+*The current [polyfill](https://github.com/WebAssembly/polyfill) is a prototype and thus incomplete and not
+representative of an actual WebAssembly binary format. However, there are a few key design choices in the
+prototype format that significantly and demonstrably impact size and decode speed that are described below
+as a starting point for defining a real standard binary format.*
+
+## Building blocks
+
+### Variable-length integers
+
+* LEB, except only for a uint32_t: explain
+
+### Serialized opcode
+
+* Currently, a single byte
+* Reserve a bit to allow variable-length opcodes? No, just reserve a single 8-bit value: explain.
+
+### Constant pool
+
+* Table of variable length integers, fixed-width floats/doubles.
+* Still want immediate-literals because many single-use constants.
+* Use heuristic to pick.
+
+## Global structure
+
+* A module contains:
+ * a header followed by
+ * a table containing, for each section, its type, offset (within the module), and byte length, followed by
+ * a sequence of sections.
+* A section contains:
+ * a header followed by
+ * the section contents (specific to the section type)
+* A code section contains:
+ * the generic section header
+ * a table containing, for each function, it's signature, offset (within the seciton), sorted by offset, followed by
+ * a sequence of functions
+* A function contains:
+ * a table containing, for each type, how many locals are indexed by the function body of that type
+ * the serialized AST
+
+## Serialized AST
+
+* Preorder
+ * Good for efficient single-pass validation+compilation: explain.
+ * Good for polyfill: explain.
+ * Allows type-segregated index spaces
+ * Basic scheme: parent opcode, immediates (variable length), children
+ * Variations: variable arity, ...
+* Type-segregated index spaces
+ * Take advantage of type inherent in parent nodes.
+ * Keep index spaces small.
+ * Good for immediate folding: need tiny index space to make room for immediate.
+* Immediate folding.
+ * 30-40% of bytes are get/setloc: explain why
+ * Massive size reduction by folding immediate into these ops
+ * Another 10% are literals, so another significant reduction
+
+## Example of serialized AST
+
+Show several example ASTs as SExprs, then serialization.
+
+## Macro layer
+
+Idea (not in prototype): can we have a macro layer on top of raw serialized AST?
+The idea is to take advantage of program semantics to reduce redundancy that a
+generic compression algorithm would miss because of differences in names (which a macro
+can be parameterized over).