aboutsummaryrefslogtreecommitdiff
path: root/BinaryEncoding.md
diff options
context:
space:
mode:
authortitzer <titzer@google.com>2016-02-08 18:19:30 +0100
committertitzer <titzer@google.com>2016-02-08 18:19:30 +0100
commit6490fe86194110c8b191b2b0c1a651dad624485c (patch)
tree4d092aa31d3250c38e622a5df644906a7b87b404 /BinaryEncoding.md
parentadf755b3c3d86c14bd2e71969b07cc1d5a38d4e9 (diff)
downloadnanowasm-design-6490fe86194110c8b191b2b0c1a651dad624485c.tar.gz
Add first AST section.
Diffstat (limited to 'BinaryEncoding.md')
-rw-r--r--BinaryEncoding.md33
1 files changed, 22 insertions, 11 deletions
diff --git a/BinaryEncoding.md b/BinaryEncoding.md
index ae51fd3..6a3aec1 100644
--- a/BinaryEncoding.md
+++ b/BinaryEncoding.md
@@ -1,7 +1,6 @@
# Binary Encoding
-This document describes the [portable](Portability.md) binary encoding of the
-[Abstract Syntax Tree](AstSemantics.md) nodes.
+This document describes the [portable](Portability.md) binary encoding of the WebAssembly modules.
The binary encoding is a dense representation of module information that enables
small files, fast decoding, and reduced memory usage.
@@ -40,24 +39,23 @@ A two-byte little endian unsigned integer.
A four-byte little endian unsigned integer.
### varuint32
-A [LEB128](https://en.wikipedia.org/wiki/LEB128) variable-length integer, limited to uint32 payloads. Provides considerable size reduction.
+A [LEB128](https://en.wikipedia.org/wiki/LEB128) variable-length integer, limited to uint32 values.
# Definitions
### Pre-order encoding
-Refers to an approach for encoding syntax trees, where each node begins with an identifier
-followed by any arguments or child nodes. Pre-order trees can be decoded iteratively or
-recursively. Alternative approaches include post-order trees and table representations.
+Refers to an approach for encoding syntax trees, where each node begins with an identifying binary
+sequence, then followed recursively by any child nodes.
* Examples
* Given a simple AST node: `struct I32Add { AstNode *left, *right; }`
- * First write the opcode of `I32Add` (uint8)
+ * First write the binary sequence for `I32Add` (uint8)
* Then recursively write the left and right nodes.
- * Given a call AST node: `struct Call { uint32_t callee; vector<AstNode*> args; }`
- * First write the opcode of `Call` (uint8)
- * Then write the (variable-length) integer `Call::callee` (varuint32)
- * Then recursively write each argument node, where arity is determined by looking up `callee` in a table of signatures
+ * Given a call AST node: `struct Call { uint32_t callee_index; vector<AstNode*> args; }`
+ * First write the binary sequence of `Call` (uint8)
+ * Then write the (variable-length) integer `Call::callee_index` (varuint32)
+ * Then recursively write each argument node, where arity is determined by looking up `callee_index` in a table of signatures
### Strings
Strings referenced by the module (i.e. function names) are encoded as null-terminated [UTF8](http://unicode.org/faq/utf_bom.html#UTF8).
@@ -188,3 +186,16 @@ stored to from dedicated instructions.
| id = `0x11` | `uint8` | section identifier for globals |
| size | `varuint32` | size of this section in bytes |
| body | `bytes` | contents of this section |
+
+# AST Encoding
+
+Function bodies consist of a dense pre-order encoding of an [Abstract Syntax Tree](AstSemantics.md).
+Each node in the abstract syntax tree corresponds to an operator, such as `i32.add` or `if` or `block`.
+Most operators encode as a single `uint8`, but some require *immediates* which immediately follow the
+first byte, before the child nodes.
+
+## Control flow operators
+
+| Name | Opcode | Immediate | Description |
+| ---- | ---- | ---- | ---- |
+| `nop` | `0x00` | | no operation |