From 5ea72fdb7fdff32ccfc47c6bb05ecfd0683e440f Mon Sep 17 00:00:00 2001 From: titzer Date: Mon, 8 Feb 2016 16:57:32 +0100 Subject: Add tables describing the encoding of different sections. --- BinaryEncoding.md | 286 +++++++++++++++++++++++++++++++++--------------------- 1 file changed, 173 insertions(+), 113 deletions(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index 6aa2ff2..77df401 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -3,120 +3,180 @@ This document describes the [portable](Portability.md) binary encoding of the [Abstract Syntax Tree](AstSemantics.md) nodes. -The binary encoding is designed to allow fast startup, which includes reducing -download size and allow for quick decoding. For more information, see the -[rationale document](Rationale.md#why-a-binary-encoding) - -Reducing download size, is achieved through three layers: - - * The **raw** binary encoding itself, natively decoded by the browser, and to - be standardized in the [MVP](MVP.md). - * **Specific** compression to the binary encoding, that is unreasonable to - expect a generic compression algorithm like gzip to achieve. - * This is not meant to be standardized, at least not initially, as it can be - done with a downloaded decompressor that runs as web content on the client, - and in particular can be implemented in a [polyfill](Polyfill.md). - * **Generic** compression, such as gzip, already supported in browsers. Other - compression algorithms being considered and which might be standardized - include: LZMA, [LZHAM](https://github.com/richgel999/lzham_codec), - [Brotli](https://datatracker.ietf.org/doc/draft-alakuijala-brotli/). - -## Variable-length integers - * [Polyfill prototype](https://github.com/WebAssembly/polyfill-prototype-1) shows significant size savings before (31%) and after (7%) compression. - * [LEB128](https://en.wikipedia.org/wiki/LEB128) except limited to uint32_t payloads. - -## Global structure - -* A module contains (in this order): - - A header, containing: - + The [magic number](https://en.wikipedia.org/wiki/Magic_number_%28programming%29) - + Other data TBD - - A table (sorted by offset) containing, for each section: - + A string literal section type name - + 64-bit offset within the module - - A sequence of sections -* A section contains: - - A header followed by - - The section contents (specific to the section type) -* A `definitions` section contains (in this order): - - The generic section header - - A table (sorted by offset) containing, for each type which has operators: - + A standardized string literal [type name](AstSemantics.md#expression-types). - The index of a type name in this table is referred to as a type ID - + 64-bit offset of its operator table within the section - - A sequence of operator tables - - An operator table contains: - + A sequence of standardized string literal [operator names](AstSemantics.md), - where order determines operator index -* A `code` section contains (in this order): - - The generic section header - - A table (sorted by offset) containing, for each function: - + Signature - + Function attributes, valid attributes TBD (could include hot/cold, optimization level, noreturn, read/write/pure, ...) - + 64-bit offset within the section - - A sequence of functions - - A function contains: - + A table containing, for each type ID that has [locals](AstSemantics.md#local-variables): - * Type ID - * Count of locals - + The serialized AST -* A `data` section contains (in this order): - - The generic section header - - A sequence of byte ranges within the binary and corresponding addresses in the linear memory - - -All strings are encoded as null-terminated UTF8. Data segments represent -initialized data that is loaded directly from the binary into the linear memory -when the program starts (see [modules](Modules.md#linear-memory-section)). - -## Serialized AST - -* Use a preorder encoding of the AST - * Efficient single-pass validation+compilation and polyfill -* The data of a node (if there is any), is written immediately after the operator and before child nodes - * The operator statically determines what follows, so no generic metadata is necessary. +The binary encoding is a dense representation of module information that enables +small files, fast decoding, and reduced memory usage. +See the [rationale document](Rationale.md#why-a-binary-encoding) for more detail. + +The encoding is split into three layers: + +* **Layer 0** is a simple pre-order encoding of the AST and related data structures. + The encoding is dense and trivial to interact with, making it suitable for + scenarios like JIT, instrumentation tools, and debugging. +* **Layer 1** provides structural compression on top of layer 0, exploiting + specific knowledge about the nature of the syntax tree and its nodes. + The structural compression introduces more efficient encoding of values, + rearranges values within the module, and prunes structurally identical + tree nodes. +* **Layer 2** applies generic compression techniques, already available + in browsers and other tooling. Algorithms as simple as gzip can deliver + good results, but more sophisticated algorithms like + [LZHAM](https://github.com/richgel999/lzham_codec) and + [Brotli](https://datatracker.ietf.org/doc/draft-alakuijala-brotli/) are able + to deliver dramatically smaller files. + +Most importantly, the layering approach allows development and standardization to +occur incrementally, even though production-quality implementations will need to +implement all of the layers. + +# Primitives and key terminology + +### uint8 +A single-byte unsigned integer. + +### uint16 +A two-byte little endian unsigned integer. + +### uint32 +A four-byte little endian unsigned integer. + +### varuint32 +A [LEB128](https://en.wikipedia.org/wiki/LEB128) variable-length integer, limited to uint32 payloads. Provides considerable size reduction. + +### Pre-order encoding +Refers to an approach for encoding syntax trees, where each node begins with an identifier +followed by any arguments or child nodes. Pre-order trees can be decoded iteratively or +recursively. Alternative approaches include post-order trees and table representations. + * Examples * Given a simple AST node: `struct I32Add { AstNode *left, *right; }` - * First write the operator of `I32Add` (1 byte) + * First write the opcode of `I32Add` (uint8) * Then recursively write the left and right nodes. + * Given a call AST node: `struct Call { uint32_t callee; vector args; }` - * First write the operator of `Call` (1 byte) - * Then write the (variable-length) integer `Call::callee` (1-5 bytes) - * Then recursively write each arg node (arity is determined by looking up `callee` in table of signatures) - -## Backwards Compatibility - -As explained above, for size- and decode-efficiency, the binary format will serialize AST nodes, -their contents and children using dense integer indices and without any kind of embedded metadata -or tagging. This raises the question of how to reconcile the efficient encoding with the -backwards-compatibility goals. - -Specifically, we'd like to avoid the situation where a future version of WebAssembly has features -F1 and F2 and vendor V1 implements F1, assigning the next logical operator indices to F1's new -operators, and V2 implements F2, assigning the same next logical operator indices to F2's new operators -and now a single binary has ambiguous semantics if it tries to use either F1 or F2. This type of -non-linear feature addition is commonplace in JavaScript and Web APIs and is guarded against by -having unique names for unique features (and associated [conventions](https://hsivonen.fi/vendor-prefixes/)). - -The current proposal is to maintain both the efficiency of indices in the [serialized AST](BinaryEncoding.md#serialized-ast) and the established -conflict-avoidance practices surrounding string names: - * The WebAssembly spec doesn't define any global index spaces - * So, as a general rule, no magic numbers in the spec (other than the literal [magic number](https://en.wikipedia.org/wiki/Magic_number_%28programming%29)). - * Instead, a module defines its *own* local index spaces of operators by providing tables *of names*. - * So what the spec *would* define is a set of names and their associated semantics. - * To avoid (over time) large index-space declaration sections that are largely the same - between modules, finalized versions of standards would define named baseline index spaces - that modules could optionally use as a starting point to further refine. - * For example, to use all of [the MVP](MVP.md) plus - [SIMD](PostMVP.md#fixed-width-simd) the declaration could be "base" - followed by the list of SIMD operators used. - * This feature would also be most useful for people handwriting the [text format](TextFormat.md). - * However, such a version declaration does not establish a global "version" for the module - or affect anything outside of the initialization of the index spaces; decoders would - remain versionless and simply add cases for new *names* (as with current JavaScript parsers). - -## Proposals - -The native prototype built for [V8](https://github.com/WebAssembly/v8-native-prototype) -implements a binary format that embodies most, but not all of the ideas in this document. -It is described in detail in a [public design doc](https://docs.google.com/a/google.com/document/d/1761v1AfhFM5kE8NArF_PyXcl-iVh0Dx3InOrmcyIoiI/pub) and a [copy of the original](https://docs.google.com/document/d/1-G11CnMA0My20KI9D7dBR6ZCPOBCRD0oCH6SHCPFGx0/edit?usp=sharing). + * First write the opcode of `Call` (uint8) + * Then write the (variable-length) integer `Call::callee` (varuint32) + * Then recursively write each argument node, where arity is determined by looking up `callee` in a table of signatures + +### Strings +Strings referenced by the module (i.e. function names) are encoded as null-terminated [UTF8](http://unicode.org/faq/utf_bom.html#UTF8). + +# Module structure + +The following documents the current prototype format. This format is based on and supersedes the v8-native prototype format, originally in a [public design doc](https://docs.google.com/document/d/1-G11CnMA0My20KI9D7dBR6ZCPOBCRD0oCH6SHCPFGx0/edit?usp=sharing). + +## High-level structure +A module contains a sequence of sections, containing for each section: + - ```uint8```: A [section type identifier](https://github.com/v8/v8/blob/master/src/wasm/wasm-module.h#L26) for the section + - The section body, which is defined below by section type + +### Memory section +A module may contain at most one memory section. + +* ```uint8```: The minimum size of the module heap in bytes, as a power of two +* ```uint8```: The maximum size of the module heap in bytes, as a power of two +* ```uint8```: ```1``` if the module's memory is externally visible + +### Signatures section +A module may contain at most one signatures section. + +| Field | Type | Description | +| ----- | ----- | ----- | +| id = `0x01` | `uint8` | section identifier for signatures | +| count | `varuint32` | count of signature entries to follow | +| entries | `signature_entry*` | repeated signature entries as described below | + +* [```varuint32```](#varuint32): The number of function signatures in the section +* For each function signature: + - ```uint8```: The number of parameters + - ```uint8```: The function return type, as a [LocalType](https://github.com/v8/v8/blob/master/src/wasm/wasm-opcodes.h#L16) + - For each parameter: + + ```uint8```: The parameter type, as a LocalType + +Local types are encoded with these values: +* `0` indicates the lack of a return type +* `1` type `i32` +* `2` type `i64` +* `3` type `f32` +* `4` type `f64` + +### Functions section +The Functions section declares the functions in the module and must be preceded by a [Signatures](#signatures-section) section. A module may contain at most one functions section. It consists of a count of entries followed immediately by those entries. + +| Field | Type | Description | +| ----- | ----- | ----- | +| id = `0x02` | `uint8` | section identifier for functions | +| count | `varuint32` | count of function entries to follow | +| entries | `function_entry*` | repeated function entries as described below | + +#### Function Entry + +Each function entry describes a function that can be optionally named, imported and/or exported. Non-imported functions +must contain a function body. + +| Field | Type | Present? | Description | +| ----- | ----- | ----- | ----- | +| flags | `uint8` | always | flags indicating attributes of a function
bit `0` : a name is present
bit `1` : the function is an import
bit `2` : the function has local variables
bit `3` : the function is an export | +| signature | `uint16` | always | index into the Signature section | +| name | `uint32` | `flags[0] == 1` | name of the function as an offset within the module | +| i32 count | `uint16` | `flags[2] == 1` | number of i32 local variables | +| i64 count | `uint16` | `flags[2] == 1` | number of i64 local variables | +| f32 count | `uint16` | `flags[2] == 1` | number of f32 local variables | +| f64 count | `uint16` | `flags[2] == 1` | number of f64 local variables | +| body size | `uint16` | `flags[0] == 0` | size of function body to follow, in bytes | +| body | `bytes` | `flags[0] == 0` | function body | + +### Data Segments section +A module may only contain one data segments section. + +| Field | Type | Description | +| ----- | ----- | ----- | +| id = `0x04` | `uint8` | section identifier for data segments | +| count | `varuint32` | count of data segments to follow | +| entries | `data_segment*` | repeated data segments as described below | + +* ```varuint32```: The number of data segments in the section. +* For each data segment: + - ```uint32```: The base address of the data segment in memory. + - ```uint32```: The offset of the data segment's data in the file. + - ```uint32```: The size of the data segment (in bytes) + - ```uint8```: ```1``` if the segment's data should be automatically loaded into memory at module load time. + +### Indirect Function Table section +This section must be preceded by a [Functions](#functions-section) section. + +| Field | Type | Description | +| ----- | ----- | ----- | +| id = `0x05` | `uint8` | section identifier for indirect function table | +| count | `varuint32` | count of entries to follow | +| entries | `uint16*` | repeated indexes into the function table | + +### WLL section + +* ```varuint32```: The size of the section body, in bytes +* The section body (contents currently undefined) + +### End section +This indicates the end of the module's sections. Additional data can follow this section marker (for example, to store function names or data segment bodies) but it is not parsed by the decoder. + + +### Nonstandard: Globals section +A module may only contain one globals section. This section is currently for V8 internal use. + +| Field | Type | Description | +| ----- | ----- | ----- | +| id = `0x03` | `uint8` | section identifier for globals | +| count | `varuint32` | count of global variable entries to follow | +| entries | `global_variable*` | repeated global variable entries as described below | + +#### Global Variable entry + +A global variable entry describes a variable outside the WASM lineary memory. It can only be loaded and +stored to from dedicated instructions. + +| Field | Type | Description | +| ----- | ----- | ----- | +| name | `uint32` | name of the global variable as an offset to a string in the module | +| type | `uint8` | the type of the global, as a memory type | +| exported | `uint8` | a boolean indicating whether the global variable is exported | + + -- cgit v1.2.3 From adf755b3c3d86c14bd2e71969b07cc1d5a38d4e9 Mon Sep 17 00:00:00 2001 From: titzer Date: Mon, 8 Feb 2016 18:10:11 +0100 Subject: Finish adding sections --- BinaryEncoding.md | 64 +++++++++++++++++++++++++++++++------------------------ 1 file changed, 36 insertions(+), 28 deletions(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index 77df401..ae51fd3 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -28,7 +28,7 @@ Most importantly, the layering approach allows development and standardization t occur incrementally, even though production-quality implementations will need to implement all of the layers. -# Primitives and key terminology +# Data types ### uint8 A single-byte unsigned integer. @@ -42,6 +42,8 @@ A four-byte little endian unsigned integer. ### varuint32 A [LEB128](https://en.wikipedia.org/wiki/LEB128) variable-length integer, limited to uint32 payloads. Provides considerable size reduction. +# Definitions + ### Pre-order encoding Refers to an approach for encoding syntax trees, where each node begins with an identifier followed by any arguments or child nodes. Pre-order trees can be decoded iteratively or @@ -65,18 +67,22 @@ Strings referenced by the module (i.e. function names) are encoded as null-termi The following documents the current prototype format. This format is based on and supersedes the v8-native prototype format, originally in a [public design doc](https://docs.google.com/document/d/1-G11CnMA0My20KI9D7dBR6ZCPOBCRD0oCH6SHCPFGx0/edit?usp=sharing). ## High-level structure -A module contains a sequence of sections, containing for each section: - - ```uint8```: A [section type identifier](https://github.com/v8/v8/blob/master/src/wasm/wasm-module.h#L26) for the section - - The section body, which is defined below by section type +A module contains a sequence of sections, each identified by a *section identifier byte* whose encodings +are listed below. ### Memory section +The memory section declares the size and characteristics of the memory associated with the module. A module may contain at most one memory section. -* ```uint8```: The minimum size of the module heap in bytes, as a power of two -* ```uint8```: The maximum size of the module heap in bytes, as a power of two -* ```uint8```: ```1``` if the module's memory is externally visible +| Field | Type | Description | +| ----- | ----- | ----- | +| id = `0x00` | `uint8` | section identifier for memory | +| min_mem_size | `uint8` | minimize memory size as a power of `2` | +| max_mem_size | `uint8` | maximum memory size as a power of `2` | +| exported | `uint8` | `1` if the memory is visible outside the module | ### Signatures section +The signatures section declares all function signatures that will be used in the module. A module may contain at most one signatures section. | Field | Type | Description | @@ -85,22 +91,15 @@ A module may contain at most one signatures section. | count | `varuint32` | count of signature entries to follow | | entries | `signature_entry*` | repeated signature entries as described below | -* [```varuint32```](#varuint32): The number of function signatures in the section -* For each function signature: - - ```uint8```: The number of parameters - - ```uint8```: The function return type, as a [LocalType](https://github.com/v8/v8/blob/master/src/wasm/wasm-opcodes.h#L16) - - For each parameter: - + ```uint8```: The parameter type, as a LocalType - -Local types are encoded with these values: -* `0` indicates the lack of a return type -* `1` type `i32` -* `2` type `i64` -* `3` type `f32` -* `4` type `f64` +#### Signature entry +| Field | Type | Description | +| ----- | ----- | ----- | +| param_count | `uint8` | the number of parameters to the function | +| return_type | `local_type?` | the return type of the function, with `0` indicating no return type | +| param_types | `local_type*` | the parameter types of the function, with
`1` indicating type `i32`
`2` indicating type `i64`
`3` indicating type `f32`
`4` indicating type `f64` | ### Functions section -The Functions section declares the functions in the module and must be preceded by a [Signatures](#signatures-section) section. A module may contain at most one functions section. It consists of a count of entries followed immediately by those entries. +The Functions section declares the functions in the module and must be preceded by a [Signatures](#signatures-section) section. A module may contain at most one functions section. | Field | Type | Description | | ----- | ----- | ----- | @@ -126,6 +125,7 @@ must contain a function body. | body | `bytes` | `flags[0] == 0` | function body | ### Data Segments section +The data segemnts section declares the initialized data that should be loaded into the linear memory. A module may only contain one data segments section. | Field | Type | Description | @@ -142,6 +142,8 @@ A module may only contain one data segments section. - ```uint8```: ```1``` if the segment's data should be automatically loaded into memory at module load time. ### Indirect Function Table section +The indirect function table section declares the size and entries of the indirect function table, which consist +of indexes into the [Functions](#functions-section) section. This section must be preceded by a [Functions](#functions-section) section. | Field | Type | Description | @@ -150,13 +152,13 @@ This section must be preceded by a [Functions](#functions-section) section. | count | `varuint32` | count of entries to follow | | entries | `uint16*` | repeated indexes into the function table | -### WLL section - -* ```varuint32```: The size of the section body, in bytes -* The section body (contents currently undefined) - ### End section -This indicates the end of the module's sections. Additional data can follow this section marker (for example, to store function names or data segment bodies) but it is not parsed by the decoder. +This indicates the end of the sections. Additional data that follows this section marker is not interpreted +by the decoder. It can used, for example, to store function names or data segment bodies. + +| Field | Type | Description | +| ----- | ----- | ----- | +| id = `0x06` | `uint8` | section identifier for end of sections | ### Nonstandard: Globals section @@ -170,7 +172,7 @@ A module may only contain one globals section. This section is currently for V8 #### Global Variable entry -A global variable entry describes a variable outside the WASM lineary memory. It can only be loaded and +A global variable entry describes a variable outside the WASM linear memory. A Global variable can only be loaded and stored to from dedicated instructions. | Field | Type | Description | @@ -179,4 +181,10 @@ stored to from dedicated instructions. | type | `uint8` | the type of the global, as a memory type | | exported | `uint8` | a boolean indicating whether the global variable is exported | +### WorkInProgress: WLL section +| Field | Type | Description | +| ----- | ----- | ----- | +| id = `0x11` | `uint8` | section identifier for globals | +| size | `varuint32` | size of this section in bytes | +| body | `bytes` | contents of this section | -- cgit v1.2.3 From 6490fe86194110c8b191b2b0c1a651dad624485c Mon Sep 17 00:00:00 2001 From: titzer Date: Mon, 8 Feb 2016 18:19:30 +0100 Subject: Add first AST section. --- BinaryEncoding.md | 33 ++++++++++++++++++++++----------- 1 file changed, 22 insertions(+), 11 deletions(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index ae51fd3..6a3aec1 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -1,7 +1,6 @@ # Binary Encoding -This document describes the [portable](Portability.md) binary encoding of the -[Abstract Syntax Tree](AstSemantics.md) nodes. +This document describes the [portable](Portability.md) binary encoding of the WebAssembly modules. The binary encoding is a dense representation of module information that enables small files, fast decoding, and reduced memory usage. @@ -40,24 +39,23 @@ A two-byte little endian unsigned integer. A four-byte little endian unsigned integer. ### varuint32 -A [LEB128](https://en.wikipedia.org/wiki/LEB128) variable-length integer, limited to uint32 payloads. Provides considerable size reduction. +A [LEB128](https://en.wikipedia.org/wiki/LEB128) variable-length integer, limited to uint32 values. # Definitions ### Pre-order encoding -Refers to an approach for encoding syntax trees, where each node begins with an identifier -followed by any arguments or child nodes. Pre-order trees can be decoded iteratively or -recursively. Alternative approaches include post-order trees and table representations. +Refers to an approach for encoding syntax trees, where each node begins with an identifying binary +sequence, then followed recursively by any child nodes. * Examples * Given a simple AST node: `struct I32Add { AstNode *left, *right; }` - * First write the opcode of `I32Add` (uint8) + * First write the binary sequence for `I32Add` (uint8) * Then recursively write the left and right nodes. - * Given a call AST node: `struct Call { uint32_t callee; vector args; }` - * First write the opcode of `Call` (uint8) - * Then write the (variable-length) integer `Call::callee` (varuint32) - * Then recursively write each argument node, where arity is determined by looking up `callee` in a table of signatures + * Given a call AST node: `struct Call { uint32_t callee_index; vector args; }` + * First write the binary sequence of `Call` (uint8) + * Then write the (variable-length) integer `Call::callee_index` (varuint32) + * Then recursively write each argument node, where arity is determined by looking up `callee_index` in a table of signatures ### Strings Strings referenced by the module (i.e. function names) are encoded as null-terminated [UTF8](http://unicode.org/faq/utf_bom.html#UTF8). @@ -188,3 +186,16 @@ stored to from dedicated instructions. | id = `0x11` | `uint8` | section identifier for globals | | size | `varuint32` | size of this section in bytes | | body | `bytes` | contents of this section | + +# AST Encoding + +Function bodies consist of a dense pre-order encoding of an [Abstract Syntax Tree](AstSemantics.md). +Each node in the abstract syntax tree corresponds to an operator, such as `i32.add` or `if` or `block`. +Most operators encode as a single `uint8`, but some require *immediates* which immediately follow the +first byte, before the child nodes. + +## Control flow operators + +| Name | Opcode | Immediate | Description | +| ---- | ---- | ---- | ---- | +| `nop` | `0x00` | | no operation | -- cgit v1.2.3 From f20b82d3ffa27dd45f0c6ab90b8e3603b6a37b6f Mon Sep 17 00:00:00 2001 From: titzer Date: Mon, 8 Feb 2016 18:27:22 +0100 Subject: Add control flow operators. --- BinaryEncoding.md | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index 6a3aec1..5af6dbf 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -199,3 +199,13 @@ first byte, before the child nodes. | Name | Opcode | Immediate | Description | | ---- | ---- | ---- | ---- | | `nop` | `0x00` | | no operation | +| `block` | `0x01` | count = `varuint32` | a sequence of expressions, the last of which yields a value | +| `loop` | `0x02` | count = `varuint32` | a block which can also form control flow loops | +| `if` | `0x03` | | high-level one-armed if | +| `if_else` | `0x04` | | high-level two-armed if | +| `select` | `0x05` | | select one of two values based on condition | +| `br` | `0x06` | relative_depth = `uint8` | break that targets a outer nested block | +| `br_if` | `0x07` | relative_depth = `uint8` | conditional break that targets a outer nested block | +| `tableswitch` | 0x08 | `tableswitch_operand` | tableswitch control flow construct | +| `return` | 0x14 | | return zero or one value from this function | +| `unreachable` | 0x15 | | trap immediately | -- cgit v1.2.3 From 5b6d8db53998f995fb92cba04b159a49dd478bb8 Mon Sep 17 00:00:00 2001 From: titzer Date: Mon, 8 Feb 2016 18:43:24 +0100 Subject: Add basic operators. --- BinaryEncoding.md | 38 +++++++++++++++++++++++++++++++++----- 1 file changed, 33 insertions(+), 5 deletions(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index 5af6dbf..c8e5052 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -199,13 +199,41 @@ first byte, before the child nodes. | Name | Opcode | Immediate | Description | | ---- | ---- | ---- | ---- | | `nop` | `0x00` | | no operation | -| `block` | `0x01` | count = `varuint32` | a sequence of expressions, the last of which yields a value | -| `loop` | `0x02` | count = `varuint32` | a block which can also form control flow loops | +| `block` | `0x01` | count = `uint8` | a sequence of expressions, the last of which yields a value | +| `loop` | `0x02` | count = `uint8` | a block which can also form control flow loops | | `if` | `0x03` | | high-level one-armed if | | `if_else` | `0x04` | | high-level two-armed if | | `select` | `0x05` | | select one of two values based on condition | | `br` | `0x06` | relative_depth = `uint8` | break that targets a outer nested block | | `br_if` | `0x07` | relative_depth = `uint8` | conditional break that targets a outer nested block | -| `tableswitch` | 0x08 | `tableswitch_operand` | tableswitch control flow construct | -| `return` | 0x14 | | return zero or one value from this function | -| `unreachable` | 0x15 | | trap immediately | +| `tableswitch` | `0x08` | see below | tableswitch control flow construct | +| `return` | `0x14` | | return zero or one value from this function | +| `unreachable` | `0x15` | | trap immediately | + +The `tableswitch` operator has as complex immediate operand which is encoded as follows: + +| Field | Type | Description | +| ---- | ---- | ---- | +| case_count | `uint16` | number of cases in the case_table | +| target_count | `uint16` | number of targets in the target_table | +| target_table | `uint16*` | target entries where
`>= 0x800` indicates an outer block to which to break
`<= 0x8000` indicates a case to which to jump | + +The table switch operator is then immediately followed by `case_count` case expressions which by default fall through to each other. + +## Basic operators +| Name | Opcode | Immediate | Description | +| ---- | ---- | ---- | ---- | +| `i8.const` | `0x09` | value = `uint8` | a constant value, signed extended to type `i32` | +| `i32.const` | `0x0a` | value = `uint32` | a constant value interpreted as `i32` | +| `i64.const` | `0x0b` | value = `uint64` | a constant value interpreted as `i64` | +| `f64.const` | `0x0c` | value = `uint64` | a constant value interpreted as `f64` | +| `f32.const` | `0x0d` | value = `uint32` | a constant value interpreted as `f32` | +| `get_local` | `0x0e` | local_index = `varuint32` | read a local variable or parameter | +| `set_local` | `0x0f` | local_index = `varuint32` | write a local variable or parameter | +| `load_global` | `0x10` | index = `varuint32` | * nonstandard internal opcode | +| `store_global` | `0x11` | index = `varuint32` | * nonstandard internal opcode | +| `call_function` | `0x12` | function_index = `varuint32` | call a function by its index | +| `call_indirect` | `0x13` | signature_index = `varuint32` | call a function indirect with an expected signature | + + + -- cgit v1.2.3 From 889483aefd4a1df55373191e82e1c26ea0a5104e Mon Sep 17 00:00:00 2001 From: titzer Date: Mon, 8 Feb 2016 19:11:36 +0100 Subject: Add simple operators. --- BinaryEncoding.md | 161 +++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 160 insertions(+), 1 deletion(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index c8e5052..50afc0f 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -216,7 +216,7 @@ The `tableswitch` operator has as complex immediate operand which is encoded as | ---- | ---- | ---- | | case_count | `uint16` | number of cases in the case_table | | target_count | `uint16` | number of targets in the target_table | -| target_table | `uint16*` | target entries where
`>= 0x800` indicates an outer block to which to break
`<= 0x8000` indicates a case to which to jump | +| target_table | `uint16*` | target entries where
`>= 0x8000` indicates an outer block to which to break
`<= 0x8000` indicates a case to which to jump | The table switch operator is then immediately followed by `case_count` case expressions which by default fall through to each other. @@ -235,5 +235,164 @@ The table switch operator is then immediately followed by `case_count` case expr | `call_function` | `0x12` | function_index = `varuint32` | call a function by its index | | `call_indirect` | `0x13` | signature_index = `varuint32` | call a function indirect with an expected signature | +## Memory-related operators + +| Name | Opcode | Immediate | Description | +| ---- | ---- | ---- | ---- | +| `i32.load8_s` | `0x20` | `memory_immediate` | load from memory | +| `i32.load8_u` | `0x21` | `memory_immediate` | load from memory | +| `i32.load16_s` | `0x22` | `memory_immediate` | load from memory | +| `i32.load16_u` | `0x23` | `memory_immediate` | load from memory | +| `i64.load8_s` | `0x24` | `memory_immediate` | load from memory | +| `i64.load8_u` | `0x25` | `memory_immediate` | load from memory | +| `i64.load16_s` | `0x26` | `memory_immediate` | load from memory | +| `i64.load16_u` | `0x27` | `memory_immediate` | load from memory | +| `i64.load32_s` | `0x28` | `memory_immediate` | load from memory | +| `i64.load32_u` | `0x29` | `memory_immediate` | load from memory | +| `i32.load` | `0x2a` | `memory_immediate` | load from memory | +| `i64.load` | `0x2b` | `memory_immediate` | load from memory | +| `f32.load` | `0x2c` | `memory_immediate` | load from memory | +| `f64.load` | `0x2d` | `memory_immediate` | load from memory | +| `i32.store8` | `0x2e` | `memory_immediate` | store to memory | +| `i32.store16` | `0x2f` | `memory_immediate` | store to memory | +| `i64.store8` | `0x30` | `memory_immediate` | store to memory | +| `i64.store16` | `0x31` | `memory_immediate` | store to memory | +| `i64.store32` | `0x32` | `memory_immediate` | store to memory | +| `i32.store` | `0x33` | `memory_immediate` | store to memory | +| `i64.store` | `0x34` | `memory_immediate` | store to memory | +| `f32.store` | `0x35` | `memory_immediate` | store to memory | +| `f64.store` | `0x36` | `memory_immediate` | store to memory | +| `memory_size` | `0x3b` | | query the size of memory | +| `grow_memory` | `0x39` | | grow the size of memory | + +The `memory_immediate` type is encoded as follows: + +| Name | Type | Present? | Description | +| ---- | ---- | ---- | ---- | +| flags | `uint8` | always | a bitfield where
bit `4` indicates an offset follows
bit `7` indicates natural alignment | +| offset | `varuint32` | `flags[4] == 1` | the value of the offset | + +## Simple operators + +| Name | Opcode | Immediate | Description | +| ---- | ---- | ---- | ---- | +| `i32.add` | `0x40` | | | +| `i32.sub` | `0x41` | | | +| `i32.mul` | `0x42` | | | +| `i32.div_s` | `0x43` | | | +| `i32.div_u` | `0x44` | | | +| `i32.rem_s` | `0x45` | | | +| `i32.rem_u` | `0x46` | | | +| `i32.and` | `0x47` | | | +| `i32.ior` | `0x48` | | | +| `i32.xor` | `0x49` | | | +| `i32.shl` | `0x4a` | | | +| `i32.shr_u` | `0x4b` | | | +| `i32.shr_s` | `0x4c` | | | +| `i32.eq` | `0x4d` | | | +| `i32.ne` | `0x4e` | | | +| `i32.lt_s` | `0x4f` | | | +| `i32.le_s` | `0x50` | | | +| `i32.lt_u` | `0x51` | | | +| `i32.le_u` | `0x52` | | | +| `i32.gt_s` | `0x53` | | | +| `i32.ge_s` | `0x54` | | | +| `i32.gt_u` | `0x55` | | | +| `i32.ge_u` | `0x56` | | | +| `i32.clz` | `0x57` | | | +| `i32.ctz` | `0x58` | | | +| `i32.popcnt` | `0x59` | | | +| `bool.not` | `0x5a` | | | +| `i64.add` | `0x5b` | | | +| `i64.sub` | `0x5c` | | | +| `i64.mul` | `0x5d` | | | +| `i64.div_s` | `0x5e` | | | +| `i64.div_u` | `0x5f` | | | +| `i64.rem_s` | `0x60` | | | +| `i64.rem_u` | `0x61` | | | +| `i64.and` | `0x62` | | | +| `i64.ior` | `0x63` | | | +| `i64.xor` | `0x64` | | | +| `i64.shl` | `0x65` | | | +| `i64.shr_u` | `0x66` | | | +| `i64.shr_s` | `0x67` | | | +| `i64.eq` | `0x68` | | | +| `i64.ne` | `0x69` | | | +| `i64.lt_s` | `0x6a` | | | +| `i64.le_s` | `0x6b` | | | +| `i64.lt_u` | `0x6c` | | | +| `i64.le_u` | `0x6d` | | | +| `i64.gt_s` | `0x6e` | | | +| `i64.ge_s` | `0x6f` | | | +| `i64.gt_u` | `0x70` | | | +| `i64.ge_u` | `0x71` | | | +| `i64.clz` | `0x72` | | | +| `i64.ctz` | `0x73` | | | +| `i64.popcnt` | `0x74` | | | +| `f32.add` | `0x75` | | | +| `f32.sub` | `0x76` | | | +| `f32.mul` | `0x77` | | | +| `f32.div` | `0x78` | | | +| `f32.min` | `0x79` | | | +| `f32.max` | `0x7a` | | | +| `f32.abs` | `0x7b` | | | +| `f32.neg` | `0x7c` | | | +| `f32.copysign` | `0x7d` | | | +| `f32.ceil` | `0x7e` | | | +| `f32.floor` | `0x7f` | | | +| `f32.trunc` | `0x80` | | | +| `f32.nearestint` | `0x81` | | | +| `f32.sqrt` | `0x82` | | | +| `f32.eq` | `0x83` | | | +| `f32.ne` | `0x84` | | | +| `f32.lt` | `0x85` | | | +| `f32.le` | `0x86` | | | +| `f32.gt` | `0x87` | | | +| `f32.ge` | `0x88` | | | +| `f64.add` | `0x89` | | | +| `f64.sub` | `0x8a` | | | +| `f64.mul` | `0x8b` | | | +| `f64.div` | `0x8c` | | | +| `f64.min` | `0x8d` | | | +| `f64.max` | `0x8e` | | | +| `f64.abs` | `0x8f` | | | +| `f64.neg` | `0x90` | | | +| `f64.copysign` | `0x91` | | | +| `f64.ceil` | `0x92` | | | +| `f64.floor` | `0x93` | | | +| `f64.trunc` | `0x94` | | | +| `f64.nearestint` | `0x95` | | | +| `f64.sqrt` | `0x96` | | | +| `f64.eq` | `0x97` | | | +| `f64.ne` | `0x98` | | | +| `f64.lt` | `0x99` | | | +| `f64.le` | `0x9a` | | | +| `f64.gt` | `0x9b` | | | +| `f64.ge` | `0x9c` | | | +| `i32.sconvertf32` | `0x9d` | | | +| `i32.sconvertf64` | `0x9e` | | | +| `i32.uconvertf32` | `0x9f` | | | +| `i32.uconvertf64` | `0xa0` | | | +| `i32.converti64` | `0xa1` | | | +| `i64.sconvertf32` | `0xa2` | | | +| `i64.sconvertf64` | `0xa3` | | | +| `i64.uconvertf32` | `0xa4` | | | +| `i64.uconvertf64` | `0xa5` | | | +| `i64.sconverti32` | `0xa6` | | | +| `i64.uconverti32` | `0xa7` | | | +| `f32.sconverti32` | `0xa8` | | | +| `f32.uconverti32` | `0xa9` | | | +| `f32.sconverti64` | `0xaa` | | | +| `f32.uconverti64` | `0xab` | | | +| `f32.convertf64` | `0xac` | | | +| `f32.reinterpreti32` | `0xad` | | | +| `f64.sconverti32` | `0xae` | | | +| `f64.uconverti32` | `0xaf` | | | +| `f64.sconverti64` | `0xb0` | | | +| `f64.uconverti64` | `0xb1` | | | +| `f64.convertf32` | `0xb2` | | | +| `f64.reinterpreti64` | `0xb3` | | | +| `i32.reinterpretf32` | `0xb4` | | | +| `i64.reinterpretf64` | `0xb5` | | | -- cgit v1.2.3 From 685111b3b7a1bf05938d5c55bc8358d483de400a Mon Sep 17 00:00:00 2001 From: titzer Date: Mon, 8 Feb 2016 19:17:38 +0100 Subject: Add links. --- BinaryEncoding.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index 50afc0f..72f44e5 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -194,7 +194,7 @@ Each node in the abstract syntax tree corresponds to an operator, such as `i32.a Most operators encode as a single `uint8`, but some require *immediates* which immediately follow the first byte, before the child nodes. -## Control flow operators +## Control flow operators ([described here](AstSemantics.md#control-flow-structures)) | Name | Opcode | Immediate | Description | | ---- | ---- | ---- | ---- | @@ -220,7 +220,7 @@ The `tableswitch` operator has as complex immediate operand which is encoded as The table switch operator is then immediately followed by `case_count` case expressions which by default fall through to each other. -## Basic operators +## Basic operators ([described here](AstSemantics.md#constants)) | Name | Opcode | Immediate | Description | | ---- | ---- | ---- | ---- | | `i8.const` | `0x09` | value = `uint8` | a constant value, signed extended to type `i32` | @@ -235,7 +235,7 @@ The table switch operator is then immediately followed by `case_count` case expr | `call_function` | `0x12` | function_index = `varuint32` | call a function by its index | | `call_indirect` | `0x13` | signature_index = `varuint32` | call a function indirect with an expected signature | -## Memory-related operators +## Memory-related operators ([described here](AstSemantics.md#linear-memory-accesses)) | Name | Opcode | Immediate | Description | | ---- | ---- | ---- | ---- | @@ -272,7 +272,7 @@ The `memory_immediate` type is encoded as follows: | flags | `uint8` | always | a bitfield where
bit `4` indicates an offset follows
bit `7` indicates natural alignment | | offset | `varuint32` | `flags[4] == 1` | the value of the offset | -## Simple operators +## Simple operators ([described here](AstSemantics#32-bit-integer-operators)) | Name | Opcode | Immediate | Description | | ---- | ---- | ---- | ---- | -- cgit v1.2.3 From 9c59a694bfaf116ba63312529dd5031310b2a7c3 Mon Sep 17 00:00:00 2001 From: titzer Date: Tue, 9 Feb 2016 09:26:14 +0100 Subject: Update BinaryEncoding.md --- BinaryEncoding.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index 72f44e5..5e265ef 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -24,8 +24,10 @@ The encoding is split into three layers: to deliver dramatically smaller files. Most importantly, the layering approach allows development and standardization to -occur incrementally, even though production-quality implementations will need to -implement all of the layers. +occur incrementally. For example, Layer 1 and Layer 2 encoding techniques can be +experimented with by application-level decompressing to the layer below. As +compression techniques stabilize, they can be standardized and moved into native +implementations. # Data types -- cgit v1.2.3 From fd8db871fc5d2a89b0e6aa2041695f5e6f655d00 Mon Sep 17 00:00:00 2001 From: titzer Date: Tue, 9 Feb 2016 10:26:33 +0100 Subject: Add definition of value_type. --- BinaryEncoding.md | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index 5e265ef..6150c86 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -43,6 +43,14 @@ A four-byte little endian unsigned integer. ### varuint32 A [LEB128](https://en.wikipedia.org/wiki/LEB128) variable-length integer, limited to uint32 values. +### value_type +A single-byte unsigned integer indicating a [value type](AstSemantics.md#types). These types are encoded as: +* `1` indicating type `i32` +* `2` indicating type `i64` +* `3` indicating type `f32` +* `4` indicating type `f64` + + # Definitions ### Pre-order encoding @@ -95,8 +103,8 @@ A module may contain at most one signatures section. | Field | Type | Description | | ----- | ----- | ----- | | param_count | `uint8` | the number of parameters to the function | -| return_type | `local_type?` | the return type of the function, with `0` indicating no return type | -| param_types | `local_type*` | the parameter types of the function, with
`1` indicating type `i32`
`2` indicating type `i64`
`3` indicating type `f32`
`4` indicating type `f64` | +| return_type | `value_type?` | the return type of the function, with `0` indicating no return type | +| param_types | `value_type*` | the parameter types of the function | ### Functions section The Functions section declares the functions in the module and must be preceded by a [Signatures](#signatures-section) section. A module may contain at most one functions section. -- cgit v1.2.3 From 01492c6dde963ac7eb6d00d8d5754de98cc2a0c1 Mon Sep 17 00:00:00 2001 From: titzer Date: Tue, 9 Feb 2016 10:29:35 +0100 Subject: Update definition of pre-order encoding. --- BinaryEncoding.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index 6150c86..e014818 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -58,13 +58,13 @@ Refers to an approach for encoding syntax trees, where each node begins with an sequence, then followed recursively by any child nodes. * Examples - * Given a simple AST node: `struct I32Add { AstNode *left, *right; }` - * First write the binary sequence for `I32Add` (uint8) + * Given a simple AST node: `I32Add(left: AstNode, right: AstNode)` + * First write the opcode for `I32Add` (uint8) * Then recursively write the left and right nodes. - * Given a call AST node: `struct Call { uint32_t callee_index; vector args; }` - * First write the binary sequence of `Call` (uint8) - * Then write the (variable-length) integer `Call::callee_index` (varuint32) + * Given a call AST node: `Call(callee_index: uint32_t, args: AstNode[])` + * First write the opcode of `Call` (uint8) + * Then write the (variable-length) integer `callee_index` (varuint32) * Then recursively write each argument node, where arity is determined by looking up `callee_index` in a table of signatures ### Strings -- cgit v1.2.3 From 550cd2938dbc054beb099391479cfb8e8a0523e7 Mon Sep 17 00:00:00 2001 From: titzer Date: Tue, 9 Feb 2016 10:30:54 +0100 Subject: Update BinaryEncoding.md --- BinaryEncoding.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index e014818..a2726ff 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -118,7 +118,7 @@ The Functions section declares the functions in the module and must be preceded #### Function Entry Each function entry describes a function that can be optionally named, imported and/or exported. Non-imported functions -must contain a function body. +must contain a function body. Imported and exported functions must have a name. Imported functions do not contain a body. | Field | Type | Present? | Description | | ----- | ----- | ----- | ----- | -- cgit v1.2.3 From eaf8d098eef082c5b28baacb70388f6767d8887b Mon Sep 17 00:00:00 2001 From: titzer Date: Tue, 9 Feb 2016 10:31:51 +0100 Subject: Update BinaryEncoding.md --- BinaryEncoding.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index a2726ff..d2ea66a 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -125,10 +125,10 @@ must contain a function body. Imported and exported functions must have a name. | flags | `uint8` | always | flags indicating attributes of a function
bit `0` : a name is present
bit `1` : the function is an import
bit `2` : the function has local variables
bit `3` : the function is an export | | signature | `uint16` | always | index into the Signature section | | name | `uint32` | `flags[0] == 1` | name of the function as an offset within the module | -| i32 count | `uint16` | `flags[2] == 1` | number of i32 local variables | -| i64 count | `uint16` | `flags[2] == 1` | number of i64 local variables | -| f32 count | `uint16` | `flags[2] == 1` | number of f32 local variables | -| f64 count | `uint16` | `flags[2] == 1` | number of f64 local variables | +| i32 count | `uint16` | `flags[2] == 1` | number of `i32` local variables | +| i64 count | `uint16` | `flags[2] == 1` | number of `i64` local variables | +| f32 count | `uint16` | `flags[2] == 1` | number of `f32` local variables | +| f64 count | `uint16` | `flags[2] == 1` | number of `f64` local variables | | body size | `uint16` | `flags[0] == 0` | size of function body to follow, in bytes | | body | `bytes` | `flags[0] == 0` | function body | -- cgit v1.2.3 From 34165f655bdcec2e380142b82203253fe8290d20 Mon Sep 17 00:00:00 2001 From: titzer Date: Tue, 9 Feb 2016 10:33:13 +0100 Subject: Update BinaryEncoding.md --- BinaryEncoding.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index d2ea66a..29436c3 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -279,7 +279,7 @@ The `memory_immediate` type is encoded as follows: | Name | Type | Present? | Description | | ---- | ---- | ---- | ---- | -| flags | `uint8` | always | a bitfield where
bit `4` indicates an offset follows
bit `7` indicates natural alignment | +| flags | `uint8` | always | a bitfield where
bit `4` indicates an offset follows
bit `7` indicates natural alignment
other bits are reserved for future use | | offset | `varuint32` | `flags[4] == 1` | the value of the offset | ## Simple operators ([described here](AstSemantics#32-bit-integer-operators)) -- cgit v1.2.3 From acab7ac6a768872f46d24dfeb3358804df49a403 Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 11 Feb 2016 10:44:12 +0100 Subject: Correct off by one --- BinaryEncoding.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index 29436c3..e885a57 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -226,7 +226,7 @@ The `tableswitch` operator has as complex immediate operand which is encoded as | ---- | ---- | ---- | | case_count | `uint16` | number of cases in the case_table | | target_count | `uint16` | number of targets in the target_table | -| target_table | `uint16*` | target entries where
`>= 0x8000` indicates an outer block to which to break
`<= 0x8000` indicates a case to which to jump | +| target_table | `uint16*` | target entries where
`>= 0x8000` indicates an outer block to which to break
`< 0x8000` indicates a case to which to jump | The table switch operator is then immediately followed by `case_count` case expressions which by default fall through to each other. -- cgit v1.2.3 From 2f64313c57309d2c21ab07ef47ad7403da56fd4b Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 11 Feb 2016 15:45:44 +0100 Subject: Update BinaryEncoding.md --- BinaryEncoding.md | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index e885a57..9f42f61 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -31,6 +31,9 @@ implementations. # Data types +### int8 +A single-byte signed integer. + ### uint8 A single-byte unsigned integer. @@ -201,8 +204,8 @@ stored to from dedicated instructions. Function bodies consist of a dense pre-order encoding of an [Abstract Syntax Tree](AstSemantics.md). Each node in the abstract syntax tree corresponds to an operator, such as `i32.add` or `if` or `block`. -Most operators encode as a single `uint8`, but some require *immediates* which immediately follow the -first byte, before the child nodes. +Operators are encoding by an opcode byte followed by immediate bytes (if any), followed by children +nodes (if any). ## Control flow operators ([described here](AstSemantics.md#control-flow-structures)) -- cgit v1.2.3 From a2072993c67cb1cb773129800e50dc08ea2c3c89 Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 11 Feb 2016 15:46:24 +0100 Subject: Update BinaryEncoding.md --- BinaryEncoding.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index 9f42f61..940c962 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -236,7 +236,7 @@ The table switch operator is then immediately followed by `case_count` case expr ## Basic operators ([described here](AstSemantics.md#constants)) | Name | Opcode | Immediate | Description | | ---- | ---- | ---- | ---- | -| `i8.const` | `0x09` | value = `uint8` | a constant value, signed extended to type `i32` | +| `i8.const` | `0x09` | value = `int8` | a constant value, signed extended to type `i32` | | `i32.const` | `0x0a` | value = `uint32` | a constant value interpreted as `i32` | | `i64.const` | `0x0b` | value = `uint64` | a constant value interpreted as `i64` | | `f64.const` | `0x0c` | value = `uint64` | a constant value interpreted as `f64` | -- cgit v1.2.3 From 0df869bb0a69e16f1faf71cabdb8a7dcb15d0857 Mon Sep 17 00:00:00 2001 From: titzer Date: Thu, 11 Feb 2016 15:51:50 +0100 Subject: Update compression --- BinaryEncoding.md | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index 940c962..bfdbe30 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -16,12 +16,7 @@ The encoding is split into three layers: The structural compression introduces more efficient encoding of values, rearranges values within the module, and prunes structurally identical tree nodes. -* **Layer 2** applies generic compression techniques, already available - in browsers and other tooling. Algorithms as simple as gzip can deliver - good results, but more sophisticated algorithms like - [LZHAM](https://github.com/richgel999/lzham_codec) and - [Brotli](https://datatracker.ietf.org/doc/draft-alakuijala-brotli/) are able - to deliver dramatically smaller files. +* **Layer 2** Layer 2 applies generic compression algorithms, like [gzip](http://www.gzip.org/) and [Brotli](https://datatracker.ietf.org/doc/draft-alakuijala-brotli/), that are already available in browsers and other tooling. Most importantly, the layering approach allows development and standardization to occur incrementally. For example, Layer 1 and Layer 2 encoding techniques can be -- cgit v1.2.3