From e8f0f3cd54c4142a0463659d12f1667bc35e7ace Mon Sep 17 00:00:00 2001
From: Xavier Del Campo Romero
Date: Fri, 11 Oct 2024 08:02:10 +0200
Subject: Add NanoWasm extensions

This commit introduces third-party extensions to the MVP, branded as
"NanoWasm", that aim to reduce memory consumption and computational time
for resource-constrained environments, at the expense of increased
module file size.
---
 BinaryEncoding.md | 85 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 NanoWasm.md       | 62 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 147 insertions(+)
 create mode 100644 NanoWasm.md

diff --git a/BinaryEncoding.md b/BinaryEncoding.md
index 2677b35..91f1094 100644
--- a/BinaryEncoding.md
+++ b/BinaryEncoding.md
@@ -532,6 +532,91 @@ where a `local_name` is encoded as:
 | index | `varuint32` | the index of the function whose locals are being named |
 | local_map | `name_map` | assignment of names to local indices |
 
+### NanoWasm type offset section
+
+Custom section `name` field: `"nw_to"`
+
+The NanoWasm type offset section is a
+[custom section](#high-level-structure). It is therefore encoded with id `0`
+followed by the name string `"nw_to"`. It is meant as a
+[NanoWasm](NanoWasm.md) extension to the MVP that allows NanoWasm-compatible
+interpreters to quickly retrieve the offset of a type inside the
+[type section](#type-section), without having to keep this information
+in memory.
+
+| Field | Type | Description |
+| ----- | ---- | ----------- |
+| offset | uint32* | Type offset inside the [type section](#type-section), encoded as little-endian |
+
+### NanoWasm function type index section
+
+Custom section `name` field: `"nw_fti"`
+
+The NanoWasm function type index section is a
+[custom section](#high-level-structure). It is therefore encoded with id `0`
+followed by the name string `"nw_fti"`. It is meant as a
+[NanoWasm](NanoWasm.md) extension to the MVP that allows NanoWasm-compatible
+interpreters to quickly retrieve the type index of a function, without having
+to read the [function section](#function-section) or keep this information
+in memory.
+
+| Field | Type | Description |
+| ----- | ---- | ----------- |
+| offset | uint32* | Type index, as expressed by the [function section](#function-section), encoded as little-endian |
+
+### NanoWasm function body offset section
+
+Custom section `name` field: `"nw_fbo"`
+
+The NanoWasm function body offset section is a
+[custom section](#high-level-structure). It is therefore encoded with id `0`
+followed by the name string `"nw_fbo"`. It is meant as a
+[NanoWasm](NanoWasm.md) extension to the MVP that allows NanoWasm-compatible
+interpreters to quickly retrieve the offset of a function body inside the
+[code section](#code-section), without having to keep this information
+in memory.
+
+The NanoWasm function body offset section contains an array of fixed-width
+offsets. This allows interpreters to inspect the offset of a given
+[function index](Modules.md#function-index-space) with constant-time access.
+
+Similarly to function bodies in the code section, imported functions are not
+listed in this section.
+
+| Field | Type | Description |
+| ----- | ---- | ----------- |
+| offset | uint32* | Function body offset inside the [code section](#code-section), encoded as little-endian |
+
+### NanoWasm label offset section
+
+Custom section `name` field: `"nw_lo"`
+
+The NanoWasm label offset section is a [custom section](#high-level-structure).
+It is therefore encoded with id `0` followed by the name string `"nw_lo"`.
+It is meant as a [NanoWasm](NanoWasm.md) extension to the MVP that allows
+NanoWasm-compatible interpreters to quickly retrieve the offset for a given label
+relative to a function body, without having to keep this information in memory.
+
+The NanoWasm label offset section starts with an array of fixed-width
+offsets to an array of `label_offset` entries. This allows interpreters to
+inspect the offset for a given label when a
+[control flow operator](#Control-flow-operators) is interpreted, without
+having to read the rest of the function body beforehand.
+
+Similarly to function bodies in the code section, imported functions are not
+listed in this section.
+
+| Field | Type | Description |
+| ----- | ---- | ----------- |
+| offset | uint32* | offset to the `label_offset` entry, encoded as little-endian |
+
+Where each non-imported function defines a `label_offset` entry as follows:
+
+| Field | Type | Description |
+| ----- | ---- | ----------- |
+| count | varuint32 | number of label offsets |
+| offset | uint32* | offset to the label index relative to the function body start, encoded as little-endian |
+
 # Function Bodies
 
 Function bodies consist of a sequence of local variable declarations followed by
diff --git a/NanoWasm.md b/NanoWasm.md
new file mode 100644
index 0000000..900bda6
--- /dev/null
+++ b/NanoWasm.md
@@ -0,0 +1,62 @@
+# NanoWasm extensions
+
+MVP WebAssembly was designed as a compact binary encoding that favors small
+files, which is often a requirement in a web environment. However, some of
+its decisions incur a non-negligible memory and/or performance penalty.
+For some interpreters designed for resource-constrained environments, this
+might not be desirable or even acceptable.
+
+Therefore, this specification defines a set of third-party extensions,
+implemented as [custom sections](BinaryEncoding.md#high-level-structure),
+that aim to minimize such penalties, while increasing module file size as a
+tradeoff.
+
+The following design decisions were addressed:
+
+## Function body offsets
+
+In the MVP, in order to access the start of a function body, interpreters are
+forced to either:
+
+- Store all function body start offsets in memory during validation.
+- Read all function bodies prior to the function index to be called.
+    - For example, a [`call`](BinaryEncoding.md#Call-operators) to function
+      index `90` might require reading the function bodies that precede it in
+      order to retrieve its offset, as every function body has a variable length.
+
+For interpreters that prefer not to rely on dynamically allocated memory
+(i.e., a heap) or with little memory available, the former would not be
+desirable or even possible to achieve. On the other hand, the latter, while
+more memory-efficient, could have a non-negligible impact on performance,
+even more so if the module bytecode is read on the fly from a "slow",
+non-volatile memory, e.g. an SD card, a hard disk drive or a CD-ROM.
+
+Therefore, NanoWasm defines the
+[function body offset section](BinaryEncoding.md#nanowasm-function-body-offset-section),
+so that interpreters can retrieve the offset of a given function body with
+constant-time access.
+
+## Function label offsets
+
+The MVP defines a series of
+[control flow operators](Semantics.md#control-constructs-and-instructions) that
+perform operations such as branching. Since WebAssembly is defined as a
+structured language, some of these operators might require the interpreter to
+jump to a given offset within the function that might be defined by an `end`
+operator located *below* the current program counter.
+
+Typically, this requires interpreters to read function bodies before
+interpretation, which again can be done in different ways:
+
+- Read function bodies on module validation and store label offsets in memory
+for all non-imported functions.
+- Read a function body when a `call` operator is found and store its label
+offsets in memory.
+
+Both solutions have significant memory and/or computational time penalties,
+which again might be unacceptable for resource-constrained devices. Therefore,
+NanoWasm defines the
+[label offset section](BinaryEncoding.md#nanowasm-label-offset-section) so that
+interpreters can retrieve label offsets with constant-time access, without
+having to read the function bodies beforehand *and* without having to keep
+information in memory.
-- 
cgit v1.2.3
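
Note (not part of the commit): the constant-time lookup that the `nw_fbo` section is meant to enable could look roughly like the Python sketch below. It rests on assumptions the patch does not spell out: that the payload passed in is only the bytes following the custom section name, that entries are packed back to back with no count field, and that entry N belongs to the N-th non-imported function. The names `function_body_offset`, `ENTRY_SIZE` and `payload`, and the sample offsets, are made up for illustration.

```python
import struct

# Hypothetical helper, not part of the patch. Assumptions: `nw_fbo_payload`
# holds only the bytes that follow the custom section name, entries are packed
# back to back with no count field, and entry N belongs to the N-th
# non-imported function.
ENTRY_SIZE = 4  # each offset is a little-endian uint32


def function_body_offset(nw_fbo_payload: bytes, func_index: int,
                         num_imported_funcs: int) -> int:
    """Return the code-section offset of the body of `func_index`."""
    # Imported functions come first in the function index space and have
    # no entry, so the index is shifted by the number of imports.
    entry = func_index - num_imported_funcs
    if entry < 0:
        raise ValueError("imported functions have no entry in nw_fbo")
    pos = entry * ENTRY_SIZE
    if pos + ENTRY_SIZE > len(nw_fbo_payload):
        raise IndexError("function index beyond the last nw_fbo entry")
    # One fixed-position read, regardless of how many bodies precede it.
    (offset,) = struct.unpack_from("<I", nw_fbo_payload, pos)
    return offset


# Made-up payload describing three function bodies.
payload = struct.pack("<3I", 1, 40, 97)
```

With two imported functions, `function_body_offset(payload, 3, 2)` reads the second entry and returns `40` without touching any function body. An interpreter reading from slow storage would replace the in-memory `bytes` with a seek to `pos` followed by a four-byte read.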