Add NanoWasm extensions

This commit introduces third-party extensions to the MVP, branded as "NanoWasm", that aim to reduce memory consumption and computational time for resource-constrained environments, at the expense of increased module file size.
author: Xavier Del Campo Romero <xavi.dcr@tutanota.com> 2024-10-11 08:02:10 +0200
committer: Xavier Del Campo Romero <xavi.dcr@tutanota.com> 2025-03-10 00:03:41 +0100
commit: e8f0f3cd54c4142a0463659d12f1667bc35e7ace (patch)
tree: e74ca8efeacbafc85a9e8680cea1ece43c99e9e2
parent: 08e06cccaf9fe14558e1f3deec81b3d14175cd4b (diff)
2 files changed, 147 insertions, 0 deletions
diff --git a/BinaryEncoding.md b/BinaryEncoding.md
index 2677b35..91f1094 100644
--- a/BinaryEncoding.md
+++ b/BinaryEncoding.md
@@ -532,6 +532,91 @@ where a `local_name` is encoded as:
 | index | `varuint32` | the index of the function whose locals are being named |
 | local_map | `name_map` | assignment of names to local indices |
 
+### NanoWasm type offset section
+
+Custom section `name` field: `"nw_to"`
+
+The NanoWasm type offset section is a
+[custom section](#high-level-structure). It is therefore encoded with id `0`
+followed by the name string `"nw_to"`. It is meant as a
+[NanoWasm](NanoWasm.md) extension to the MVP that allows NanoWasm-compatible
+interpreters to quickly retrieve the offset of a type inside the
+[type section](#type-section), without having to keep this information
+in-memory.
+
+| Field | Type | Description |
+| ----- | ---- | ----------- |
+| offset | uint32* | Type offset inside the [type section](#type-section), encoded as little-endian
+
+### NanoWasm function type index section
+
+Custom section `name` field: `"nw_fti"`
+
+The NanoWasm function type index section is a
+[custom section](#high-level-structure). It is therefore encoded with id `0`
+followed by the name string `"nw_fti"`. It is meant as a
+[NanoWasm](NanoWasm.md) extension to the MVP that allows NanoWasm-compatible
+interpreters to quickly retrieve the type index of a function, without having
+to read the [function section](#function-section) or keep this information
+in-memory.
+
+| Field | Type | Description |
+| ----- | ---- | ----------- |
+| type_index | uint32* | Type index, as expressed by the [function section](#function-section), encoded as little-endian
+
+### NanoWasm function body offset section
+
+Custom section `name` field: `"nw_fbo"`
+
+The NanoWasm function body offset section is a
+[custom section](#high-level-structure). It is therefore encoded with id `0`
+followed by the name string `"nw_fbo"`. It is meant as a
+[NanoWasm](NanoWasm.md) extension to the MVP that allows NanoWasm-compatible
+interpreters to quickly retrieve the offset of a function body inside the
+[code section](#code-section), without having to keep this information
+in-memory.
+
+The NanoWasm function body offset section contains an array of fixed-width
+offsets. This allows interpreters to look up the offset for a given
+[function index](Modules.md#function-index-space) with constant-time access.
+
+Similarly to function bodies in the code section, imported functions are not
+listed in this section.
+
+| Field | Type | Description |
+| ----- | ---- | ----------- |
+| offset | uint32* | Function body offset inside the [code section](#code-section), encoded as little-endian
+
+### NanoWasm label offset section
+
+Custom section `name` field: `"nw_lo"`
+
+The NanoWasm label offset section is a [custom section](#high-level-structure).
+It is therefore encoded with id `0` followed by the name string `"nw_lo"`.
+It is meant as a [NanoWasm](NanoWasm.md) extension to the MVP that allows NanoWasm-compatible interpreters to quickly
+retrieve the offset for a given label relative to a function body, without
+having to keep this information in-memory.
+
+The NanoWasm label offset section starts with an array of fixed-width
+offsets to an array of `label_offset` entries. This allows interpreters to
+look up the offset for a given label when a
+[control flow operator](#Control-flow-operators) is interpreted, without
+having to read the rest of the function body beforehand.
+
+Similarly to function bodies in the code section, imported functions are not
+listed in this section.
+
+| Field | Type | Description |
+| ----- | ---- | ----------- |
+| offset | uint32* | offset to the `label_offset` entry, encoded as little-endian
+
+Where each non-imported function defines a `label_offset` entry as follows:
+
+| Field | Type | Description |
+| ----- | ---- | ----------- |
+| count | varuint32 | number of label offsets
+| offset | uint32* | offset to the label index relative to the function body start, encoded as little-endian
+
 # Function Bodies
 
 Function bodies consist of a sequence of local variable declarations followed by
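The fixed-width layout of the `"nw_fbo"` section added in the hunk above can be illustrated with a minimal Python sketch. It is not part of the commit. The helper names are hypothetical, the offsets are made-up values, and the mapping from function index to array position (function index minus the number of imported functions) is an assumption drawn from the statement that imported functions are not listed.

```python
import struct

ENTRY_SIZE = 4  # each offset is a fixed-width little-endian uint32


def encode_fbo_payload(offsets):
    """Pack function body offsets as consecutive little-endian uint32 values."""
    return b"".join(struct.pack("<I", off) for off in offsets)


def function_body_offset(payload, func_index, imported_count):
    """Return the body offset for func_index with a single fixed-position read.

    Assumption: entries are ordered by function index and imported functions
    are skipped, so the entry position is func_index - imported_count.
    """
    entry = func_index - imported_count
    if entry < 0 or (entry + 1) * ENTRY_SIZE > len(payload):
        raise IndexError("function index has no body offset entry")
    (offset,) = struct.unpack_from("<I", payload, entry * ENTRY_SIZE)
    return offset


# Hypothetical module: 2 imported functions, 3 defined functions.
payload = encode_fbo_payload([3, 40, 97])
body_offset = function_body_offset(payload, 3, imported_count=2)
```

Because every entry has the same width, the read position is computed directly from the function index, so no preceding function bodies have to be scanned and no table has to be kept in memory.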
diff --git a/NanoWasm.md b/NanoWasm.md
new file mode 100644
index 0000000..900bda6
--- /dev/null
+++ b/NanoWasm.md
@@ -0,0 +1,62 @@
+# NanoWasm extensions
+
+MVP WebAssembly was designed as a compact binary encoding for small
+files, which is often a requirement in a web environment. However, some of
+its decisions incur a non-negligible memory and/or performance penalty.
+For some interpreters designed for resource-constrained environments, this
+might not be desirable or even acceptable.
+
+Therefore, this specification defines a set of third-party extensions,
+implemented as [custom sections](BinaryEncoding.md#high-level-structure),
+that aim to minimize such penalties, while increasing module file size as a
+tradeoff.
+
+The following design decisions were addressed:
+
+## Function body offsets
+
+On the MVP, in order to access the start of a function body, interpreters are
+forced to either:
+
+- Store all function body start offsets in-memory during validation.
+- Read all function bodies prior to the function index to be called.
+    - For example, a [`call`](BinaryEncoding.md#Call-operators) to function
+    index `90` might require reading many function bodies before it in order
+    to retrieve its offset, as every function body has a variable length.
+
+For interpreters that prefer not to rely on dynamically allocated memory
+(i.e., a heap) or with little memory available, the former would not be
+desirable or even possible to achieve. On the other hand, the latter, while
+more memory-efficient, could have a non-negligible impact on performance,
+even more so if the module bytecode is read on-the-fly from a "slow",
+non-volatile memory, e.g. an SD card, a hard disk drive, a CD-ROM, etc.
+
+Therefore, NanoWasm defines the
+[function body offset section](BinaryEncoding.md#nanowasm-function-body-offset-section),
+so that interpreters can retrieve the offset of a given function body with
+constant-time access.
+
+## Function label offsets
+
+The MVP defines a series of
+[control flow operators](Semantics.md#control-constructs-and-instructions) that
+perform operations such as branching. Since WebAssembly is defined as a
+structured language, some of these operators might require the interpreter to
+jump to a given offset within the function that might be defined by an `end`
+operator located *below* the current program counter.
+
+Typically, this requires interpreters to read function bodies before
+interpretation, which again can be done in different ways:
+
+- Read function bodies on module validation and store label offsets in-memory
+for all non-imported functions.
+- Read a function body when a `call` operator is found and store its label
+offsets in-memory.
+
+Both solutions have significant memory and/or computational time penalties,
+which again might be unacceptable for resource-constrained devices. Therefore,
+NanoWasm defines the
+[label offset section](BinaryEncoding.md#nanowasm-label-offset-section) so that
+interpreters can retrieve label offsets with constant-time access, without
+having to read the function bodies beforehand *and* without having to keep
+information in-memory.
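The two-level lookup that the `"nw_lo"` section enables can be sketched in Python as follows. This is an illustration only, not code from the commit. All function names are hypothetical. It assumes that entry offsets are counted from the start of the section payload, that the index array has one slot per non-imported function, and that label `i` is the `i`-th offset of its `label_offset` entry; the commit does not state these details explicitly.

```python
import struct


def read_varuint32(buf, pos):
    """Decode an unsigned LEB128 value; return (value, next_position)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def write_varuint32(value):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_label_section(per_function_labels):
    """Build a payload: the index array first, then one label_offset entry each."""
    index_size = 4 * len(per_function_labels)
    index = bytearray()
    entries = bytearray()
    for labels in per_function_labels:
        # Assumption: entry offsets count from the start of the payload.
        index += struct.pack("<I", index_size + len(entries))
        entries += write_varuint32(len(labels))
        for off in labels:
            entries += struct.pack("<I", off)
    return bytes(index + entries)


def label_offset(payload, entry_index, label_index):
    """Fetch one label offset with two fixed-position reads and one short decode."""
    (entry_pos,) = struct.unpack_from("<I", payload, 4 * entry_index)
    count, first = read_varuint32(payload, entry_pos)
    if not 0 <= label_index < count:
        raise IndexError("label index out of range")
    (offset,) = struct.unpack_from("<I", payload, first + 4 * label_index)
    return offset


# Hypothetical label offsets for three non-imported functions.
section = encode_label_section([[12, 30], [], [7, 19, 55]])
target = label_offset(section, 2, 1)
```

Although `count` is a variable-length integer, it occupies at most five bytes, so the cost of a lookup stays bounded regardless of how many functions or labels the module contains.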