# Comparing WebAssembly/Wasm interpreters

This document provides a more detailed explanation behind the
[comparison chart](README.md#comparison-chart).

- **Note:** `N/A` means no measurements have been made yet.

## Asynchronous interface

At the time of writing, there is no known asynchronous WebAssembly
interpreter other than `nanowasm`. All other interpreters implement a
function that runs the WebAssembly application until it either
finishes or traps (e.g., `wasm_application_execute_main` in
`wasm-micro-runtime`).

In the scope of `nanowasm`, it was deemed interesting to design it as an
asynchronous library for several reasons:

- Running several module instances synchronously requires OS-level threading,
which might not be desirable or even feasible to implement under some
resource-constrained environments.
- An asynchronous interface allows hosts to stop the execution of a module
instance simply by no longer calling `nw_run`. Otherwise, interpreters must
provide a `terminate`-like interface that must be called from a separate
context (e.g., `wasm_runtime_terminate` in `wasm-micro-runtime`).
- The reasons above require a synchronous interpreter to be aware of the
underlying platform, as it must rely on primitives such as locks in order
to remain thread-safe. However, this restricts portability to new, unknown
platforms.
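
The scheduling model described above can be illustrated with a toy sketch.
It does not use the `nanowasm` API: `toy_inst`, `toy_run`, `run_all` and the
`STEP_*` codes are hypothetical names, and the "work" is just a counter.
It only shows how a step function that returns control to the caller lets a
single thread interleave several instances.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes; not taken from the nanowasm headers. */
enum step_state { STEP_AGAIN, STEP_DONE };

/* Toy "instance": counts up to a limit, a few units of work per call. */
struct toy_inst {
    unsigned long counter, limit;
};

/* Does a bounded amount of work, then returns control to the caller. */
static enum step_state toy_run(struct toy_inst *inst)
{
    unsigned budget = 4; /* arbitrary number of work units per call */

    assert(inst != NULL);

    while (budget-- && inst->counter < inst->limit)
        inst->counter++;

    return inst->counter < inst->limit ? STEP_AGAIN : STEP_DONE;
}

/* Round-robin over n instances on a single thread until all finish.
 * Returns the number of scheduling passes that were needed. */
static unsigned long run_all(struct toy_inst *insts, size_t n)
{
    unsigned long passes = 0;
    size_t pending = n;

    while (pending) {
        size_t i;

        pending = 0;

        for (i = 0; i < n; i++)
            if (toy_run(&insts[i]) == STEP_AGAIN)
                pending++;

        passes++;
    }

    return passes;
}
```

In this sketch, a host that wants to stop an instance early simply leaves it
out of the loop; no locks or separate termination call are involved.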

Despite its advantages, an asynchronous interface requires a more careful
design and, more importantly, it incurs a larger memory footprint. Even so,
`nanowasm` strives to remain smaller than its synchronous counterparts.

## I/O-agnostic

### Module bytecode

At the time of writing, all interpreters other than `nanowasm` require
the module bytecode to be memory-mapped. Typically, this means either:

- Dumping the bytecode into memory.
- Allocating a memory-mapped file.

Whereas the former is inefficient memory-wise and probably unacceptable on
resource-constrained environments, the latter is just not possible unless
a hardware [MMU](https://en.wikipedia.org/wiki/Memory_management_unit) is
present.

In the context of `nanowasm`, it was considered interesting not to assume
_where_ the module bytecode comes from, and instead access it via file-like
semantics. For example, this would allow MMU-less systems to store module
bytecode on non-volatile memory, which is often larger and less expensive,
albeit slower.
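
As a minimal sketch of what "file-like semantics" could look like, the
example below reads the module header through a host-provided callback
instead of a pointer to the whole module. The `reader`, `flash`,
`flash_read` and `check_magic` names are assumptions made for illustration
and are not part of the `nanowasm` headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host-provided reader; nanowasm's real interface may differ. */
struct reader {
    /* Reads up to n bytes into buf; returns bytes read, or -1 on error. */
    long (*read)(void *buf, size_t n, void *user);
    void *user;
};

/* Example backing store: a byte array standing in for external flash. */
struct flash {
    const unsigned char *data;
    size_t size, pos;
};

static long flash_read(void *buf, size_t n, void *user)
{
    struct flash *f = user;
    unsigned char *out = buf;
    size_t i, left = f->size - f->pos;

    if (n > left)
        n = left;

    for (i = 0; i < n; i++)
        out[i] = f->data[f->pos++];

    return (long)n;
}

/* Checks the 4-byte "\0asm" magic, pulling one byte at a time so that
 * the module never has to be resident in RAM as a whole. */
static int check_magic(const struct reader *r)
{
    static const unsigned char magic[4] = {0x00, 'a', 's', 'm'};
    size_t i;

    assert(r != NULL && r->read != NULL);

    for (i = 0; i < sizeof magic; i++) {
        unsigned char b;

        if (r->read(&b, 1, r->user) != 1 || b != magic[i])
            return -1;
    }

    return 0;
}
```

The same `check_magic` would work unchanged if `flash_read` were replaced by
a callback backed by `fread(3)`, a serial link or any other byte source.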

### Memories

WebAssembly defines four different memory areas:

- Table memory.
- Linear memory.
- Global memory.
- Stack.

All interpreters other than `nanowasm` allocate these memory areas internally
via the system heap, or a custom heap defined by the user. This raises the
following concerns:

- Some resource-constrained environments might prefer to avoid the use of
a heap, or maybe no heap implementation is even available.
- It forces each of these memory areas to remain contiguous. On environments
with segmented memory, this might limit the amount of contiguous memory that
can be allocated.

In the context of `nanowasm`, MMU-less systems were considered a priority for
its design, and therefore it was conceived so that these memory areas are
never allocated by `nanowasm` itself. Instead, `nanowasm` gives the host a
series of interfaces (i.e., callbacks) to implement in order to define
how these areas are accessed. Therefore, whether accessing those areas
requires the use of a heap, and how it is used, is entirely up to the host
implementation.

While possibly a bit cumbersome at first glance, this flexible design
brings in many new possibilities. For example, it allows MMU-less systems to
store memory pages into non-contiguous memory areas, or even store them into
larger, non-volatile memory, similarly to how fully-fledged operating systems
implement virtual memory.
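
The following sketch illustrates the idea with linear memory split across
two buffers that need not be adjacent. The callback signatures and the
`lin_mem`, `chunks`, `chunk_load` and `chunk_store` names are hypothetical,
and the 16-byte chunk size is a toy value chosen to keep the example small
(WebAssembly pages are 64 KiB).

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK 16 /* toy chunk size; real Wasm pages are 64 KiB */

/* Hypothetical callbacks a host would implement for linear memory. */
struct lin_mem {
    int (*load)(unsigned long addr, void *dst, size_t n, void *user);
    int (*store)(unsigned long addr, const void *src, size_t n, void *user);
    void *user;
};

/* Host-side storage: two chunks that need not be adjacent in memory. */
struct chunks {
    unsigned char *chunk[2];
};

/* Returns non-zero if [addr, addr + n) fits inside the two chunks. */
static int in_range(unsigned long addr, size_t n)
{
    return addr <= 2ul * CHUNK && n <= 2ul * CHUNK - addr;
}

/* Maps an in-range linear address to a byte inside one of the chunks. */
static unsigned char *locate(struct chunks *c, unsigned long addr)
{
    return &c->chunk[addr / CHUNK][addr % CHUNK];
}

static int chunk_load(unsigned long addr, void *dst, size_t n, void *user)
{
    unsigned char *out = dst;
    size_t i;

    assert(user != NULL);

    if (!in_range(addr, n))
        return -1; /* out of bounds: an interpreter would trap here */

    for (i = 0; i < n; i++)
        out[i] = *locate(user, addr + i);

    return 0;
}

static int chunk_store(unsigned long addr, const void *src, size_t n,
    void *user)
{
    const unsigned char *in = src;
    size_t i;

    assert(user != NULL);

    if (!in_range(addr, n))
        return -1;

    for (i = 0; i < n; i++)
        *locate(user, addr + i) = in[i];

    return 0;
}
```

Because the interpreter side only ever sees `load` and `store`, an access
that straddles two chunks is handled entirely by the host, which is free to
back each chunk with static arrays, a heap or slower external storage.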

## No heap required

All interpreters other than `nanowasm` allocate many internal data
structures, as well as arbitrarily large chunks of data in order to accommodate
the different memory areas defined by the WebAssembly standard. Aside from the
limitations [explained above](#memories), this means the memory required by a
module or a module instance cannot be known at compile-time, since it depends
on how the heap is implemented, and even the module bytecode itself.

On the other hand, `nanowasm` was designed with resource-constrained
environments in mind, where a heap implementation might be either undesired
or just unavailable. Therefore, it had to be implemented so that the memory
required by modules and module instances remained static. This is achieved
efficiently by storing all data structures for all possible states into a
`union`.
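
A reduced sketch of this technique is shown below. The states and their
fields (`st_header`, `st_section`, `st_exec`) are invented for illustration
and do not mirror the actual `nanowasm` structures; the point is only that
states which are never live at the same time can share storage.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-state scratch data; names are illustrative only. */
struct st_header { unsigned char magic[4]; unsigned nread; };
struct st_section { unsigned long id, len, offset; };
struct st_exec { unsigned long pc; long operands[2]; };

enum state { ST_HEADER, ST_SECTION, ST_EXEC };

/* Only one state is live at a time, so the states can share storage:
 * sizeof (struct machine) is a compile-time constant bounded by the
 * largest member rather than by the sum of all members. */
struct machine {
    enum state state;
    union {
        struct st_header header;
        struct st_section section;
        struct st_exec exec;
    } u;
};

/* Moves to the "section" state, resetting the shared storage for it. */
static void enter_section(struct machine *m, unsigned long id,
    unsigned long len)
{
    assert(m != NULL);

    m->state = ST_SECTION;
    m->u.section.id = id;
    m->u.section.len = len;
    m->u.section.offset = 0;
}
```

The trade-off is that data from a previous state is lost on every
transition, so anything that must outlive a state has to be stored outside
the `union`.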

This design allows hosts to allocate modules and module instances in any way,
be it:

- Automatically, i.e., from the stack.
- Statically, i.e., via the `static` qualifier.
- Dynamically, i.e., from the heap.
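
The three options can be sketched as follows, using a stand-in `struct inst`
whose layout is purely illustrative. With the real library, the same
pattern would presumably apply to `struct nw_mod` and `struct nw_inst`,
although their initialisation functions are not shown here.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a fixed-size interpreter object such as a module
 * instance; the layout here is purely illustrative. */
struct inst {
    unsigned long pc;
    unsigned char scratch[32];
};

/* Hypothetical initialiser: works on caller-provided storage only. */
static void inst_init(struct inst *i)
{
    assert(i != NULL);
    i->pc = 0;
}

/* Statically allocated: lives for the whole program, no heap involved. */
static struct inst static_inst;

/* Initialises one instance per storage class; returns 0 on success. */
static int demo(void)
{
    struct inst auto_inst; /* automatic, i.e., on the stack */
    struct inst *heap_inst = malloc(sizeof *heap_inst); /* dynamic */

    if (!heap_inst)
        return -1;

    inst_init(&auto_inst);
    inst_init(&static_inst);
    inst_init(heap_inst);
    free(heap_inst);
    return 0;
}
```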

## Big-endian support

Even if little-endian architectures, such as `amd64`, are arguably more popular
as of the time of this writing, big-endian counterparts are still being
produced and are therefore considered equally relevant by `nanowasm`.

### [`wasm-micro-runtime`]

Although `wasm-micro-runtime` seems to byte-swap integers according to the
platform endianness, no big-endian platforms are listed so far
in its `README.md`. Also, due to its large code base, it is difficult to verify
whether all integer reads and writes are done in an endianness-agnostic way.

### [`wac`]

`wac` naively compares the `\0asm` magic string as a little-endian integer.
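
For contrast, the sketch below shows one endianness-agnostic way to check
the magic string and to read a little-endian 32-bit field such as the
version number. The `read_le32` and `is_wasm` helpers are illustrative and
are not taken from any of the interpreters discussed here.

```c
#include <assert.h>
#include <string.h>

/* Assembles a little-endian 32-bit value from individual bytes. The result
 * is the same on little- and big-endian hosts because no memory is
 * reinterpreted as an integer. */
static unsigned long read_le32(const unsigned char *p)
{
    assert(p != NULL);

    return (unsigned long)p[0]
        | (unsigned long)p[1] << 8
        | (unsigned long)p[2] << 16
        | (unsigned long)p[3] << 24;
}

/* Compares the Wasm magic as a byte string rather than as an integer. */
static int is_wasm(const unsigned char *p)
{
    static const unsigned char magic[4] = {0x00, 'a', 's', 'm'};

    return memcmp(p, magic, sizeof magic) == 0;
}
```

Casting the first four bytes to a native integer and comparing against a
constant only gives the expected result when the host happens to be
little-endian, whereas both helpers above depend on byte order in the file
alone.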

## No compiler-specific extensions

All interpreters other than `nanowasm` rely extensively on system-specific
macros and/or extensions to the C language. These might restrict their use on
less popular compilers and/or new environments.

On the other hand, `nanowasm` is written in standard ANSI C (C89/C90),
i.e., without any language extensions or system-specific macros.
In a broader sense, the use of macros and/or other preprocessor directives is
kept to a minimum in `nanowasm`, as opposed to other interpreters such
as `wasm3`.

Such reduced use of the preprocessor is considered to enhance readability,
even if it might incur some extra boilerplate code.

## Public functions

### `nanowasm`

The numbers were extracted from [`nw.h`](include/nanowasm/nw.h).

### [`wasm-micro-runtime`]

- Commit: `4e50d2191ca8f177ad03a9d80eebc44b59a932db`

The numbers were extracted from `wasm_export.h`.

### [`wasm3`]

- Commit: `35b5e2fb53c5cbc1ff3d7e42c381cd7cfa14f308`

The numbers were extracted from `wasm3.h`.

## Minimal memory footprint

### `nanowasm`

The numbers were extracted from the `test` application built by the project
by default, which links the `nanowasm` library.

The project was built with:

```
cmake -B build
cmake --build build
```

Then, the size for the `test` executable was obtained via:

```
$ size build/test/test
   text    data     bss     dec     hex filename
  50590    3576      16   54182    d3a6 build/test/test
```

Of course, these numbers are subject to change since many opcodes are still
not implemented in `nanowasm`.

### [`wasm-micro-runtime`]

- Commit: `4e50d2191ca8f177ad03a9d80eebc44b59a932db`

A minimal application, namely `wamr-ex`, was written with
`wasm-micro-runtime`'s `iwasm` library. This application:

1. Dumps a `.wasm` file into memory, since `wasm-micro-runtime` requires
module code to either reside in memory or belong to a memory-mapped file.
2. Calls the following functions:
    - `wasm_runtime_init`
    - `wasm_runtime_load`
    - `wasm_runtime_instantiate`
    - `wasm_application_execute_main`
    - `wasm_runtime_unload`
    - `wasm_runtime_deinstantiate`

The example was built with the default CMake flags, i.e.:

```
cmake -B build
cmake --build build
```

The executable size was then obtained via:

```
$ size build/wamr-ex
   text    data     bss     dec     hex filename
 463165    9224     932  473321   738e9 build/wamr-ex
```

### [`wasm3`]

- Commit: `35b5e2fb53c5cbc1ff3d7e42c381cd7cfa14f308`

`wasm3` provides a sample application, also called `wasm3`, in its source tree.
This application can run any `.wasm` file, along with some extra command
line options.

The project was built with the default CMake flags, i.e.:

```
cmake -B build
cmake --build build
```

The executable size was then obtained via:

```
$ size build/wasm3
   text    data     bss     dec     hex filename
 531667   20332    6720  558719   8867f build/wasm3
```

## Per-module memory usage

### `nanowasm`

The numbers were extracted by looking up `sizeof (struct nw_mod)` via `gdb(1)`,
on an `x86_64-linux-gnu` machine. Results might vary depending on the
target platform.

## Per-instance memory usage

### `nanowasm`

The numbers were extracted by looking up `sizeof (struct nw_inst)` via `gdb(1)`,
on an `x86_64-linux-gnu` machine. Results might vary depending on the
target platform.

[`wasm-micro-runtime`]: https://github.com/bytecodealliance/wasm-micro-runtime
[`wasm3`]: https://github.com/wasm3/wasm3
[`wac`]: https://github.com/kanaka/wac
[`toywasm`]: https://github.com/yamt/toywasm