# Features to add after the MVP

These are features that make sense in the context of the
[high-level goals](HighLevelGoals.md) of WebAssembly but weren't part of the
initial [Minimum Viable Product](MVP.md) release.

## Tracking Issues

The Community Group and Working Group have adopted [a process document for proposal phases](https://github.com/WebAssembly/meetings/blob/master/process/phases.md). The list of feature proposals and their statuses can be found in the [proposals repository](https://github.com/WebAssembly/proposals).

## On Deck for Immediate Design

:star: = Essential features we want to prioritize adding shortly after
the [MVP](MVP.md).

### Great tooling support
#### :star: :star: :star:

This is covered in the [tooling](Tooling.md) section.


### Feature Testing
#### :star:

Post-MVP, some form of feature-testing will be required. We don't yet have the
experience writing polyfills to know whether `has_feature` is the right
primitive building block so we're not defining it (or something else) until we
gain this experience. In the interim, it's possible to do a crude feature test
(as people do in JavaScript) by just `eval`-ing WebAssembly code and catching
validation errors.

See [Feature test](FeatureTest.md) for a more detailed sketch.
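
The crude test described above amounts to "try to validate a probe module and catch the failure". A minimal sketch of that pattern follows. `toy_validate` and `supports` are hypothetical names, and the toy validator only checks the binary header (magic number and version 1), standing in for a real engine's validator.

```python
WASM_HEADER = b"\0asm\x01\0\0\0"  # magic number followed by version 1


def toy_validate(module_bytes):
    """Stand-in for an engine's validator; a real one checks far more."""
    if not module_bytes.startswith(WASM_HEADER):
        raise ValueError("validation error: bad magic number or version")


def supports(probe_module, validate=toy_validate):
    """Crude feature test: validate a probe module and catch the failure."""
    try:
        validate(probe_module)
        return True
    except ValueError:
        return False
```

In practice the probe module would contain the instruction or section whose support is in question, and `validate` would be whatever validation entry point the host provides.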

## Proposals we might consider in the future

### Finer-grained control over memory

Provide access to safe OS-provided functionality including:

* `map_file(addr, length, Blob, file-offset)`: semantically, this operator
   copies the specified range from `Blob` into the range `[addr, addr+length)`
   (where `addr+length <= memory_size`) but implementations are encouraged
   to `mmap(addr, length, MAP_FIXED | MAP_PRIVATE, fd)`
* `discard(addr, length)`: semantically, this operator zeroes the given range
   but the implementation is encouraged to drop the zeroed physical pages from
   the process's working set (e.g., by calling `madvise(MADV_DONTNEED)` on
   POSIX)
* `shmem_create(length)`: create a memory object that can be simultaneously
  shared between multiple linear memories
* `map_shmem(addr, length, shmem, shmem-offset)`: like `map_file` except
  `MAP_SHARED`, which isn't otherwise valid on read-only Blobs
* `mprotect(addr, length, prot-flags)`: change protection on the range
  `[addr, addr+length)` (where `addr+length <= memory_size`)
* `decommit(addr, length)`: equivalent to `mprotect(addr, length, PROT_NONE)`
  followed by `discard(addr, length)` and potentially more efficient than
  performing these operators in sequence.

The `addr` and `length` parameters above would be required to be multiples of
[`page_size`](Semantics.md#resizing).
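
To make the "semantically, this operator..." wording concrete, here is a toy Python model of the observable behavior of `discard` and `map_file`. It is only a sketch: the `LinearMemory` class, its error handling, and the zero-fill for ranges past the end of the blob are assumptions, and it says nothing about how an engine would use `mmap` or `madvise` underneath.

```python
PAGE_SIZE = 64 * 1024  # WebAssembly page size (64 KiB)


class LinearMemory:
    """Toy model of a linear memory; names and errors are illustrative."""

    def __init__(self, pages):
        self.data = bytearray(pages * PAGE_SIZE)

    def _check(self, addr, length):
        # Both parameters must be page-aligned and the range must be in bounds.
        if addr % PAGE_SIZE or length % PAGE_SIZE:
            raise ValueError("addr and length must be multiples of page_size")
        if addr < 0 or length < 0 or addr + length > len(self.data):
            raise IndexError("range exceeds memory_size")

    def discard(self, addr, length):
        # Observable effect: the range reads as zero afterwards.
        self._check(addr, length)
        self.data[addr:addr + length] = bytes(length)

    def map_file(self, addr, length, blob, file_offset):
        # Observable effect: a private copy of the blob range appears in memory.
        self._check(addr, length)
        chunk = bytes(blob[file_offset:file_offset + length])
        # Assumption of this sketch: bytes past the end of the blob read as zero.
        self.data[addr:addr + length] = chunk.ljust(length, b"\0")
```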

The `mprotect` operator would require hardware memory protection to execute
efficiently and thus may be added as an "optional" feature (requiring a
[feature test](FeatureTest.md) to use). To support efficient execution even when
no hardware memory protection is available, a restricted form of `mprotect`
could be added which is declared statically and only protects low memory
(providing the expected fault-on-low-memory behavior of native C/C++ apps).

The above list of functionality mostly covers the set of functionality
provided by the `mmap` OS primitive. One significant exception is that `mmap`
can allocate noncontiguous virtual address ranges. See the
[FAQ](FAQ.md#what-about-mmap) for rationale.

### Large page support

Some platforms offer support for memory pages as large as 16 GiB, which
can improve the efficiency of memory management in some situations. WebAssembly
may offer programs the option to specify a larger page size than the [default](Semantics.md#resizing).

### More expressive control flow

Some types of control flow (especially irreducible and indirect) cannot be
expressed with maximum efficiency in WebAssembly without patterned output by the
relooper and [jump-threading](https://en.wikipedia.org/wiki/Jump_threading)
optimizations in the engine. Target uses for more expressive control flow are:

* Language interpreters, which often use computed-`goto`.
* Functional language support, where guaranteed tail call optimization is
  expected for correctness and performance.

Options under consideration:

* No action: `while` and `switch` combined with jump-threading are enough.
* Just add `goto` (direct and indirect).
* Add new control-flow primitives that address common patterns.
* Add signature-restricted Proper Tail Calls.
* Add proper tail calls, expanding upon signature-restricted proper tail calls and
  making it easier to support other languages, especially functional programming
  languages.

### Linear memory bigger than 4 GiB

The WebAssembly MVP supports linear memories with 32-bit indices, which means
each memory is limited to 4 GiB. To support larger sizes, memories that use
64-bit indices will be added in the future.

Of course, the ability to actually allocate this much memory will always be
subject to dynamic resource availability.
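
The limits follow directly from the index width. The short calculation below spells them out, assuming the 64 KiB page size; the constant names are only for illustration.

```python
PAGE_SIZE = 64 * 1024  # bytes per WebAssembly page (64 KiB)
GIB = 1 << 30          # bytes per GiB

MAX_BYTES_32 = 1 << 32  # address space reachable with 32-bit indices
MAX_BYTES_64 = 1 << 64  # address space reachable with 64-bit indices

print(MAX_BYTES_32 // GIB)        # 4 (GiB)
print(MAX_BYTES_32 // PAGE_SIZE)  # 65536 (pages)
print(MAX_BYTES_64 // GIB)        # 17179869184 (GiB, i.e. 16 EiB)
```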

### Source maps integration

* Add a new source maps [module section type](MVP.md#module-structure).
* Either embed the source maps directly or just a URL from which source maps can
  be downloaded.
* Text source maps become intractably large for even moderate-sized compiled
  code, so a new binary format for source maps will probably need to be defined.
* Gestate ideas and start discussions at the
  [Source Map RFC repository](https://github.com/source-map/source-map-rfc/issues)

### Coroutines

Coroutines will [eventually be part of C++][] and are already popular in other
programming languages that WebAssembly will support.

  [eventually be part of C++]: https://wg21.link/n4499

### Asynchronous Signals

TODO

### "Long SIMD"

The initial SIMD API will be a "short SIMD" API, centered around fixed-width
128-bit types and explicit SIMD operators. This is quite portable and useful,
but it won't be able to deliver the full performance capabilities of some of
today's popular hardware. There is [a proposal in the SIMD.js repository][] for
a "long SIMD" model which generalizes to wider hardware vector lengths, making
more natural use of advanced features like vector lane predication,
gather/scatter, and so on. Interesting questions to ask of such a model will
include:

* How will this model map onto popular modern SIMD hardware architectures?
* What is this model's relationship to other hardware parallelism features, such
  as GPUs and threads with shared memory?
* How will this model be used from higher-level programming languages? For
  example, the C++ committee is considering a wide variety of possible
  approaches; which of them might be supported by the model?
* What is the relationship to the "short SIMD" API? "None" may be an acceptable
  answer, but it's something to think about.
* What nondeterminism does this model introduce into the overall platform?
* What happens when code uses long SIMD on a hardware platform which doesn't
  support it? Reasonable options may include emulating it without the benefit of
  hardware acceleration, or indicating a lack of support through feature tests.

  [a proposal in the SIMD.js repository]: https://github.com/tc39/ecmascript_simd/issues/180

### Platform-independent Just-in-Time (JIT) compilation

WebAssembly is a new virtual ISA, and as such applications won't be able to
simply reuse their existing JIT-compiler backends. Applications will instead
have to interface with WebAssembly's instructions as if they were a new ISA.

Applications expect a wide variety of JIT-compilation capabilities. WebAssembly
should support:

* Produce a dynamic library and load it into the current WebAssembly
  module.
* Define lighter-weight mechanisms, such as the ability to add a function to an
  existing module.
* Support explicitly patchable constructs within functions to allow for very
  fine-grained JIT-compilation. This includes:
    * Code patching for polymorphic inline caching;
    * Call patching to chain JIT-compiled functions together;
    * Temporary halt-insertion within functions, to trap if a function starts
      executing while a JIT-compiler's runtime is performing operations
      dangerous to that function.
* Provide JITs access to profile feedback for their JIT-compiled code.
* Provide code unloading capabilities, especially in the context of code garbage
  collection and defragmentation.

WebAssembly's JIT interface would likely be fairly low-level. However, there
are use cases for higher-level functionality and optimization too. One avenue
for addressing these use cases is a
[JIT and Optimization library](JITLibrary.md).

### Multiprocess support

* `vfork`.
* Inter-process communication.
* Inter-process `mmap`.

### Trapping or non-trapping strategies

Presently, when an instruction traps, the program is immediately terminated.
This suits C/C++ code, where trapping conditions indicate Undefined Behavior at
the source level, and it's also nice for handwritten code, where trapping
conditions typically indicate an instruction being asked to perform outside its
supported range. However, the current facilities do not cover some interesting
use cases:

* Not all likely-bug conditions are covered. For example, it would be very nice
  to have a signed-integer add which traps on overflow. Such a construct would
  add too much overhead on today's popular hardware architectures to be used in
  general, however it may still be useful in some contexts.
* Some higher-level languages define their own semantics for conditions like
  division by zero and so on. It's possible for compilers to add explicit checks
  and handle such cases manually, though more direct support from the platform
  could have advantages:
  * Non-trapping versions of some operators, such as an integer division
    instruction that returns zero instead of trapping on division by zero, could
    potentially run faster on some platforms.
  * The ability to recover gracefully from traps in some way could make many
    things possible. This could involve throwing, or resuming execution at the
    trapping instruction with the execution state altered, if there is a
    reasonable way to specify how that should work.
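
As a rough sketch of the non-trapping idea, the two functions below contrast a signed division that traps on a zero divisor with a variant that returns zero instead. The function names and the `Trap` exception are hypothetical. Inputs are assumed to be in signed 32-bit range, and the `INT32_MIN / -1` case is left out.

```python
class Trap(Exception):
    """Stands in for a WebAssembly trap in this sketch."""


def div_s(a, b):
    """Signed division truncating toward zero; traps on a zero divisor."""
    if b == 0:
        raise Trap("integer divide by zero")
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def div_s_or_zero(a, b):
    """Hypothetical non-trapping variant: a zero divisor yields zero."""
    return 0 if b == 0 else div_s(a, b)
```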

### Additional integer operators

* The following operators can be built from other operators already present,
  however in doing so they read at least one non-constant input multiple times,
  breaking single-use expression tree formation.
  * `i32.min_s`: signed minimum
  * `i32.max_s`: signed maximum
  * `i32.min_u`: unsigned minimum
  * `i32.max_u`: unsigned maximum
  * `i32.sext`: sign-agnostic `sext(x, y)` is `shr_s(shl(x,y),y)`
  * `i32.abs_s`: signed absolute value (traps on `INT32_MIN`)
  * `i32.bswap`: sign-agnostic reverse bytes (endian conversion)
  * `i32.bswap16`: sign-agnostic, `bswap16(x)` is `((x>>8)&255)|((x&255)<<8)`
  * `i64.min_s`: signed minimum
  * `i64.max_s`: signed maximum
  * `i64.min_u`: unsigned minimum
  * `i64.max_u`: unsigned maximum
  * `i64.sext`: sign-agnostic `sext(x, y)` is `shr_s(shl(x,y),y)`
  * `i64.abs_s`: signed absolute value (traps on `INT64_MIN`)
  * `i64.bswap`: sign-agnostic reverse bytes (endian conversion)

* The following operators are just potentially interesting.
  * `i32.clrs`: sign-agnostic count leading redundant sign bits (defined for
    all values, including 0)
  * `i32.floor_div_s`: signed division (result is [floored](https://en.wikipedia.org/wiki/Floor_and_ceiling_functions))
  * `i64.clrs`: sign-agnostic count leading redundant sign bits (defined for
    all values, including 0)
  * `i64.floor_div_s`: signed division (result is [floored](https://en.wikipedia.org/wiki/Floor_and_ceiling_functions))

* The following 64-bit-only operators are potentially interesting as well.
  * `i64.mor`: sign-agnostic [8x8 bit-matrix multiply with or](http://mmix.cs.hm.edu/doc/instructions-en.html#MOR)
  * `i64.mxor`: sign-agnostic [8x8 bit-matrix multiply with xor](http://mmix.cs.hm.edu/doc/instructions-en.html#MXOR)
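
The formulas given for some of the 32-bit operators above can be checked with a small reference model. The following is a sketch, not a specification: the function names are hypothetical and shift counts are assumed to be taken modulo 32. Trapping on a zero divisor or an unrepresentable quotient in `i32_floor_div_s` is also an assumption. Sign-agnostic operators return the raw 32-bit pattern, while signed ones return a signed Python integer.

```python
MASK32 = 0xFFFFFFFF
INT32_MIN = -(1 << 31)


def to_signed32(x):
    """Reinterpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x


def i32_sext(x, y):
    """sext(x, y) is shr_s(shl(x, y), y)."""
    y &= 31
    shifted = (x << y) & MASK32                  # shl
    return (to_signed32(shifted) >> y) & MASK32  # shr_s


def i32_bswap16(x):
    return ((x >> 8) & 255) | ((x & 255) << 8)


def i32_bswap(x):
    return int.from_bytes((x & MASK32).to_bytes(4, "little"), "big")


def i32_abs_s(x):
    v = to_signed32(x)
    if v == INT32_MIN:
        raise ArithmeticError("trap: abs_s(INT32_MIN) is not representable")
    return abs(v)


def i32_clrs(x):
    """Count the bits after the sign bit that merely repeat it."""
    v = to_signed32(x)
    if v < 0:
        v = ~v  # a negative value has the same count as its complement
    return 31 - v.bit_length()


def i32_floor_div_s(a, b):
    a, b = to_signed32(a), to_signed32(b)
    if b == 0 or (a == INT32_MIN and b == -1):
        raise ArithmeticError("trap: undefined or unrepresentable quotient")
    return a // b  # Python's // already rounds toward negative infinity
```

For example, `i32_sext(0xFF, 24)` sign-extends the low byte and yields `0xFFFFFFFF`, and `i32_floor_div_s(-7, 2)` yields `-4` where a truncating division would yield `-3`.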

### Additional floating point operators

  * `f32.minnum`: minimum; if exactly one operand is NaN, returns the other operand
  * `f32.maxnum`: maximum; if exactly one operand is NaN, returns the other operand
  * `f32.fma`: fused multiply-add (results always conforming to IEEE 754-2019)
  * `f64.minnum`: minimum; if exactly one operand is NaN, returns the other operand
  * `f64.maxnum`: maximum; if exactly one operand is NaN, returns the other operand
  * `f64.fma`: fused multiply-add (results always conforming to IEEE 754-2019)

`minnum` and `maxnum` operators would treat `-0.0` as being effectively less
than `0.0`. Also, it's advisable to follow IEEE 754-2019, which has
removed IEEE 754-2008's `minNum` and `maxNum` (which return qNaN when either
operand is sNaN) and replaced them with `minimumNumber` and `maximumNumber`,
which prefer to return a number even when one operand is sNaN.
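
A minimal sketch of these semantics in Python, covering only the NaN and signed-zero rules stated above. Python does not distinguish quiet from signaling NaNs, so the sNaN handling is not modeled.

```python
import math


def minnum(a, b):
    """Minimum; if exactly one operand is NaN, return the other operand."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == 0.0 and b == 0.0:
        # Treat -0.0 as less than +0.0.
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b


def maxnum(a, b):
    """Maximum; if exactly one operand is NaN, return the other operand."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    if a == 0.0 and b == 0.0:
        # Treat +0.0 as greater than -0.0.
        return b if math.copysign(1.0, a) < 0 else a
    return a if a > b else b
```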

Note that some operators, like `fma`, may not be available or may not perform
well on all platforms. These should be guarded by
[feature tests](FeatureTest.md) so that if available, they behave consistently.

### Floating point approximation operators

  * `f32.reciprocal_approximation`: reciprocal approximation
  * `f64.reciprocal_approximation`: reciprocal approximation
  * `f32.reciprocal_sqrt_approximation`: reciprocal sqrt approximation
  * `f64.reciprocal_sqrt_approximation`: reciprocal sqrt approximation

These operators would not be required to be fully precise, but the specifics
would need clarification.

### 16-bit and 128-bit floating point support

For 16-bit floating point support, it may make sense to split the feature
into two parts: support for just converting between 16-bit and 32-bit or
64-bit formats possibly folded into load and store operators, and full
support for actual 16-bit arithmetic.
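
The conversion-only part can be illustrated with Python's `struct` module, whose `e` format code is IEEE 754 binary16. The helper names below are hypothetical. Note that Python floats are 64-bit, so this sketch converts between binary16 and binary64.

```python
import struct


def to_f16_bits(value):
    """Round a float to binary16 and return its 16-bit pattern."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def from_f16_bits(bits):
    """Widen a binary16 bit pattern to a Python float (exact)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]
```

Widening is always exact, while narrowing rounds: `from_f16_bits(to_f16_bits(0.1))` is close to, but not equal to, `0.1`.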

128-bit is an interesting question because hardware support for it is very
rare, so it's usually going to be implemented with software emulation anyway.
There's nothing preventing WebAssembly applications from linking to an
appropriate emulation library and getting similarly performant results.
Emulation libraries would have more flexibility to offer approximation
techniques such as double-double arithmetic. If we standardize 128-bit
floating point in WebAssembly, it will probably be standard IEEE 754-2019
quadruple precision.
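
For readers unfamiliar with double-double arithmetic, the sketch below shows its core building block, the error-free `two_sum` transformation, and a simple addition built on it. A double-double value is represented here as a `(hi, lo)` pair of floats. This is one common formulation chosen for illustration, not a proposal for how an emulation library should be written.

```python
def two_sum(a, b):
    """Return (s, err) with s = fl(a + b) and a + b == s + err exactly."""
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err


def dd_add(x, y):
    """Add two double-double values, each given as a (hi, lo) pair."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return two_sum(s, e)  # renormalize so that hi carries the leading bits
```

For example, `dd_add((1.0, 0.0), (2.0**-60, 0.0))` returns `(1.0, 2.0**-60)`, preserving a term that a plain 64-bit addition would round away.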

### Full IEEE 754-2019 conformance

WebAssembly floating point conforms to IEEE 754-2019 in most respects, but there
are a few areas that are
[not yet covered](Semantics.md#floating-point-operators).

To support exceptions and alternate rounding modes, one option is to define an
alternate form for each of `add`, `sub`, `mul`, `div`, `sqrt`, and `fma`. These
alternate forms would have extra operands for rounding mode, masked traps, and
old flags, and an extra result for a new flags value. These operators would be
fairly verbose, but it's expected that their use cases will be specialized. This
approach has the advantage of exposing no global (even if only per-thread)
control and status registers to applications, and of avoiding giving the common
operators the possibility of having side effects.
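
The shape of such an alternate form can be sketched in Python as a function that takes the rounding mode as an operand and returns the flags as an extra result, with no global state involved. Everything here is an assumption made for illustration: the name `add_rounded`, the mode strings, and the flag names. The sketch models only the inexact and overflow flags. It omits masked traps and old flags, and it ignores signed-zero and directed-overflow subtleties. It requires Python 3.9 or later for `math.nextafter`.

```python
import math
from fractions import Fraction


def add_rounded(a, b, mode="nearest"):
    """Hypothetical `add` variant returning (result, flags)."""
    r = a + b  # Python floats round to nearest, ties to even
    flags = set()
    if not all(map(math.isfinite, (a, b, r))):
        if math.isfinite(a) and math.isfinite(b):
            flags.update(("overflow", "inexact"))
        return r, flags  # directed rounding on overflow is not modeled
    exact = Fraction(a) + Fraction(b)  # exact rational sum of the inputs
    if Fraction(r) != exact:
        flags.add("inexact")
        # Step to the neighboring float when nearest rounding went the wrong way.
        if mode == "down" and Fraction(r) > exact:
            r = math.nextafter(r, -math.inf)
        elif mode == "up" and Fraction(r) < exact:
            r = math.nextafter(r, math.inf)
        elif mode == "toward_zero" and abs(Fraction(r)) > abs(exact):
            r = math.nextafter(r, 0.0)
    return r, flags
```

Because the mode and flags travel as ordinary operands and results, two calls with different modes cannot interfere with each other, which is the property the paragraph above is after.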

Debugging techniques are also important, but they don't necessarily need to be
in the spec itself. Implementations are welcome (and encouraged) to support
non-standard execution modes, enabled only from developer tools, such as modes
with alternate rounding, or evaluation of floating point operators at greater
precision, to support [techniques for detecting numerical instability](https://www.cs.berkeley.edu/~wkahan/Mindless.pdf),
or modes using alternate NaN bitpattern rules, to carry diagnostic information
and help developers track down the sources of NaNs.

To help developers find the sources of floating point exceptions,
implementations may wish to provide a mode where NaN values are produced with
payloads containing identifiers helping programmers locate where the NaNs first
appeared. Another option would be to offer another non-standard execution mode,
enabled only from developer tools, that would enable traps on selected floating
point exceptions, however care should be taken, since not all floating point
exceptions indicate bugs.

### Flushing Subnormal Values to Zero

Many popular CPUs have significant stalls when processing subnormal values,
and support modes where subnormal values are flushed to zero, which avoids
these stalls. In addition, ARMv7 NEON has no support for subnormal values and
always flushes them. A mode where floating point computations have subnormals
flushed to zero in WebAssembly would address these two issues.
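
The effect of such a mode on a single value can be sketched as follows. This is an illustration for 64-bit floats only. `flush_to_zero` is a hypothetical helper, and preserving the sign of the flushed zero is an assumption.

```python
import math
import sys


def flush_to_zero(x):
    """Replace a subnormal value with a zero of the same sign."""
    # sys.float_info.min is the smallest positive *normal* double.
    if x != 0.0 and abs(x) < sys.float_info.min:
        return math.copysign(0.0, x)
    return x
```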

### Integer Overflow Detection

There are two different use cases here, one where the application wishes to
handle overflow locally, and one where it doesn't.

When the application is prepared to handle overflow locally, it would be useful
to have arithmetic operators which can indicate when overflow occurred. An
example of this is the checked arithmetic builtins available in compilers such
as
[clang](https://clang.llvm.org/docs/LanguageExtensions.html#checked-arithmetic-builtins)
and
[GCC](https://gcc.gnu.org/onlinedocs/gcc/Integer-Overflow-Builtins.html).
If WebAssembly is made to support nodes with multiple return values, that could
be used instead of passing a pointer.
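
With multiple return values, a checked add could hand back the wrapped result and the overflow indication together. A minimal sketch, assuming a hypothetical operator name and operands already in signed 32-bit range:

```python
def i32_add_overflow_s(a, b):
    """Return (wrapped_sum, overflowed) for signed 32-bit addition."""
    exact = a + b
    # Wrap into [-2**31, 2**31 - 1] as two's-complement hardware would.
    wrapped = (exact + 2**31) % 2**32 - 2**31
    return wrapped, wrapped != exact
```

A caller handling overflow locally would then branch on the second value, for example falling back to a bignum path when it is true.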

There are also several use cases where an application does not wish to handle
overflow locally. One family of examples includes implementing optimized bignum
arithmetic, or optimizing JavaScript Numbers to use int32 operators. Another family
includes compiling code that doesn't expect overflow to occur, but which wishes
to have overflow detected and reported if it does happen. These use cases would
ideally have overflow trap, and would then be able to
[handle the trap specially][future trapping]. Following the rule that explicitly signed and
unsigned operators trap whenever the result value cannot be represented in the
result type, it would be possible to add explicitly signed and unsigned versions
of integer `add`, `sub`, and `mul`, which would trap on overflow. The main
reason we haven't added these already is that they're not efficient for
general-purpose use on several of today's popular hardware architectures.
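
The rule "trap whenever the result value cannot be represented in the result type" can be sketched as below. The operator names and the `Trap` exception are hypothetical, and operands are assumed to already be valid values of the respective type.

```python
class Trap(Exception):
    """Stands in for a WebAssembly trap in this sketch."""


def _checked(value, lo, hi):
    # Trap when the exact result does not fit the result type.
    if not lo <= value <= hi:
        raise Trap("integer overflow")
    return value


def i32_add_s(a, b):
    return _checked(a + b, -2**31, 2**31 - 1)


def i32_add_u(a, b):
    return _checked(a + b, 0, 2**32 - 1)


def i32_sub_u(a, b):
    return _checked(a - b, 0, 2**32 - 1)


def i32_mul_s(a, b):
    return _checked(a * b, -2**31, 2**31 - 1)
```

Note that the same bit patterns can trap under one interpretation and not the other: `2**31 - 1` plus `1` traps as a signed add but is fine as an unsigned add.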

### Better feature testing support

The [MVP feature testing situation](FeatureTest.md) could be improved by
allowing unknown/unsupported instructions to decode and validate. The runtime
semantics of these unknown instructions could either be to trap or call a
same-signature module-defined polyfill function. This feature could provide a
lighter-weight alternative to load-time polyfilling (approach 2 in
[FeatureTest.md](FeatureTest.md)), especially if the [specific layer](BinaryEncoding.md)
were to be standardized and performed natively such that no user-space translation 
pass was otherwise necessary.

### Array globals

If globals were allowed to have array types, significant portions of memory could be moved out of linear memory, which could reduce fragmentation. Languages like Fortran, which limit aliasing, would be one use case. C/C++ compilers could also determine that some global variables never have their address taken.

### Multiple Tables and Memories

The MVP limits modules to at most one memory and at most one table (the default
ones) and there are only operators for accessing the default table and memory.

After the MVP and after [GC reference types][future garbage collection] have been added, the default
limitation can be relaxed so that any number of tables and memories could be
imported or internally defined and memories/tables could be passed around as
parameters, return values and locals. New variants of `load`, `store`
and `call_indirect` would then be added which took an additional memory/table
reference operand.

To access an imported or internally-defined non-default table or memory, a
new `address_of` operator could be added which, given an index immediate,
would return a first-class reference. Beyond tables and memories, this could
also be used for function definitions to get a reference to a function (which,
since opaque, could be implemented as a raw function pointer).

### More Table Operators and Types

In the MVP, WebAssembly has limited functionality for operating on 
[tables](Semantics.md#table) and the host-environment can do much more (e.g.,
see [JavaScript's `WebAssembly.Table` API](JS.md#webassemblytable-objects)).
It would be useful to be able to do everything from within WebAssembly so, e.g.,
it was possible to write a WebAssembly dynamic loader in WebAssembly. As a
prerequisite, WebAssembly would need first-class support for 
[GC references][future garbage collection] on the stack and in locals. Given that, the following
could be added:

* `get_table`/`set_table`: get or set the table element at a given dynamic
  index; the got/set value would have a GC reference type
* `grow_table`: grow the current table (up to the optional maximum), similar to
  `grow_memory`
* `current_table_length`: like `current_memory`.
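
A toy model of these three operators is sketched below. The `Table` class is hypothetical, `None` stands in for a null reference, and returning the previous length on success and `-1` on failure is an assumption modeled on `grow_memory`.

```python
class Table:
    """Toy table of opaque references with an optional maximum length."""

    def __init__(self, initial, maximum=None):
        self.elems = [None] * initial
        self.maximum = maximum

    def _check(self, index):
        if not 0 <= index < len(self.elems):
            raise IndexError("trap: table index out of bounds")

    def get_table(self, index):
        self._check(index)
        return self.elems[index]

    def set_table(self, index, ref):
        self._check(index)
        self.elems[index] = ref

    def grow_table(self, delta):
        old = len(self.elems)
        too_big = self.maximum is not None and old + delta > self.maximum
        if delta < 0 or too_big:
            return -1  # assumption: failure is signalled like grow_memory
        self.elems.extend([None] * delta)
        return old

    def current_table_length(self):
        return len(self.elems)
```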

Additionally, in the MVP, the only allowed element type of tables is a generic
"anyfunc" type which simply means the element can be called but there is no
static signature validation check. This could be improved by allowing:

* functions with a particular signature, allowing wasm generators to use
  multiple homogeneously-typed function tables (instead of a single
  heterogeneous function table) which eliminates the implied dynamic signature
  check of a call to a heterogeneous table;
* any other specific GC reference type, effectively allowing WebAssembly code
  to implement a variety of rooting API schemes.

## Proposals moved to tracking issues

These proposals have now moved to [tracking issues](#tracking-issues). The old
links are preserved here for backwards compatibility.

### GC/DOM Integration

See issue [1079][].

### Memset and Memcpy Operators

See issue [1114][].

### Tail Calls

See issue [1144][].

#### Signature-restricted Proper Tail Calls

See the [asm.js RFC][] for a full description of signature-restricted Proper
Tail Calls (PTC).

Useful properties of signature-restricted PTCs:

* In most cases, can be compiled to a single jump.
* Can express indirect `goto` via function-pointer calls.
* Can be used as a compile target for languages with unrestricted PTCs; the code
  generator can use a stack in the linear memory to effectively implement a custom call
  ABI on top of signature-restricted PTCs.
* An engine that wishes to perform aggressive optimization can fuse a graph of
  PTCs into a single function.
* To reduce compile time, a code generator can use PTCs to break up ultra-large
  functions into smaller functions at low overhead.
* A compiler can exert some amount of control over register allocation via the
  ordering of arguments in the PTC signature.

  [asm.js RFC]: http://discourse.specifiction.org/t/request-for-comments-add-a-restricted-subset-of-proper-tail-calls-to-asm-js
 
#### General-purpose Proper Tail Calls

General-purpose Proper Tail Calls would have no signature restrictions, and
therefore be more broadly usable than
[Signature-restricted Proper Tail Calls](Semantics.md#signature-restricted-proper-tail-calls),
though there would be some different performance characteristics.


[future trapping]: FutureFeatures.md#trapping-or-non-trapping-strategies
[future garbage collection]: https://github.com/WebAssembly/proposals/issues/16