From 0427c697dcd092cd16a9339290e51075aa68dd3e Mon Sep 17 00:00:00 2001
From: rossberg-chromium
Date: Thu, 9 Mar 2017 07:32:31 +0100
Subject: Pointer to tentative official text format (#1012)

---
 TextFormat.md | 80 ++++++++++++++++++++++++++++++-----------------------------
 1 file changed, 41 insertions(+), 39 deletions(-)

(limited to 'TextFormat.md')

diff --git a/TextFormat.md b/TextFormat.md
index 5bb87c0..ff94d40 100644
--- a/TextFormat.md
+++ b/TextFormat.md
@@ -1,13 +1,46 @@
 # Text Format
 
-WebAssembly does not yet have a standardized text format that encodes function
-bodies in addition to module structure, data segments and other program
-metadata in a way that is eqivalent to the [binary format](BinaryEncoding.md).
-WebAssembly does, however, have a specified
-[textual representation](#linear-bytecode) of function bodies which should be displayed in browsers and other tools when [debugging](#debug-symbol-integration)
+WebAssembly will define a standardized text format
+that encodes a WebAssembly module with all its contained definitions
+in a way that is equivalent to the [binary format](BinaryEncoding.md).
+This format will use [S-expressions][] (avoiding syntax bikeshed discussions) to express modules and definitions while allowing a [linear representation](#linear-instructions) for the code in function bodies.
+This format is understood by tools and used in browsers when [debugging](#debug-symbol-integration)
 modules.
 
-# Linear bytecode
+ [S-expressions]: https://en.wikipedia.org/wiki/S-expression
+
+The format will be close to [this grammar][],
+which provides a raw syntax in direct correspondence with the binary format
+as well as some syntactic sugar on top.
+Examples can be found in the WebAssembly [test suite][].
+
+ [this grammar]: https://github.com/WebAssembly/spec/tree/master/interpreter/#s-expression-syntax
+ [test suite]: https://github.com/WebAssembly/spec/tree/master/test/core/
+
+The following tools currently understand this format:
+
+* [specification interpreter][] consumes an s-expression syntax.
+* [wabt][] consumes compatible s-expressions.
+* [binaryen][] can consume compatible s-expressions.
+* [LLVM backend][] (the `CHECK:` parts of these tests) emits compatible s-expressions.
+* [WAVM backend][] consumes compatible s-expressions.
+
+ [specification interpreter]: https://github.com/WebAssembly/spec/tree/master/interpreter/
+ [wabt]: https://github.com/WebAssembly/wabt
+ [binaryen]: https://github.com/WebAssembly/binaryen
+ [LLVM backend]: https://github.com/llvm-mirror/llvm/tree/master/test/CodeGen/WebAssembly
+ [WAVM backend]: https://github.com/AndrewScheidecker/WAVM/tree/master/Test
+
+The recommended file extension for WebAssembly code in textual format is `.wat`.
+
+**Note:** The `.wast` format understood by some of the listed tools is a superset of the `.wat` format that is intended for writing test scripts.
+Besides the definition of modules such scripts can contain assertions and other commands as defined by a [grammar extension].
+These extensions are *not* part of the official text format, which may only contain a single module.
+
+ [grammar extension]: https://github.com/WebAssembly/spec/tree/master/interpreter/#scripts
+
+
+# Linear instructions
 
 WebAssembly function bodies encode bytecode instructions which have specified canonical opcode names. A linear presentation of these sequences of instructions allows a direct human-readable order-preserving presentation of the binary format. This format is suitable for opcode by opcode inspection of a WebAssembly program and can readily be related to the [semantics](Semantics.md) of the format.
@@ -59,34 +92,6 @@ end
-# Tool conventions
-
-Most WebAssembly tools currently use [s-expressions][] to represent modules in a
-textual format. Although the s-expression format is not an official text format,
-it is a tooling convention because it allows for the representation of function
-signatures, declarations, and other metadata and it doesn't have much of a
-syntax to speak of (avoiding syntax bikeshed discussions).
-
- [s-expressions]: https://en.wikipedia.org/wiki/S-expression
-
-Here are some of these prototypes. Keep in mind that these *aren't* official,
-and the final standard format may look entirely different:
-
-* [Prototype specification][] consumes an s-expression syntax.
-* [WAVM backend][] consumes compatible s-expressions.
-* [wabt][] consumes compatible s-expressions.
-* [LLVM backend][] (the `CHECK:` parts of these tests) emits compatible s-expressions.
-* [ilwasm][] emits compatible s-expressions.
-* [binaryen][] can consume compatible s-expressions.
-
- [prototype specification]: https://github.com/WebAssembly/spec/tree/master/interpreter/test
- [LLVM backend]: https://github.com/llvm-mirror/llvm/tree/master/test/CodeGen/WebAssembly
- [WAVM backend]: https://github.com/AndrewScheidecker/WAVM/tree/master/Test
- [V8 prototype]: https://github.com/WebAssembly/v8-native-prototype
- [ilwasm]: https://github.com/WebAssembly/ilwasm
- [wabt]: https://github.com/WebAssembly/wabt
- [binaryen]: https://github.com/WebAssembly/binaryen
-
 ## Debug symbol integration
 
 The binary format inherently strips names from functions, locals, globals, etc,
-# Future design -## :unicorn: +# Design considerations -An official text format for WebAssembly needs to +The text format for WebAssembly needs to be able to represent any well-structured module unambiguously. In addition to function bodies and their instruction sequences, it also needs a way of encoding declarations, function @@ -120,8 +124,6 @@ cases: * Presentation in browser development tools when source maps aren't present (which is necessarily the case with [the Minimum Viable Product (MVP)](MVP.md)). * Writing WebAssembly code directly for reasons including pedagogical, experimental, debugging, optimization, and testing of the spec itself. -## Additional design considerations - There is no requirement to use JavaScript syntax; this format is not intended to be evaluated or translated directly into JavaScript. There may also be substantive reasons to use notation that is different than JavaScript (for example, WebAssembly has a 64-bit integer type, and it should be represented in the text format, since that is the natural thing to do for WebAssembly, regardless of JavaScript not having such a type). On the other hand, when there are no substantive reasons and the options are basically bikeshedding, then it does make sense for the text format to match existing conventions on the Web (for example, curly braces, as in JavaScript and CSS). The text format may not be uniquely representable. Multiple textual files will likely assemble to the same binary file. For example, whitespace shouldn't be significant and memory initialization can be broken out into smaller pieces in a text format. -- cgit v1.2.3