From 9329862be20ddb6da38e28519871910c2680af1d Mon Sep 17 00:00:00 2001 From: JF Bastien Date: Mon, 8 Jun 2015 17:38:49 +0200 Subject: Binary format. --- BinaryEncoding.md | 82 +++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 53 insertions(+), 29 deletions(-) (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index e605ea5..ad0c80c 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -1,37 +1,61 @@ # Binary Encoding -This document describes the binary encoding of the AST nodes defined in the [AST semantics](AstSemantics.md). -For the context of this document, see the [Binary format](MVP.md#binary-format) section of the [MVP doc](MVP.md). - -The binary encoding is designed to allow fast startup, which includes reducing download size -and allow for quick decoding. Considering the matter of reducing download size, we can see -it as having three layers: - - * The **raw** binary encoding itself, natively decoded by the browser, and to be standardized in - the MVP of the spec. - * **Specific** compression to the binary encoding, that is unreasonable to expect a generic - compression algorithm like gzip to achieve. - * This is not meant to be standardized, at least not initially, as it can be done with a - downloaded decompressor that runs as web content on the client, and in particular - can be implemented in a polyfill. Not standardizing it leaves the binary - encoding as the only thing a WebAssembly implementation is required to implement, - which is nice. However, if the benefits are shown to be substantial, this will be - reconsidered in the future. - * We can do better than generic compression because we are aware of the AST structure - and other details: - * For example, macro compression that [deduplicates AST trees](https://github.com/WebAssembly/spec/issues/58#issuecomment-101863032) - can focus on AST nodes + their children, thus having `O(nodes)` entities to worry about, compared - to generic compression which in principle would need to look at `O(bytes*bytes)` entities. - Such macros would allow the logical equivalent of `#define ADD1(x) (x+1)`, i.e., - to be parametrized. Simpler macros (`#define ADDX1 (x+1)`) can implement useful +This document describes the binary encoding of the +[Abstract Syntax Tree](AstSemantics.md) nodes. + +The binary encoding is designed to allow fast startup, which includes reducing +download size and allow for quick decoding. + +Reducing download size, is achieved through three layers: + + * The **raw** binary encoding itself, natively decoded by the browser, and to + be standardized in the [MVP](MVP.md). + * **Specific** compression to the binary encoding, that is unreasonable to + expect a generic compression algorithm like gzip to achieve. + * This is not meant to be standardized, at least not initially, as it can be + done with a downloaded decompressor that runs as web content on the client, + and in particular can be implemented in a [polyfill](Polyfill.md). + * We can do better than generic compression because we are aware of the AST + structure and other details: + * For example, macro compression that + [deduplicates AST trees](https://github.com/WebAssembly/spec/issues/58#issuecomment-101863032) + can focus on AST nodes + their children, thus having `O(nodes)` entities + to worry about, compared to generic compression which in principle would + need to look at `O(bytes*bytes)` entities. Such macros would allow the + logical equivalent of `#define ADD1(x) (x+1)`, i.e., to be + parametrized. Simpler macros (`#define ADDX1 (x+1)`) can implement useful features like constant pools. - * Another example is reordering of functions and some internal nodes, which we know - does not change semantics, but + * Another example is reordering of functions and some internal nodes, which + we know does not change semantics, but [can improve general compression](http://www.rfk.id.au/blog/entry/cromulate-improve-compressibility/). - * **Generic** compression, such as gzip, already supported in browsers, or LZMA and other - compression algorithms, which might be standardized as well. + * **Generic** compression, such as gzip, already supported in browsers, or LZMA + and other compression algorithms, which might be standardized as well. + +Each of the three layers work to find compression opportunities to the best of +its abilities, without encroaching upon the subsequent layer's compression +opportunities. + +## Why a binary encoding instead of a text-only representation -Each of the three layers work to find compression opportunities to the best of its abilities, without encroaching upon the subsequent layer's compression opportunities. +Given that text is so compressible and it is well known that it is hard to beat +gzipped source, is there any win from having a binary format over a text format? +Yes: +* Large reductions in payload size can still significantly decrease the + compressed file size. + * Experimental results from a + [polyfill prototype](https://github.com/WebAssembly/polyfill) show the + gzipped binary format to be about 20-30% smaller than the corresponding + gzipped asm.js. +* A binary format that represents the names of variables and functions with raw + indices instead of strings is much faster to decode: array indexing + vs. dictionary lookup. + * Experimental results from a + [polyfill prototype](https://github.com/WebAssembly/polyfill) show that + decoding the binary format is about 23× faster than parsing the + corresponding asm.js source (using + [this demo](https://github.com/lukewagner/AngryBotsPacked), comparing + *just* parsing in SpiderMonkey (no validation, IR generation) to *just* + decoding in the polyfill (no asm.js code generation). ## Building blocks -- cgit v1.2.3 From 8ab1c1eaaf25e0688d893a70708aba31188f76d1 Mon Sep 17 00:00:00 2001 From: JF Bastien Date: Mon, 8 Jun 2015 17:46:01 +0200 Subject: Split out text format into its own document. --- BinaryEncoding.md | 2 +- MVP.md | 21 +++------------------ TextFormat.md | 22 ++++++++++++++++++++++ 3 files changed, 26 insertions(+), 19 deletions(-) create mode 100644 TextFormat.md (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index ad0c80c..bb9d339 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -125,7 +125,7 @@ conflict-avoidance practices surrounding string names: that modules could optionally use as a starting point to further refine. * For example, to use all of [the MVP](MVP.md) plus [SIMD](EssentialPostMVPFeatures.md#fixed-width-simd) the declaration could be "base" followed by the list of SIMD opcodes used. - * This feature would also be most useful for people handwriting the [text format](MVP.md#text-format). + * This feature would also be most useful for people handwriting the [text format](TextFormat.md). * However, such a version declaration does not establish a global "version" for the module or affect anything outside of the initialization of the index spaces; decoders would remain versionless and simply add cases for new *names* (as with current JS parsers). diff --git a/MVP.md b/MVP.md index ec17703..e9971a0 100644 --- a/MVP.md +++ b/MVP.md @@ -74,24 +74,9 @@ separate docs with more precise descriptions of: both straightforward and causes no loss of information in either direction. ## Text format -* The purpose of this format is to support: - * View Source on a WebAssembly module, thus fitting into the Web (where every source can - be viewed) in a natural way. - * Presentation in browser devtools when source maps aren't present (which is necessarily the case with the MVP). - * Writing WebAssembly code directly for reasons including pedagogical, experimental, debugging, or - optimization. -* The text format is equivalent and isomorphic to the - [binary format](MVP.md#binary-format), see notes there. -* The text format will be standardized, but only for tooling purposes: - * Compilers will support this format for `.S` and inline assembly. - * Debuggers and profilers will present binary code using this textual format. - * Browsers will not parse the textual format on regular web content in order - to implement WebAssembly semantics. -* Given that the code representation is actually an AST, the syntax would contain nested - statements and expressions (instead of the linear list of instructions most assembly languages have). -* There is no requirement to use JS syntax; this format is not intended to be evaluated or translated - directly into JS. -* TODO: there is no real proposal yet + +The [text format](TextFormat.md) provides readability to developers, and is +isomorphic to the [binary format](BinaryEncoding.md). ## Heap * In the MVP, when a WebAssembly module is loaded, it creates a new heap. diff --git a/TextFormat.md b/TextFormat.md new file mode 100644 index 0000000..1ce2bb4 --- /dev/null +++ b/TextFormat.md @@ -0,0 +1,22 @@ +# Text Format + +The purpose of this text format is to support: +* View Source on a WebAssembly module, thus fitting into the Web (where every + source can be viewed) in a natural way. +* Presentation in browser development tools when source maps aren't present + (which is necessarily the case with the MVP). +* Writing WebAssembly code directly for reasons including pedagogical, + experimental, debugging, optimization, and testing of the spec itself. +* The text format is equivalent and isomorphic to the + [binary format](BinaryEncoding.md). +* The text format will be standardized, but only for tooling purposes: + * Compilers will support this format for `.S` and inline assembly. + * Debuggers and profilers will present binary code using this textual format. + * Browsers will not parse the textual format on regular web content in order + to implement WebAssembly semantics. +* Given that the code representation is actually an + [Abstract Syntax Tree](ASTSemantics.md), the syntax would contain nested + statements and expressions (instead of the linear list of instructions most + assembly languages have). +* There is no requirement to use JavaScript syntax; this format is not intended + to be evaluated or translated directly into JavaScript. -- cgit v1.2.3 From 01676f85cbb672aa1199e35e2a83a480aacff20e Mon Sep 17 00:00:00 2001 From: JF Bastien Date: Tue, 9 Jun 2015 11:27:11 +0200 Subject: Define portability Address issue #38. --- BinaryEncoding.md | 6 ++++-- HighLevelGoals.md | 7 ++++--- IncompletelySpecifiedBehavior.md | 5 ++++- Portability.md | 32 ++++++++++++++++++++++++++++++++ 4 files changed, 44 insertions(+), 6 deletions(-) create mode 100644 Portability.md (limited to 'BinaryEncoding.md') diff --git a/BinaryEncoding.md b/BinaryEncoding.md index e605ea5..589c1d3 100644 --- a/BinaryEncoding.md +++ b/BinaryEncoding.md @@ -1,7 +1,9 @@ # Binary Encoding -This document describes the binary encoding of the AST nodes defined in the [AST semantics](AstSemantics.md). -For the context of this document, see the [Binary format](MVP.md#binary-format) section of the [MVP doc](MVP.md). +This document describes the [portable](Portability.md) binary encoding of the +AST nodes defined in the [AST semantics](AstSemantics.md). For the context of +this document, see the [Binary format](MVP.md#binary-format) section of the +[MVP doc](MVP.md). The binary encoding is designed to allow fast startup, which includes reducing download size and allow for quick decoding. Considering the matter of reducing download size, we can see diff --git a/HighLevelGoals.md b/HighLevelGoals.md index 66a2bca..9afda01 100644 --- a/HighLevelGoals.md +++ b/HighLevelGoals.md @@ -1,8 +1,9 @@ # WebAssembly High-Level Goals -1. Define a portable, size- and load-time-efficient [binary format](MVP.md#binary-format) - to serve as a compilation target which can be compiled to execute at native - speed by taking advantage of common hardware capabilities. +1. Define a [portable](Portability.md), size- and load-time-efficient + [binary format](MVP.md#binary-format) to serve as a compilation target which + can be compiled to execute at native speed by taking advantage of common + hardware capabilities. 2. Specify and implement incrementally: * design [a Minimum Viable Product (MVP)](MVP.md) for the standard with roughly the same functionality as [asm.js](http://asmjs.org); diff --git a/IncompletelySpecifiedBehavior.md b/IncompletelySpecifiedBehavior.md index b3679d0..d525051 100644 --- a/IncompletelySpecifiedBehavior.md +++ b/IncompletelySpecifiedBehavior.md @@ -1,6 +1,9 @@ # Incompletely Specified Behavior -WebAssembly is a sandboxed platform. Applications can't access data outside the sandbox without going through appropriate APIs, or otherwise escape the sandbox, even if the behavior inside the sandbox should ever be unspecified in any way. +WebAssembly is a [portable](Portability.md) sandboxed platform. Applications +can't access data outside the sandbox without going through appropriate APIs, or +otherwise escape the sandbox, even if the behavior inside the sandbox should +ever be unspecified in any way. WebAssembly always maintains valid callstacks. Return addresses are stored on the trusted stack and can't be clobbered by the application. And, WebAssembly ensures that calls and branches always have valid destinations. diff --git a/Portability.md b/Portability.md new file mode 100644 index 0000000..37d7e41 --- /dev/null +++ b/Portability.md @@ -0,0 +1,32 @@ +# Portability + +WebAssembly's [binary format](BinaryEncoding.md) is designed to be executable +efficiently on a variety of operating systems and instruction set architectures, +[on the web](Web.md) and [off the web](NonWeb.md). The standardization process +will identify *tier 1* platforms on which WebAssembly is expected to be +implementable efficiently in a fully conformant manner, taking into account +[allowed implementation variants](IncompletelySpecifiedBehavior.md). + +WebAssembly portability assumes: + +* 8-bit bytes. +* Little-endian byte ordering. +* ILP32 data model (32-bit address space). 64-bit heaps may be added later, and + application would need to explicitly opt-in. +* Hardware and operating system with process isolation, memory protection, and + no-execute support for code pages. +* An execution environment which offers forward progress guarantees to all + threads of execution (even when executing in a non-parallel manner). +* When running in a browser, modern browser features are expected to be + available. + +In all of the above cases, an implementation may opt to emulate behavior that +the host hardware or operating system doesn't offer. + +Developer-exposed APIs (such as POSIX) are expected to be portable at a +source-code level through WebAssembly libraries which will use +[feature detection](FeatureTest.md). These libraries aren't necessarily +standardized: WebAssembly will follow the +[extensible web manifesto](https://extensiblewebmanifesto.org)'s lead and expose +low-level capabilities that expose the possibilities of the underlying platform +as closely as possible. -- cgit v1.2.3