Understanding WASM
Part 1: Virtualization
So, this all started when I tried to answer a colleague's question: "what makes WebAssembly (WASM) different from the Java Virtual Machine (JVM)?" They were particularly interested in teasing apart how WASM's goals overlap with and differ from the historical goals of the JVM ("write once, run anywhere"). Further, they were curious about the differences in approach that lead folks to believe WASM might succeed at achieving these goals. You might say:
- WASM is a Web platform technology, developed and supported in the open by multiple vendors. The JVM was primarily driven by a single vendor.
This happens to be perfectly correct! But still. There was something there that itched at the back of my mind: I knew these things were different, but I wasn't able to articulate my mental framework to explain those differences. There's a lot going on in that pat answer that bears unpacking:
- Where did WASM come from -- what problems was it created to solve?
- How did WASM come to be developed and supported by multiple vendors, while the JVM remained single-vendor?
- Are WASM and the JVM really the same type of thing?
- What does "virtual machine" mean?
In particular, when I went to talk about WASM -- which bills itself as a "virtual instruction set architecture" 1 in its specification -- I found that I had a shallow understanding of the word "virtual". I knew the term by its effects and examples, but not deeply. So. Having found myself with some time on my hands recently, I've split this post in three (at least, to start with.) Today, we're going to talk about virtualization, instruction set architectures, and machines. This post isn't meant to get into all of the details, but it should give you a good mental framework with which to place these concepts, and serve as a nice jumping-off point for further research.
Let's start by forming some rough categories using examples.
There's the "virtual machines" that Amazon or Google are happy to sell us: virtualized computers running on their hardware in the cloud. Alternatively: VMWare, VirtualBox, and QEMU let me run virtualized computers on my hardware. So, that's one definition of "virtual machine": "software simulating hardware." (You might even lump "video game emulators" into this category!)
On the other hand, we have "virtual machines" in the "JVM" sense: things like the .NET Common Language Runtime ("CLR"), Erlang's BEAM, JavaScript's SpiderMonkey or V8 engines, or even Python's interpreter. These are all programs running on an operating system for the purposes of taking an input language and executing it. They do this by translating the input language into the input language of the current computer. The Python virtual machine consumes a Python operation code (opcode) and reacts to it by executing some number of native machine operations. Let's call this "software evaluating a language" for now.
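If you want to see that in action, here's a tiny sketch using Python's standard `dis` module (the function itself is just something I made up for illustration): it prints the opcode stream that CPython's virtual machine will evaluate. The exact opcode names vary from one Python version to the next, so treat the output as illustrative.

```python
import dis

def add_tax(price, rate):
    # A tiny function: CPython compiles this to bytecode, and its
    # virtual machine evaluates that bytecode one opcode at a time.
    return price + price * rate

# Print the opcode stream the CPython VM will consume. Each row is an
# instruction for the "software machine", not for your physical CPU.
dis.dis(add_tax)
```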
On the gripping hand: the Low Level Virtual Machine. I'm calling this one out separately because it's not really "software simulating hardware", nor is it "software evaluating a language": it's something else entirely. It's a waystation for languages on their way toward native machine code. C, C++, Rust, and other languages can be translated from their source to this intermediate representation, on which a number of optimizations can be made by the LLVM toolchain. LLVM is a "machine which abstracts over other machines", an "abstract machine."
There are three things that these three rough piles have in common:
- They all accept some sort of input language.
- They all refer to the term "virtual."
- They all refer to the term "machine."
Let's approach these concepts in reverse order.
You might already be familiar with the term "Turing machine" or the phrase "Turing complete", but in case you're not: a Turing machine is a mathematical model of computation for which some useful properties hold. It describes an abstract machine -- hey, there's that term again! -- that operates on a finite set of symbols printed on an infinite spool of tape. The classic Turing machine uses a "head" to read and write symbols in discrete cells on that tape, shifting the tape left or right based on its current state. Turing completeness describes a property of a language, expressed as a discrete syntax: a Turing complete language can express any task that can be accomplished by computers.
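To make this concrete, here's a minimal sketch of a Turing machine in Python. The state names, symbols, and the particular transition table (which just inverts a binary string) are my own inventions for illustration; the point is the shape of the thing: a tape, a head, and a table of rules.

```python
# A minimal Turing machine: a tape of symbols, a read/write head, and a
# transition table mapping (state, symbol) -> (new symbol, move, new state).
BLANK = "_"

# This particular table inverts a binary string: 0 -> 1, 1 -> 0, halt on blank.
TRANSITIONS = {
    ("scan", "0"): ("1", +1, "scan"),
    ("scan", "1"): ("0", +1, "scan"),
    ("scan", BLANK): (BLANK, 0, "halt"),
}

def run(tape_input):
    tape = dict(enumerate(tape_input))  # a sparse stand-in for infinite tape
    head, state = 0, "scan"
    while state != "halt":
        symbol = tape.get(head, BLANK)
        new_symbol, move, state = TRANSITIONS[(state, symbol)]
        tape[head] = new_symbol
        head += move
    return "".join(tape[i] for i in sorted(tape) if tape[i] != BLANK)

print(run("10110"))  # -> "01001"
```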
There's an interesting relationship between "input language" and "machine". A sufficiently detailed description of a language also describes a machine for processing that language, at least in the abstract. Much like how an IKEA instruction manual describes the ideal Billy bookshelf. Many Billy bookshelves can be assembled based on one instruction manual; all in various degrees of "conversation" with the physical constraints encountered during assembly.
So what does it mean to "virtualize" a machine?
A good way to distinguish virtual and transparent is this: Virtual means it’s not there but you can see it; transparent means that it is there but you can’t see it.
Peter J. Denning - IEEE Annals of the History of Computing
"Virtual" comes to computer science by way of the field of optics, where it describes the perceived image of an object reflected by a mirror. There are two frames of reference: an observer and the mirror. The observer sees a virtual world behind the plane of the mirror. The mirror only "sees" the world in front of it.
So, abstract machines are the platonic idea of a machine capable of processing a given language. Virtual machines are one of two ways to realize these machines: by writing them as programs to be executed by another machine. A virtual machine acts like a mirror for the input language: the language "sees" a machine designed just for it, but cannot see any of the implementation details. For all the input language knows, it might be running directly on hardware!
(Our vocabulary gets a little muddled in practice: the "AM" in Erlang's "BEAM" runtime stands for "abstract machine" 2, while the "VM" in both the JVM spec and in LLVM's name stand for "virtual machine." In other words, the boundary between "abstract" and "virtual" is a little blurry. Note also: specific aspects of a machine might be virtualized while others are left transparent. For example, most programs assume a single contiguous view of memory, which may exceed the actual memory available to the system. The operating system virtualizes this view for processes, working behind the scenes to maintain the illusion.)
The other way to realize an abstract machine is to reify it in hardware: to build a processor to carry out the physical task of computing. A processor doesn't have to be electronic; that just happens to be the most common and efficient computing substrate available to us today. You could, in theory, act as a processor with a pencil, eraser, and a notebook if you wanted to -- assuming you followed some strict set of rules that described what you do with those items. That is, a processor still requires an input language with rules. This input language takes the form of a contract between processor and program: an instruction set architecture ("ISA".) Examples of ISAs include `x86_64` and `arm64` -- it's frequently abbreviated as `arch`, or "architecture" 3.
The ISA acts as a "narrow waist" (as in an hourglass): it contains just enough specification that many approaches to implementation are valid on both sides of the contract. Software may expect that `ADD` or `JNE` exhibits the expected behavior across implementations, while hardware is free to implement that behavior as it sees fit. (AMD and Intel both implement the `x86_64` ISA!) This is all, of course, subject to physical constraints -- speed, power draw, and thermal envelopes all come into play when implementing an ISA 4. So, while it's possible to define an ISA that cannot be efficiently implemented as hardware, people generally don't do so on purpose.
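Here's a toy illustration of that contract (in Python, with a made-up three-instruction ISA that looks nothing like real x86 encodings): the "software" side relies only on the documented behavior of `ADD` and `JNE`, while the "hardware" side -- the `run` function -- is free to implement that behavior however it likes.

```python
# A made-up ISA: each instruction is a tuple. Registers live in a dict.
#   ("ADD", dst, src)        dst <- dst + src
#   ("JNE", reg, value, tgt) jump to instruction index tgt if reg != value
#   ("HALT",)                stop
def run(program, registers):
    pc = 0  # program counter
    while True:
        op = program[pc]
        if op[0] == "ADD":
            registers[op[1]] += registers[op[2]]
            pc += 1
        elif op[0] == "JNE":
            pc = op[3] if registers[op[1]] != op[2] else pc + 1
        elif op[0] == "HALT":
            return registers

# "Software": sums r1 into r0 five times, relying only on the contract above.
program = [
    ("ADD", "r0", "r1"),  # r0 += r1
    ("ADD", "r2", "r3"),  # r2 += 1 (loop counter)
    ("JNE", "r2", 5, 0),  # loop back to instruction 0 until r2 == 5
    ("HALT",),
]
print(run(program, {"r0": 0, "r1": 7, "r2": 0, "r3": 1}))  # r0 == 35
```

A different "vendor" could implement `run` completely differently -- with a dispatch table, a translator, or actual silicon -- and so long as `ADD` and `JNE` keep behaving as documented, the program above neither knows nor cares.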
Now, "Air Bud" 5 rules apply here: software running against an ISA doesn't know what implements that ISA. It's perfectly possible to virtualize an ISA such that the hardware is simulated by software 6. Software running on virtualized hardware doesn't know the hardware has been virtualized: it can't see the mirror. This is how we get those shiny cloud VMs, VirtualBox, VMWare, and QEMU "virtual machines"! Because we don't make any promises about the implementation, the virtualizer might take a number of different approaches. It might:
- interpret machine code as bytecode in software ("emulation"),
- pipe it directly through to its underlying ISA if those two happen to match ("hardware virtualization"),
- or recompile the ISA at runtime if it doesn't ("dynamic binary translation" or "just in time" / "JIT" compilation)
Emulation is the most resource-intensive approach 7, but is the most flexible, theoretically: you can emulate a `powerpc` architecture on top of your `x86_64` processor in software. This comes at a cost! Emulation introduces a relatively low performance ceiling compared to the hardware running the emulator.
Hardware virtualization is less flexible but more efficient: these are processor instructions that aid in the creation of hypervisor software. Hypervisors typically 8 run in a lower protection mode than operating systems; they can be thought of as an operating system for operating systems. Hypervisor software provides (near-)transparent processor access for virtualized operating systems, while virtualizing other system resources (like persistent disks, memory, and peripherals.) On x86 platforms, this is provided by AMD's AMD-V (AKA "Secure Virtual Machine" or "SVM") and Intel's VT-x (AKA "VMX"); ARM has an equivalent. AWS, GCP, VMWare, the Windows Subsystem for Linux 9, Apple's Hypervisor framework: all of these use hardware virtualization. As mentioned, this is generally less flexible than other ISA virtualization strategies, in that the ISA of the virtualized system must match the host ISA 10 -- you can't virtualize an ARM machine on top of a physical x86 machine.
That is, unless you're running a dynamic binary translator ("DBT") of some sort. The most widely-deployed DBT ISA VM that I'm aware of is Apple's Rosetta. Rosetta 1 11 eased the transition from the PowerPC architecture to Intel/x86 before its removal in OS X Lion, and Rosetta 2 is doing the same for the transition from Intel/x86 to Apple's own ARM-architecture hardware. Notable mentions include Intel's IA-32 Execution Layer, the Dolphin emulator's JIT recompilation of GameCube/Wii PowerPC code to x86/ARM 12, and, of course, the Transmeta Crusoe. There are caveats with this approach: some virtual instructions may have no hardware equivalent, and so must fall back to (much slower!) emulation in software. That makes this the most complex solution to program: it requires a working emulator alongside the translator for virtualized operations that have no native equivalent.
DBTs are probably the most expensive virtualization technology to develop -- they are JIT VMs, which we'll talk about again soon. Companies turn to them when both the language to be virtualized and the target platform are fixed points: IBM, for example, had to support the Power architecture, but because the addressable base of x86 machines was much larger, individual developers weren't likely to compile their software specially for IBM -- there was little incentive for them to do so.
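To gesture at what a DBT actually does, here's a small sketch (again in Python, again with a made-up guest instruction format): rather than re-interpreting each guest instruction on every pass, it translates each one into a host-native callable the first time it's encountered and caches the result. Real DBTs translate whole basic blocks and emit actual machine code, so this is only the shape of the idea.

```python
# Guest "machine code": one made-up instruction per line.
GUEST_PROGRAM = [
    "addi r0 5",    # r0 += 5
    "addi r1 1",    # r1 += 1
    "jlt  r1 3 0",  # if r1 < 3, jump back to instruction 0
    "halt",
]

def translate(text):
    """Translate one guest instruction into host-native code (here, a Python
    callable). A real DBT would emit machine code for a whole basic block."""
    op, *args = text.split()
    if op == "addi":
        reg, imm = args[0], int(args[1])
        def addi(regs, pc):
            regs[reg] += imm
            return pc + 1
        return addi
    if op == "jlt":
        reg, limit, target = args[0], int(args[1]), int(args[2])
        def jlt(regs, pc):
            return target if regs[reg] < limit else pc + 1
        return jlt
    if op == "halt":
        return lambda regs, pc: None

def run(program):
    cache = {}                    # translation cache: guest pc -> host code
    regs = {"r0": 0, "r1": 0}
    pc = 0
    while pc is not None:
        if pc not in cache:       # translate only on first execution...
            cache[pc] = translate(program[pc])
        pc = cache[pc](regs, pc)  # ...afterwards, just run the translation
    return regs

print(run(GUEST_PROGRAM))  # {'r0': 15, 'r1': 3}
```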
Whew, okay, let's take a breath and recap briefly. A "machine" is any Turing-capable computer. An "abstract machine" is the conceptual definition of such a machine over an input language. An instruction set architecture is a contract language between the hardware of a machine and the input language of that machine -- software. To "virtualize" a system means to present an interface to that system "as if" it existed using some mechanism. The more transparent that mechanism is, the less performance will be lost in virtualizing: that is, hardware virtualization support makes the processor partially "transparent" to guest operating systems. On the opposite side of the gradient: software emulation is fully virtual, straightforward to implement, but slow compared to native execution. In the middle we have Dynamic Binary Translation: translating a binary in one format "just in time" to execute it as a native format. DBT JITs are expensive to write but potentially very effective!
Now, finally, input languages. Let's revisit that "language virtual machine" category of virtual machines we piled up earlier.
As you might suspect, hardware only gets one input language -- an ISA 13. However, approximately no one wants to write all of their software directly in that one input language. For one, it's a pain, but more importantly, it's intimately tied to the hardware. If, say, you have to support another ISA, that means a ground-up rewrite (or placing your software's value proposition squarely in the hands of a system virtual machine vendor. Is your software really worth making your users simulate an entire other machine and operating system just to run it? For my part, no 14.) Leaving that aside, domain specific languages are powerful for a reason: they help us model the problem space of our software more directly.
So, we invented programming languages 15. Each programming language defines an abstract machine whether explicitly as part of a specification, or implicitly as part of an implementation. That abstract machine can be implemented in (roughly) one of three ways.
INCOMING FOOLISH CONSISTENCY. BRACE FOR IMPACT.
These three methods correspond to what we've already talked about above: we can emulate the language abstract machine in software, we can attempt to translate it to native code at runtime, or we can map the input language transparently to the underlying hardware. In other words: we can write an interpreter (like Ruby, Python; an "emulator"), a JIT VM (like the JVM or popular JavaScript engines; a "dynamic binary translator" of sorts), or an ahead-of-time ("AOT") compiler (as we have for C, C++, Rust, and others; "hardware virtualization," after a fashion, possible because the two machines are so similar.) Interpreters are generally slow but portable and flexible, JIT VMs are complex, potentially efficient, and generally require an interpreter to fall back to, and compilers produce efficient translations at the cost of flexibility.
They are all ways to virtualize a machine.
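Here's a rough sketch of the first and last of those three strategies, using a toy expression language I'm making up for this example: the same abstract machine can be realized by a tree-walking interpreter, or "compiled" ahead of time -- here, cheaply, into a Python closure -- before it ever runs. A JIT would sit between the two: start in the interpreter, then compile the hot paths on the fly.

```python
# A tiny expression language: ("num", n), ("add", a, b), ("mul", a, b).
EXPR = ("add", ("num", 2), ("mul", ("num", 3), ("num", 4)))  # 2 + 3 * 4

def interpret(expr):
    """Emulation: walk the tree and evaluate it every time it's needed."""
    op = expr[0]
    if op == "num":
        return expr[1]
    if op == "add":
        return interpret(expr[1]) + interpret(expr[2])
    if op == "mul":
        return interpret(expr[1]) * interpret(expr[2])

def compile_expr(expr):
    """'AOT compilation': translate the tree once into host code (a closure),
    paying the translation cost up front instead of on every evaluation."""
    op = expr[0]
    if op == "num":
        n = expr[1]
        return lambda: n
    if op == "add":
        a, b = compile_expr(expr[1]), compile_expr(expr[2])
        return lambda: a() + b()
    if op == "mul":
        a, b = compile_expr(expr[1]), compile_expr(expr[2])
        return lambda: a() * b()

print(interpret(EXPR))         # 14, evaluated by walking the tree
compiled = compile_expr(EXPR)  # translation happens once, here
print(compiled())              # 14, evaluated by running the translation
```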
If you're like me, you're probably accustomed to thinking of Erlang, C#, or JavaScript as running atop a virtual machine. You might even think "yeah, Python's interpreter constitutes a virtual machine" 16. But Go? C? C++? Rust? Compiled languages run on the machine, not on a virtual machine, right?
Well, sorta. C's specification, in fact, specifies an abstract machine 17. It is the job of compiler software to statically map operations in that abstract machine to the corresponding operations in the hardware -- the machine as defined by the ISA, in other words. Rust, Go, Java, and C# work in a similar fashion: a compiler rephrases the input language into a form that can be easily translated to a target machine ("lowering" it). (In C# and Java's cases, the target machine happens to be a virtual machine: the CLR 18 in the former case, and the JVM in the latter.)
This is where that last category of examples comes in: LLVM and friends. These "VMs" are intermediate languages (or "intermediate representations", or "IR"), meant to facilitate the translation of one abstract language machine into another. Rust itself has two (HIR and MIR), many JS engines have their own internal IRs, and the JVM bytecode specification is itself an IR. IRs narrow semantics, provide a place to collect extra information about the program, and act as a single point at which to apply useful optimizations 19. AOT compiled languages, like C, Go, or Rust, contain sufficient information to be "lowered" to low level intermediate languages, which themselves can be compiled to an ISA after optimizations have been applied. Interpreted languages can be JIT compiled, assuming the runtime can collect sufficient information to generate native code 20.
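To sketch what "lowering" through an IR looks like -- with a made-up, stack-based IR that's far simpler than LLVM IR or JVM bytecode -- a front end flattens the source-level tree into a linear instruction list, a shape that's easier to optimize and to translate onward to a real ISA.

```python
# Lowering: flatten a source-level tree into a linear, stack-based IR.
def lower(expr, out):
    op = expr[0]
    if op == "num":
        out.append(("push", expr[1]))
    else:  # "add" or "mul": lower both operands, then emit the operation
        lower(expr[1], out)
        lower(expr[2], out)
        out.append((op,))
    return out

def run_ir(ir):
    """A back end for the IR; a real compiler would emit ISA instructions
    for each IR op instead of executing them directly."""
    stack = []
    for instr in ir:
        if instr[0] == "push":
            stack.append(instr[1])
        elif instr[0] == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif instr[0] == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

tree = ("add", ("num", 2), ("mul", ("num", 3), ("num", 4)))
ir = lower(tree, [])
print(ir)          # [('push', 2), ('push', 3), ('push', 4), ('mul',), ('add',)]
print(run_ir(ir))  # 14
```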
I'm going to turn now to a paper from 1978 that I definitely did not read ahead of drafting this post:
Thus a language definition implies a model of the machine on which programs in the language will run. If a real machine conforms well to the model, then an implementation on that machine is likely to be efficient and easily written; if not, the implementation will be painful to provide and costly to use.
Via "Portability of C Programs and the Unix System", S.C. Johnson and D.M. Ritchie
I urge you to read this paper in full. When I read it, I spun around in my office chair, arms raised in celebration for a full minute.
Ahem.
It's a story about porting C & UNIX from the PDP-11 to the Interdata 8/32. I'll briefly summarize it here: it includes early automated regression testing, the first tool called "lint", the origin of several concepts in C (including unsigned integers, before which folks just had to type pun `char*` pointers because they happened to behave the same way on the PDP-11), the above quote, early virtualization, a description of using an intermediate language in a compiler toolchain to port Fortran to new machines in addition to C, and a whole lot of new working systems spun up from existing, simpler working systems. It is just so good.
But the bit of it that is germane to this post is this: C started with an implicit abstract machine that mapped nearly 1:1 to the PDP-11. They specifically picked the Interdata 8/32 because it was so different from the PDP-11. In porting C, they modified the definition of C, using the large corpus of existing software to validate the port and their changes. They built an intermediate representation (though they didn't call it that) and an abstract model of the machine that the C language represented by extracting them from existing, working systems.
It wasn't just luck that C's machine model mapped so closely to the machines they were trying to straddle: the authors explicitly call out that they had previous experience porting languages between machines with C's predecessor, B, and the large software project, Altran (a Fortran library.) In other words, they made good bets and were willing to coax the language the rest of the way. And in shaving off those rough edges, they made an eventual port to a third machine that much easier: C's abstract machine became a self-reinforcing system.
It is true that moving a program from one machine to another does not guarantee that it can be moved to a third. There are many issues in portability about which we worried in a theoretical way without having to face them in fact. [...] Nevertheless, moving UNIX to a third new machine, or a fourth, will be easier than it was to the second. The operating system and the software have been carefully parameterized, and this will not have to be done again. We have also learned a great deal about the critical issues (the "hard parts").
That is: because C was so portable and had so much software already available for it, ISAs were incentivized to map closely to C's abstract machine. Because hardware based on those ISAs kept advancing, there was more incentive to write C (or languages that targeted C's abstract machine.) ISAs and C's abstract machine evolved symbiotically.
And that draws me to the close of this post. We spent a lot of time defining terms here: what a machine is, what it means to virtualize it, what an instruction set architecture is, the differences and similarities between virtualizing a system versus a language, implementation approaches and their costs, and finally how to use a set of existing machines to create and refine an abstract machine. WebAssembly, a self-billed "virtual ISA", is following in C's footsteps here: extracting a reusable abstract machine from a set of existing virtual machines. Next time, we'll talk about the context WASM was developed in and how it aims to achieve Java's goal of "Write once, run anywhere."
(Thanks to Eric Sampson for reviewing drafts of this blog post.)
See "scope" under the w3 WASM specification.
BEAM was a replacement for the original Erlang Abstract Machine: "JAM", or "Joe's Abstract Machine".
You might have heard of "complex instruction set computers" ("CISC") and "reduced instruction set computers" ("RISC") before: this refers to the approach taken in defining the ISA. Reduced instruction set computers generally specify fewer, more popular instruction codes in a more regular fashion. For example, as opposed to a CISC architecture, a RISC machine may not be able to combine memory access and computation in a single instruction. The idea is that the reduced set of instructions narrows the focus of the implementation, potentially making it faster. These days, ARM is considered a RISC machine, while x86 is considered a CISC machine. In practice, ISA implementers may decompose operations into microcode invisible to software, so the line is a little blurry. For an interesting read on the subject from one of the folks behind the Transmeta Crusoe, see this paper.
You might have seen "arch" in various software releases before, in the form `x86_64-pc-win32` or `arm-apple-darwin-eabi`. This is known as a "target triple" and is meant to communicate the target processor, hardware vendor, and operating system.
"There's no rule that says dogs can't play basketball!"
Or indeed, using hardware to virtualize other hardware: these days the x86 ISA is implemented as a frontend for other, vendor-specific processor microcode. See "C Is Not A Low Level Language".
Analogue's Super Nt famously turned to Field Programmable Gate Arrays (FPGAs) in order to fully replicate the behavior of the Super Nintendo Entertainment System. There was a video that interviewed the person who implemented all of that FPGA work, but it is sadly lost to me. Happily, you can find out more about the technical capabilities over on the Digital Foundry video.
Type 1 hypervisors ("native") run on the hardware beneath all other operating systems: Oracle VM, VMWare's ESXi, Microsoft's Hyper-V, and Amazon's AWS Nitro. Type 2 ("hosted") hypervisors run as a process on top of an operating system. Type 2 includes VMWare, VirtualBox, and Virtual PC. It's not clear to me at the time of writing whether Apple's Hypervisor.framework is a Type 1 or Type 2 hypervisor. I get the sense that user-oriented operating systems are beginning to embed Type 1 hypervisors (e.g., Microsoft supports WSL on Windows using a subset of Hyper-V), exposing the functionality out to the main guest OS as a framework.
WSL2 uses hardware virtualization. WSL1 was implemented entirely differently, and we'll talk more about that approach in a coming post.
For a really fun example of the inflexibility of hardware virtualization, see the Wii U's virtual Wii mode. It works because the Wii U is (roughly speaking) three Wii processors in a trenchcoat; so to become a Wii, it hides two of those processors and a bunch of RAM.
Rosetta 1, at least, is an implementation of Transitive Corporation's QuickTransit software, which Apple licensed. IBM acquired the company in 2009.
Seriously. The Dolphin dev blog is a treasure trove of interesting posts about various ISA JITs. See this one on adding an M1 backend to their JIT!
Yes, yes, even if the hardware is the Transmeta Crusoe. It has one input language, which happens to be a "very large instruction word" ("VLIW") set that can handle the union of a couple of other hardware ISAs. If you're interested in this processor, read more about it here.
Video games being the BIG exception to this. And even then, that took some pretty anticompetitive practices to put in place.
I'm being exceptionally glib about this. The history of programming languages is large and fascinating. It has a lot to do with linguistics but very little to do with the point of this (increasingly long) post.
See this wiki page for an overview of "ways to implement a language virtual machine." Also, go read "Crafting Interpreters"!
The "CLR" referred to here is Microsoft's "Common Language Runtime", which implements the "Common Language Infrastructure" ("CLI") abstract machine. "CLI" is miserable to search for, though, so I'm committing the sin of metonomy by referring to the abstract machine language by its implementation.
I'm probably so out of date at this point, but see Sea of Nodes, "Static Single Assignment" ("SSA"), "Loop Invariant Code Motion" ("LICM"), inline expansion, and the "optimizing compiler" page on wikipedia. It's a very interesting space!
I'm leaving a lot out here, but to gesture at it: at least for JavaScript, the job of a JIT VM is to run the language in multiple stages. In the initial stage, it's running as an interpreter, collecting type signatures; if a function is called frequently enough, with consistent enough type information, the runtime replaces that function with one compiled to the native machine's ISA. This compiled code includes type checks that can cause the function to "bail out" if unexpected types are encountered at runtime, falling back to the interpreter. As I understand it, the JVM and CLR operate in a similar fashion, though with more initial type information available. The creation of JIT VMs owes a lot to the Smalltalk & Self lineage of languages; Strongtalk, in particular, informed the design of many VMs.
Hardware virtualization involves the use of a program called a "control program", "hypervisor", or "virtual machine manager" ("VMM"), usually running in a lower protection mode than the kernel itself.
An aside here. I've just described two more subtypes of VMs here: process VMs and system VMs. System VMs run outside of the context of an operating system: a hypervisor, loaded as part of the hardware boot process. Process VMs, however, include QEMU, VMWare, Python, V8, SpiderMonkey: any VM that runs as a process managed by an operating system.
See "Abstract Machines" on Wikipedia. Also, go read "Abstract Machine Models".
Languages without specifications still have abstract machines, but they're not written down anywhere: they live in the heads of the maintainers of the canonical implementation. The Ruby and Python machines are "that feeling you get when you read the entire codebase", the wind in your hair on a pleasant Spring day. Of course, that means accurate, alternate implementations are that much more difficult to write.
See the freely available copy of the C standard, §5.1.2.3, which states: "The semantic descriptions in this International Standard describe the behavior of an abstract machine in which issues of optimization are irrelevant."