The State of Concurrent JavaScript

Much has been said about the ever-growing ubiquity of JavaScript, a corollary of which, in my opinion, is the increasing complexity of the applications we choose to build with it: high-performance web servers, compilers, game engines, and much more. JavaScript itself has changed a lot over the past several years in ways that help accommodate the difficulty of building such applications, and with the highly anticipated standardization of the ECMAScript 6 and 7 specifications, I suspect most JavaScript developers are well aware that we’re still in the midst of such change.

Concurrency, a change I expect we’ll be hearing a lot about in the coming months, is particularly interesting in the context of JavaScript. Despite the well-known fact that JavaScript executes in a single thread, concurrency is already a major piece of modern JavaScript. The Web Workers API offers a mechanism for concurrent execution in the browser; Node.js ships built-in features for spawning new processes, including the popular cluster module; and Node.js itself is built on a thread pool which backs many of the asynchronous operations the Node.js API offers, and which can be used for concurrent execution of JavaScript, as we'll see.

In this article, I’ll discuss these facets of concurrent JavaScript and explain some of their constraints and limitations. I’ll introduce an experimental project that I’ve written to address some of these limitations, and with which I hope to demonstrate why concurrent JavaScript execution is important. Finally, I’ll conclude with a brief look at what we can expect in the future for concurrency in JavaScript.

I should first clarify my terminology and my motivation, to help set the context for this discussion. My motivation is largely driven by the landscape of JavaScript tooling, particularly front end build tooling. Webpack, Babel, and Browserify all serve as prime examples: they each support syntax transformation, yet none of them supports parallel syntax transformation, which seems a simple opportunity for major performance gains in large projects. Applying a syntax transformation is strictly CPU-bound, since it is a traversal of the program’s Abstract Syntax Tree (AST) with in-place modifications. Concurrency and parallelism, two terms which already share a strong relationship, become even more similar when the work to be done is strictly CPU-bound. For the duration of this article, I’ll assume we’re working with multi-core systems. On such a system, if a process runs a set of threads that each perform a sufficiently complex CPU-bound task with little coordinated shared state, the operating system scheduler will most likely distribute those threads across the available cores, enabling the process to execute the work in parallel. Hence my use of "concurrency" and "parallelism" somewhat interchangeably in this article should be read in the context of such CPU-bound computations, where concurrency yields parallelism.
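To make the CPU-bound claim concrete, here is a minimal sketch of an in-place syntax transformation. The node shape is a hypothetical simplification (Babel's real AST is a much richer format based on ESTree), and `traverse` and `blockScopeToVar` are illustrative names. The point is that the work is a pure walk over in-memory data with nothing to wait on, so spreading many files across more cores is the natural way to speed it up.

```javascript
// Hypothetical, simplified AST node shape: { type, kind?, name?, children? }.
// Real tools use a far richer format (Babel's is based on ESTree).

// Depth-first traversal: visit a node, then each of its children.
function traverse(node, visit) {
  visit(node);
  (node.children || []).forEach(function (child) {
    traverse(child, visit);
  });
}

// In-place transformation: turn `let`/`const` declarations into `var`.
// Naive on purpose: it ignores the block-scoping rules a real transform handles.
// Returns the number of nodes it changed.
function blockScopeToVar(ast) {
  let changed = 0;
  traverse(ast, function (node) {
    if (node.type === 'VariableDeclaration' && node.kind !== 'var') {
      node.kind = 'var';
      changed++;
    }
  });
  return changed;
}

// Simplified AST for: let a; function f() { const b = 1; } (initializer omitted)
const program = {
  type: 'Program',
  children: [
    { type: 'VariableDeclaration', kind: 'let', children: [{ type: 'Identifier', name: 'a' }] },
    {
      type: 'FunctionDeclaration',
      name: 'f',
      children: [
        { type: 'VariableDeclaration', kind: 'const', children: [{ type: 'Identifier', name: 'b' }] }
      ]
    }
  ]
};

console.log(blockScopeToVar(program)); // 2
```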

Web Workers API

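// Spawn a worker that runs worker.js in its own isolated global context.
var worker = new Worker('worker.js');
var results = [];

// Handle replies; e.data is a copy of what the worker passed to postMessage().
worker.onmessage = function(e) {
  console.log('Message received from worker.');
  results.push(e.data);
};

// The message object is copied into the worker, not shared with it.
worker.postMessage({task: 'fib(35)'});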

For JavaScript targeted at web browsers, HTML5 Web Workers[1] are, as far as I know, the only option for offloading work from the main thread into a concurrent execution environment. The brief example above gives a quick idea of their functionality: a Worker is created by pointing it to a script file to be run in the Worker’s isolated context, and is then communicated with via a message-passing interface.
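The example above leaves `worker.js` unspecified. Here is a minimal sketch of what it might contain, assuming the hypothetical message shape `{task: 'fib(35)'}` used above. The worker parses the task string (a regular expression stands in for real task dispatch, avoiding `eval`), computes the result, and posts it back. The message handler is only attached when `importScripts` is defined, which is the case inside a classic dedicated worker, so the pure `handleTask` function (a hypothetical name) can also be exercised outside a browser.

```javascript
// worker.js (hypothetical contents): runs in the worker's own global scope.

function fib(n) {
  return n < 2 ? 1 : fib(n - 2) + fib(n - 1);
}

// Parse a task string such as 'fib(35)' and compute its result.
// Only the fib(<integer>) form is supported in this sketch.
function handleTask(data) {
  var match = /^fib\((\d+)\)$/.exec(data.task);
  if (!match) {
    return { task: data.task, error: 'unsupported task' };
  }
  return { task: data.task, value: fib(Number(match[1])) };
}

// importScripts exists only in a worker's global scope, so this wiring is
// skipped when the file is loaded anywhere else (e.g. in a test).
if (typeof importScripts === 'function') {
  self.onmessage = function(e) {
    // The reply is copied back to the main thread, just as the request was.
    self.postMessage(handleTask(e.data));
  };
}
```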

In the latest W3C Web Workers Editor’s Draft,[2] there are a few interesting stipulations that help explain the design decisions in the Web Workers API, and begin to reveal the constraints of concurrent JavaScript. First, a Web Worker must execute in a “separate parallel execution environment,” and within that environment, a new global context is created which shares no references with the main thread’s context. A separate execution environment may exist either in another process or in a separate thread of the same process. This strict isolation explains the carefully controlled message-passing interface that enables the main thread to communicate with workers (and vice versa): because the separate execution environments cannot share references, a message must be serialized and then copied to the target environment. Where inter-process communication (IPC) is involved, this typically requires serialization to a string or to a binary format which can be written to a file. In multi-threaded implementations, on the other hand, many browsers use a structured cloning algorithm to copy the message into the receiving thread’s context.[3]
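The two copying strategies mentioned above differ in what survives the trip. The sketch below assumes Node.js 17 or later, where `structuredClone` is a global (as it is in current browsers), and copies the same message both ways. In both cases the receiver gets a new object rather than a shared reference, but only the structured clone preserves types such as `Date` and `Map`. The helper names are illustrative.

```javascript
// A message containing types that JSON cannot represent faithfully.
const message = {
  task: 'fib(35)',
  sentAt: new Date(0),
  meta: new Map([['priority', 'high']])
};

// Serialize to a string (as an IPC channel might) and parse on the other side.
function copyViaJson(value) {
  return JSON.parse(JSON.stringify(value));
}

// Copy with the structured clone algorithm, as postMessage() does.
function copyViaStructuredClone(value) {
  return structuredClone(value);
}

const jsonCopy = copyViaJson(message);
const cloneCopy = copyViaStructuredClone(message);

// Neither copy shares references with the original.
console.log(jsonCopy !== message, cloneCopy !== message); // true true

// JSON turns the Date into a string and the Map into an empty object.
console.log(typeof jsonCopy.sentAt, jsonCopy.meta); // string {}

// The structured clone keeps both types intact.
console.log(cloneCopy.sentAt instanceof Date, cloneCopy.meta.get('priority')); // true high
```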

These conditions, which impose similar constraints on the concurrent execution model in Node.js as well, piqued my interest. Why should the Web Workers spec enforce such strict isolation if it guarantees serialization overhead? In traditional parallel execution environments, you would quickly find that serialization tends to be a large performance bottleneck, so the shared references available to threaded alternatives promise a big potential performance gain. It turns out that the ECMAScript 5 specification stipulates that “[a]t any point in time, there is at most one execution context that is actually executing code.”[4] To my eye, this effectively forbids concurrent threads of execution within the same engine, which rings true with the popular knowledge that JavaScript is single-threaded. The conditions in the Web Workers specification therefore make plenty of sense: barring changes in future ECMAScript versions, concurrent JavaScript execution requires multiple isolated execution environments, and serialized message passing is the price we pay. Indeed, we see this same constraint reflected in Node.js’ cluster module.

Node.js Cluster Module

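var cluster = require('cluster');
var http = require('http');
var cpus = require('os').cpus().length;

// Deliberately slow, CPU-bound computation.
function fib(n) {
  return n < 2 ? 1 : fib(n - 2) + fib(n - 1);
}

// The master process (cluster.isPrimary in newer Node.js releases)
// forks one worker process per CPU core.
if (cluster.isMaster) {
  for (var i = 0; i < cpus; i++) {
    cluster.fork();
  }
} else {
  // Every worker listens on port 8000; the master shares the port among them.
  http.createServer(function(req, res) {
    res.writeHead(200);
    // res.end() requires a string or Buffer, so convert the number first.
    res.end(String(fib(35)));
  }).listen(8000);
}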

The Node.js cluster module allows you to parallelize a given path of execution by essentially replicating a running process such that each fork (worker process) shares a set of server ports.

The above example shows a cluster designed to compute the 35th number in the Fibonacci sequence, a computation which takes a non-trivial amount of time, for each incoming request. Running this example starts one worker process per available CPU, each sharing port 8000 on localhost. As requests come in, the master process ensures that the work of handling them is fairly distributed among the worker processes, while the operating system scheduler (likely) distributes those processes, now busy with CPU-bound tasks, among the available CPU cores.

As it turns out, the shared port handling is simply a clever feature built upon the default IPC mechanism between the master process and the worker processes, which, unsurprisingly, has a message-passing interface extremely similar to that of the Web Workers API.[5] It becomes clear, then, that the cluster module is almost exactly an implementation of the Web Workers paradigm: each worker executes in a separate parallel execution environment (in this case, a new process) with its own isolated global context, and communicates through a message-passing construct that forces serialization and deserialization to carry each message over Node.js’ IPC mechanism. These are exactly the constraints we should expect, given the current state of the ECMAScript specification.

The cluster module does distinguish itself from the Web Workers API with its shared server port handling. Unfortunately, when distributing computational tasks locally, as in the case of implementing parallel syntax transformations for front end build tools, a local cluster imposes the overhead of a TCP handshake for each task, as well as the I/O overhead of Node.js’ default IPC. In a different context, such as distributing a giant matrix multiplication across the available cores of 30 dedicated research machines, this kind of overhead is well worth the parallelization factor. My motivation is more modest, however, and with this complication in mind I couldn’t help but dig a little deeper in search of a solution that minimizes these costs: an experimental Node.js add-on I’m calling Hive.

Hive

var Hive = require('hive');

function fib(n) {  
  return n < 2 ? 1 : fib(n - 2) + fib(n - 1);
}

Hive.init(fib.toString());  
Hive.eval('fib(35);', function(err, res) {  
  console.log(res); // 14930352
});

In truth, Hive[6] is extremely similar to the Web Workers model as well. In order to illuminate the differences, let me first explain a little bit about Node.js under the hood.

Conceptually, Node.js is very simple: it’s a C++ program which embeds Google’s V8 JavaScript engine and provides an auxiliary layer of functionality. Much of this functionality comes in the form of the included Node.js APIs (the fs module, the http module, etc.), tools that make JavaScript useful outside of the browser. Another big piece is a project called libuv,[7] which provides convenient abstractions for thread pool management and work scheduling, event loops, and helper utilities. Node.js leverages libuv’s default event loop, a piece of infrastructure probably familiar to most JavaScript developers who work in the browser, but one which is not actually part of JavaScript engines themselves; event loops in the browser have more to do with Web APIs than with JavaScript execution.

Node.js also uses a single libuv thread pool, containing four threads by default, to perform the blocking I/O behind many of the default Node.js APIs asynchronously. For example, fs.readFile[8] is a standard Node.js API for reading a file, which takes three arguments: a file path, an options object, and a callback. This function queues work on the libuv thread pool and then immediately returns the flow of execution to your program. One of the pool’s threads picks up the work and reads the contents of the file into memory; when it finishes, your callback is queued in the event loop, to be called with the file contents (or an error) when your program is ready.
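The fs.readFile flow described above can be observed directly. This minimal sketch writes a small temporary file (the file name is arbitrary), then reads it with the three-argument form. The call returns immediately, the read happens off the main thread, and the callback, which receives an error (or `null`) and the file contents, runs afterward on the main thread.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Arbitrary temporary file, written synchronously so it exists before the read.
const file = path.join(os.tmpdir(), 'readfile-demo.txt');
fs.writeFileSync(file, 'hello from disk');

const events = [];

// Three arguments: a file path, an options object, and a callback.
fs.readFile(file, { encoding: 'utf8' }, function (err, contents) {
  if (err) throw err;
  // Runs on the main thread once the event loop delivers the finished read.
  events.push('callback: ' + contents);
  console.log(events.join(' -> ')); // readFile returned -> callback: hello from disk
});

// This line runs first: readFile queued the work and returned immediately.
events.push('readFile returned');
```

The four-thread default can be changed with libuv's `UV_THREADPOOL_SIZE` environment variable, for example `UV_THREADPOOL_SIZE=8 node demo.js` (where `demo.js` is a placeholder script name).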

With those details explained, Hive is actually rather simple as well. Recall the Web Workers specification, which requires that each worker operate in a separate parallel execution environment. Hive offers exactly that. It creates a pool of V8 Isolate objects, which can be thought of as JavaScript engine instances, along with V8 Context objects, the global contexts associated with each Isolate, and assigns one Isolate/Context pair to each thread in the default libuv thread pool; ordinarily, only the main thread maintains an Isolate, the one that runs your program. And because Hive is intended for distributing CPU-bound tasks, its interface is minimal: besides Hive.init, which receives source code for the workers (the fib definition in the example above), it exposes just one function, Hive.eval, which accepts a string to be evaluated in one of the worker threads, such as “fib(35);”, and a callback to be called with the result of the evaluation.
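Hive itself is a C++ add-on, and its source isn't reproduced here. As a rough illustration of the same model in plain JavaScript, the sketch below swaps in Node.js's `worker_threads` module, which postdates Hive and is not what Hive uses. A fixed pool of workers, each with its own V8 isolate and global context, is initialized with shared source code, and an `eval` function sends a code string to a worker and invokes a Node-style callback with the JSON-deserialized result. `createPool`, `eval`, and `close` are hypothetical names, and the round-robin dispatch is a simplification: work queued on libuv's pool is instead picked up by whichever thread is idle.

```javascript
const { Worker } = require('worker_threads');

// Code run inside every worker thread. Each worker has its own V8 isolate and
// global context, so nothing here is shared with the main thread.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  // Indirect eval runs in the worker's global scope, so function and var
  // declarations in the init source become globals for later tasks.
  (0, eval)(workerData);
  parentPort.on('message', function (msg) {
    try {
      const value = (0, eval)(msg.code);
      // Serialize the result to a string, as Hive does with JSON.stringify.
      parentPort.postMessage({ id: msg.id, result: JSON.stringify(value) });
    } catch (e) {
      parentPort.postMessage({ id: msg.id, error: String(e) });
    }
  });
`;

// Hypothetical Hive-like API: createPool(initSource, size) returns an object
// with eval(code, callback) and close().
function createPool(initSource, size) {
  const callbacks = new Map(); // task id -> pending callback
  const workers = [];
  let nextId = 0;
  let nextWorker = 0;

  for (let i = 0; i < size; i++) {
    const worker = new Worker(workerSource, { eval: true, workerData: initSource });
    worker.on('message', function (msg) {
      const callback = callbacks.get(msg.id);
      callbacks.delete(msg.id);
      if (msg.error !== undefined) {
        callback(new Error(msg.error));
      } else {
        // JSON.stringify(undefined) is undefined, so guard before parsing.
        callback(null, msg.result === undefined ? undefined : JSON.parse(msg.result));
      }
    });
    workers.push(worker);
  }

  return {
    // Send the code string to the next worker, round robin.
    eval: function (code, callback) {
      const id = nextId++;
      callbacks.set(id, callback);
      workers[nextWorker].postMessage({ id: id, code: code });
      nextWorker = (nextWorker + 1) % workers.length;
    },
    // Terminate all workers so the process can exit.
    close: function () {
      return Promise.all(workers.map(function (w) { return w.terminate(); }));
    }
  };
}

// Usage, mirroring the Hive example above.
function fib(n) {
  return n < 2 ? 1 : fib(n - 2) + fib(n - 1);
}

const pool = createPool(fib.toString(), 4);
pool.eval('fib(35);', function (err, res) {
  if (err) throw err;
  console.log(res); // 14930352
  pool.close();
});
```

Note that the result still crosses the thread boundary as a JSON string, mirroring the serialization step Hive performs.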

My goal in working with Hive was to reduce the performance costs presented in the previous section. Recall that when distributing CPU-bound work locally, the Node.js cluster module imposes the performance overhead of TCP handshakes and the I/O overhead of Node.js’ default IPC.

As we established earlier, message passing is an unavoidable constraint because of the strict isolation of JavaScript execution environments, and Hive is no exception to this rule. Hive uses JSON.stringify to ensure that the result of any given evaluation is a string, which is the same serialization method used by Node.js' default IPC mechanism. However, Hive achieves substantial performance gains by copying the character array that holds the serialized result directly into the target isolate, then constructing a new string handle in the target isolate from that copy. This approach avoids the I/O overhead of IPC, and eliminates TCP costs entirely, for the simple reason that no client/server communication is involved.

Now that you understand Hive, let me share why I think it is useful. Admittedly, it is just an experiment, a product of satisfying my own curiosity, and I don’t necessarily intend for it to be used in production. However, by my benchmarks, Hive outperforms Node.js’ cluster module by a considerable margin when distributing CPU-bound tasks. On a 1.8GHz Intel Core i5 CPU running OS X 10.9.5, Hive runs about 25% faster, on average, than the cluster implementation when computing the 35th number in the Fibonacci sequence n times, and almost 100% faster, on average, when running parallel syntax transformations with Babel. Meanwhile, Hive tasks spend, on average, 34% less time waiting (from the time they are requested to the time their result is ready) than their cluster counterparts.

That said, though I'm very happy with these results, I don’t really think of this project as a great short-term solution for speeding up CPU-bound computations in JavaScript. Rather, I think of it simply as a good demonstration of the value of properly addressing the limitations of concurrent JavaScript execution. With that in mind, let me share a brief look at what we can expect in the near future for concurrent JavaScript.

The Future of Concurrent JavaScript

The future of concurrent JavaScript is particularly exciting because its time is coming very soon. Node.js is already addressing some of the issues presented here with its own Workers implementation; the recently released JDK8 ships a new JavaScript engine, Nashorn, for which thread safety is already a high priority; and TC39 already has proposals on the roadmap for the "Vat Model" in ECMAScript 8.

Node.js Workers

I hope my description of Hive made clear the similarities it shares with the Web Workers model. I expect this model will be the primary way forward for most JavaScript implementations, and sure enough, Node.js is expected to introduce Workers shortly, a feature previously expected to land only in io.js in the short term.

The implementation currently proposed offers a message-passing channel that I expect will be much faster than the one I’ve used for Hive, as it should allow users to avoid a JSON serialization step. There’s also brief mention of implementing a Worker-based cluster module, which would make round-robin scheduling (the default for the existing cluster module) among a set of workers just as easy.
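Workers did eventually ship in Node.js as the `worker_threads` module. A minimal sketch of its message channel: the main thread posts an object containing a `Map` directly, with no `JSON.stringify` step, and the worker receives a structured clone of it. The inline worker source and the message shape are illustrative.

```javascript
const { Worker } = require('worker_threads');

// Worker body: inspect the Map it receives and reply with a summary.
const source = `
  const { parentPort } = require('worker_threads');
  parentPort.once('message', function (msg) {
    // msg.counts arrives as a real Map: it was structured-cloned, not JSON-encoded.
    parentPort.postMessage({ isMap: msg.counts instanceof Map, size: msg.counts.size });
  });
`;

const worker = new Worker(source, { eval: true });

worker.once('message', function (reply) {
  console.log(reply); // { isMap: true, size: 2 }
  worker.terminate();
});

// No JSON.stringify: the object, including the Map, is copied by structured clone.
worker.postMessage({ counts: new Map([['a', 1], ['b', 2]]) });
```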

Nashorn Thread Safety

Nashorn is a relatively new JavaScript engine built on the JVM that is now standard in JDK8. I haven’t mentioned it thus far, but it was a strong candidate during my research into concurrent JavaScript execution: I was particularly interested in working on a platform where concurrency is such a core feature, and in the CSP model that Clojure’s core.async library provides.

Concurrent execution and thread safety are already a high priority in Nashorn, and it seems there may be two paths forward. The first is a workers implementation not unlike the ones we’ve discussed so far. The second looks to be Java interop made available as a Nashorn API extension, which might mean access to tools such as the Executors framework, the now-standard Java framework for executing tasks on existing threads rather than spinning up a new thread for each task. This certainly sounds exciting, though I wonder whether inter-thread communication (ITC) would cause developers trouble, as it does not sound like Nashorn will address shared references soon.[9]

ECMAScript 8 Vat Model

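// Proposed API from the ECMAScript strawman; no engine implements Vat(),
// so this is illustrative only and will not run as written.
function sum(a, b) {
  return a + b;
}

const vat = Vat();                            // spawn a new, concurrently running event loop
const sumPromise = vat.eval(sum.toString());  // evaluate the function's source inside the vat
const result = sumPromise(3, 5);              // presumably a promise for the eventual result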

The proposed roadmap for the ECMAScript 8 specification includes discussion around the “Vat model,” for which there is an existing strawman proposal detailing some really exciting ideas.[10] The example given here is modified from the one presented in the strawman proposal, to sidestep syntactic sugar changes that are still pending.

At a high level, the idea is that the Vat() constructor spawns a new event loop which runs concurrently with the event loop that spawned it. Each event loop then maintains its own thread of control, as well as, among other things, a FIFO queue representing tasks which the vat will eventually “deliver.” In this way, the vat model seems to share a lot with Hive as well. In fact, the current intention is “that WebWorkers can be implemented in terms of vats and vice versa.”

The most exciting aspect of this proposal, to me, is that it stands to govern the JavaScript language itself, not the specialized APIs that augment the way we currently work with JavaScript, such as Node.js. Thus, concurrent JavaScript execution will eventually, and hopefully soon, become supported by every major JavaScript engine by default.

Thanks again to Jonas Gebhardt (@jonasgebhardt) for his constructive feedback and editing.