Boldizsár's programming blog

Worker threads in NodeJS

May 02, 2019 | 12 Minute Read

Worker threads have been available in NodeJS since v10.5.0. But wait... How did Node work before, with no worker threads? Read on to find out!

JavaScript is single threaded for real

JavaScript is single threaded, which means that the code runs on one thread and cannot run in parallel. JavaScript was intentionally created this way, as the main aim back then was to add some simple dynamic behaviour to the web, like changing the colour of a button when there’s an event or showing an alert. Having only one thread means that it can’t leverage our multi-core CPUs. In the browser, the scripts of a tab run on a single main thread. Later this became a drawback, as more complex logic needed to be executed there: as long as a script was running, the main thread was blocked and the user saw an unresponsive website. So the Web Workers API came into being. With it, background tasks can run on different threads without blocking the main thread and scaring the user away.

JavaScript also has a concurrency model that offsets this drawback. The model is based on the event loop. The UI is never blocked as long as we don’t run any CPU-intensive code, because the responses of I/O calls are always received through a callback function or an event, so other JS code can run in the meantime.
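To make the "as long as we don’t have any CPU intensive code" condition concrete, here is a minimal sketch. The helper names blockFor and measureTimerDelay are made up for this illustration. A timer is scheduled with a delay of 0 ms, yet its callback can only run after the synchronous busy loop has released the single thread.

```javascript
// Busy-wait for roughly `ms` milliseconds, standing in for CPU-intensive work.
function blockFor(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) { /* spin on the only thread */ }
}

// Schedule a 0 ms timer, then block the thread; resolve with the real delay.
function measureTimerDelay(blockMs) {
  return new Promise(resolve => {
    const scheduledAt = Date.now();
    setTimeout(() => resolve(Date.now() - scheduledAt), 0);
    blockFor(blockMs); // the callback above has to wait until this returns
  });
}

measureTimerDelay(200).then(delay => {
  console.log(`0 ms timer actually fired after about ${delay} ms`);
});
```

The reported delay is at least as long as the blocking loop, which is exactly the kind of stall that moving work to another thread avoids.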

NodeJS is not only JavaScript

NodeJS is actually JavaScript and C++, as it’s not a language but a JavaScript runtime built on Google’s V8 engine. For asynchronous I/O, Node (not V8 itself) relies on libuv, a library written in C. That code is very low level and, as you can imagine, pretty fast. Basically that’s how NodeJS utilises the operating system’s multithreading capabilities. Libuv contains the implementation of the event loop in Node. With the event loop it is possible to perform non-blocking I/O operations. This is done by offloading tasks to the OS kernel whenever possible, or to libuv’s own thread pool. So as soon as we leave JavaScript, NodeJS might use multiple threads to execute tasks, and while these operations are in progress our JS code can still run, so the main thread is not blocked by the I/O tasks. When an operation has finished, the response goes back to NodeJS and finally to the callback we wrote. More precisely, the kernel notifies Node, the callback is added to the poll queue of the event loop, and the main thread picks it up for execution.
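The flow described above can be observed with a small sketch. The helper name readWithOrder and the temporary file name are assumptions made for this illustration. The read is handed off to libuv, the JavaScript after the call runs immediately, and the callback only runs later, when the event loop picks up the result.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Record the order in which the two pieces of JavaScript run.
function readWithOrder(file) {
  return new Promise((resolve, reject) => {
    const order = [];
    fs.readFile(file, 'utf8', (err, text) => {
      if (err) return reject(err);
      order.push('callback ran'); // executed later, from the event loop
      resolve({ order, text });
    });
    order.push('code after readFile ran'); // executed first, nothing blocked
  });
}

// Throwaway file in the temp directory so the sketch is self-contained.
const demoFile = path.join(os.tmpdir(), `libuv-demo-${process.pid}.txt`);
fs.writeFileSync(demoFile, 'hello');

readWithOrder(demoFile).then(({ order }) => {
  console.log(order.join(' -> '));
  fs.unlinkSync(demoFile);
});
```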

What can we do to handle CPU intensive tasks in NodeJS?

Complex backend systems are built with NodeJS. But you may ask how that is possible, since sometimes it’s inevitable to write CPU-intensive code. Don’t worry, NodeJS is not completely lost here. Not only do we have npm libraries that try to overcome this problem, but there are built-in modules too, for example the child_process and cluster modules. But they are not as good as a truly multithreaded environment. Using the child_process module you can start child processes which can execute any command, not only JavaScript. With the cluster module you can launch a cluster of NodeJS processes to utilize multiple CPU cores and to share the load among them. But creating processes, each with its own NodeJS instance, is not as lightweight as creating threads, and processes can’t share memory either.
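As a rough sketch of the child_process approach, the snippet below pushes a CPU-bound loop into a separate NodeJS process. The function name sumInChildProcess and the inline script are invented for this example. Note that the result has to travel back as text over stdout, because the two processes share no memory.

```javascript
const { execFile } = require('child_process');

// Sum the integers from 1 to `limit` in a separate Node process.
function sumInChildProcess(limit) {
  const script =
    `let s = 0; for (let i = 1; i <= ${Number(limit)}; i++) s += i; console.log(s);`;
  return new Promise((resolve, reject) => {
    // process.execPath is the path of the node binary running this file.
    execFile(process.execPath, ['-e', script], (err, stdout) => {
      if (err) return reject(err);
      resolve(Number(stdout.trim())); // the result comes back as text
    });
  });
}

sumInChildProcess(1000000).then(sum => {
  console.log(`Child process computed ${sum}`);
});
```

Every call pays for starting a whole new NodeJS instance, which is the overhead mentioned above.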

With Node v10.5.0 the worker_threads module was introduced, which lets us create threads that execute JavaScript code in parallel. With this module we could potentially multithread our entire application. But concurrent programming is never easy. As we know by now, Node (through libuv) might already use multiple threads to execute I/O tasks, and with the worker_threads module we can also execute our own JavaScript on multiple threads. Note that at the time of writing this module is still in the experimental stage, which means that parts of it might be deprecated or the whole module might be removed in later versions.

Let’s see an example app to see worker_threads in action. We’re gonna look at two examples. In the first one we’ll see how we can execute CPU intensive operations in three threads and how they can communicate with the main thread.

mkdir workerthreads && cd workerthreads
npm init -y
touch index.js
touch job.js

Open up the two files in your favourite editor. We’re going to create three threads that are going to race with each other. At the end we’re gonna see which one is the winner. They’ll also report back their status so that we can see where they’re at. So let’s see the code.

const path = require('path');
const { Worker } = require('worker_threads');

[1, 2, 3].forEach(index => {
  // Each worker runs job.js on its own thread; workerData is the loop limit
  const worker = new Worker(path.join(__dirname, 'job.js'), {
    workerData: 100000000
  });
  // Progress messages posted by the worker through its parentPort
  worker.on('message', message => console.log(`Thread ${index} sent: ${message}`));
  worker.on('online', () => console.log(`Thread ${index} is started`));
  worker.on('exit', () => console.log(`Thread ${index} has exited`));
  worker.on('error', (error) => console.log(`Thread ${index} has thrown`, error));
});

Let’s see what I did here. I imported the path and the worker_threads modules. I created the threads using the Worker class from the imported module. As you can see, in the constructor we have to pass the file which we want to be executed on a different thread. In the second argument we can pass options like workerData, which is used to send some initial data to the worker. The HTML structured clone algorithm is used to copy the data. I then register a couple of event listeners (note that the Worker class extends EventEmitter) to be able to receive some information from the threads. online is emitted when the thread starts, exit when it finishes, and I think the other two are quite obvious. Let’s take a look at the job.js file.

const { workerData, parentPort, threadId } = require('worker_threads');

const limit = workerData;
const step = limit / 10; // report progress at every 10%

for (let i = 1; i <= limit; i++) {
  if (i % step === 0) {
    // Math.round avoids floating point noise such as 30.000000000000004
    parentPort.postMessage(`Thread ${threadId} at ${Math.round(i / limit * 100)}%`);
  }
}

I import quite a few things here. workerData is the data passed to the worker’s constructor, so we can use it to access what the main thread sent. threadId is the id of the current thread. parentPort is only available when the code runs in a spawned thread (in the main thread it is null), and it can be used to send messages to the parent thread. It is actually a MessagePort object which was created under the hood by Node. MessagePort is an asynchronous, two-way data channel. In the for loop I go from 1 to the passed limit, and at every 10% I send a message to the parent thread about the progress. Since this is only an experimental feature, we have to use a special flag when executing (from Node v11.7.0 on, the flag is no longer required). Of course you should have at least v10.5.0 installed to use this module, as that is the version which introduced it. Run this command.

node --experimental-worker index.js

The exact output differs from run to run, but you should see some racing here. Also, you can see that in the main thread we can access the messages and events of the workers. Pretty cool! Let’s add some new features. Let’s say we want to declare a winner and send back the position of each thread when they finish. So I want to show you how to send data from the main thread to the spawned threads.
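Before that, remember that workerData is already one way of handing data to a thread, and that it is copied rather than shared. Here is a small sketch of what that means, assuming Node 12 or later so that no flag is needed. The worker code is passed as a string with the eval option only to keep the example in one file, and mutateInWorker is a made-up helper name.

```javascript
const { Worker } = require('worker_threads');

// Inline worker code (eval: true) so the sketch fits in a single file.
const workerSource = `
  const { workerData, parentPort } = require('worker_threads');
  workerData.numbers.push(4); // changes the worker's own copy only
  parentPort.postMessage(workerData.numbers);
`;

// Start a worker with `data` as workerData and resolve with its reply.
function mutateInWorker(data) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, { eval: true, workerData: data });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

const original = { numbers: [1, 2, 3] };
mutateInWorker(original).then(fromWorker => {
  console.log('worker saw:', fromWorker);           // [ 1, 2, 3, 4 ]
  console.log('main still has:', original.numbers); // [ 1, 2, 3 ]
});
```

The array in the main thread is untouched, because the worker received a structured clone of it.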

Add this line in the index.js after the second line.

let position = 1;

And modify the online event’s listener.

  worker.on('online', () => {
    console.log(`Thread ${index} is started`);
    worker.postMessage(position);
    position = position + 1;
  });

In the job.js add this before the for.

parentPort.once('message', (message) => {
  console.log(`Thread ${threadId}: I finished at ${message}`);
});

Note that if you use on instead of once, the thread won’t exit, because an active message listener on parentPort keeps the thread’s event loop alive. Run the app again.

node --experimental-worker index.js

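You can see that the message from the main thread arrived at the spawned threads. One caveat: the position is handed out in the online listener, so it reflects the order in which the threads started, and because the for loop is synchronous each thread only processes the message once its loop is done. To rank the threads by finishing order, the main thread would have to assign the position when a worker reports 100%. In this example we’ve looked at how you can create new threads as well as how to communicate between the threads.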
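As a follow-up to the note on on versus once: a worker that needs to handle several messages can use on and remove the listener itself when it is done. The sketch below assumes Node 12 or later. The 'stop' value is a made-up sentinel, not part of the worker_threads API, and doubleInWorker is a hypothetical helper.

```javascript
const { Worker } = require('worker_threads');

// Inline worker code (eval: true) so the sketch fits in a single file.
const workerSource = `
  const { parentPort } = require('worker_threads');
  function onMessage(message) {
    if (message === 'stop') {
      // With no 'message' listener left, nothing keeps the thread alive.
      parentPort.off('message', onMessage);
      return;
    }
    parentPort.postMessage(message * 2);
  }
  parentPort.on('message', onMessage);
`;

// Send every value to one worker, then ask it to stop; resolve on exit.
function doubleInWorker(values) {
  return new Promise((resolve, reject) => {
    const results = [];
    const worker = new Worker(workerSource, { eval: true });
    worker.on('message', result => results.push(result));
    worker.on('error', reject);
    worker.on('exit', () => resolve(results));
    values.forEach(value => worker.postMessage(value));
    worker.postMessage('stop');
  });
}

doubleInWorker([1, 2, 3]).then(results => console.log(results));
```

If the 'stop' message is never sent, the exit event never fires, which is the behaviour described for on above.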

In the next example we’re gonna look at how we can share data between the threads. Since these threads are in the same process it is possible to share data should you wish to. Create a file for the example.

touch index2.js

Copy this code in the index2.js.

const { Worker, isMainThread, parentPort } = require('worker_threads');

if (isMainThread) {
  const worker = new Worker(__filename);
  const shared = new Uint8Array(new SharedArrayBuffer(1));
  shared.set([1]);
  worker.on('message', (message) => {
    console.log(message, shared);
  });
  worker.postMessage(shared);
} else {
  parentPort.once('message', (message) => {
    parentPort.postMessage('Check now: before');
    setTimeout(() => {
      message.set([2]);
      parentPort.postMessage('Check now: after');
    }, 100);
  });
}

Now both the main and the spawned thread execute the same file. With the isMainThread flag we can decide whether the current thread is the main one. If so, we create a worker and a SharedArrayBuffer object, which is a binary data buffer; here it is one byte long. We then create a Uint8Array, which is a typed array of 8-bit unsigned integers, and pass the SharedArrayBuffer object to its constructor so that it uses the buffer’s space in memory. Then I set its content from a one element array containing the number 1. I register a callback for the messages, then I pass the shared object to the spawned thread. In the else branch I first post a message to the main thread so that it checks the current content of the shared object, and the main thread prints it. As you can see, I update the shared object in a setTimeout callback. If I didn’t, the spawned thread would already have updated the shared object by the time the main thread prints it, so the first console.log would show the updated value too. You can run the app with this command.

node --experimental-worker index2.js

You can see the second line contains the updated version of the shared object. It was logged in the main thread but was modified in the spawned thread so we can see the memory is shared.
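Once several threads write to the same memory, plain reads and writes can race with each other. A possible way to keep a shared counter consistent is the built-in Atomics object. This is only a sketch assuming Node 12 or later, and countWithWorkers is a made-up helper; it is not part of the example app above.

```javascript
const { Worker } = require('worker_threads');

// Inline worker code: bump the shared counter atomically, then exit.
const workerSource = `
  const { workerData } = require('worker_threads');
  const counter = new Int32Array(workerData.buffer);
  for (let i = 0; i < workerData.increments; i++) {
    Atomics.add(counter, 0, 1); // atomic read-modify-write on index 0
  }
`;

// Start `workerCount` workers that all increment the same 32-bit counter.
function countWithWorkers(workerCount, increments) {
  const buffer = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
  const counter = new Int32Array(buffer);
  const finished = [];
  for (let i = 0; i < workerCount; i++) {
    finished.push(new Promise((resolve, reject) => {
      const worker = new Worker(workerSource, {
        eval: true,
        workerData: { buffer, increments } // the buffer is shared, not copied
      });
      worker.once('error', reject);
      worker.once('exit', resolve);
    }));
  }
  // Read the final value once every worker has exited.
  return Promise.all(finished).then(() => Atomics.load(counter, 0));
}

countWithWorkers(3, 100000).then(total => console.log(`Counter: ${total}`));
```

With Atomics.add the total is always the number of workers multiplied by the number of increments, here 300000.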

We’ve seen some parts of the worker_threads API. This has not been everything, so if you’re interested in more, go to the official website. Since at the time of writing this is still in the experimental stage, I wouldn’t recommend using it in production, but let’s hope it’ll become a stable part of NodeJS soon. A final note about the creation of threads: just as we use a pool for database connections, we should use a pool of threads instead of always creating new ones, so that the overhead of creation can be avoided.
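What such a pool could look like is sketched below, assuming Node 12.5 or later. The WorkerPool class, its method names and the summing task are all invented for this illustration; a production pool would also need to replace crashed workers and handle timeouts.

```javascript
const { Worker } = require('worker_threads');

// Hypothetical task: each worker sums the integers from 1 to n on request.
const workerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', (n) => {
    let sum = 0;
    for (let i = 1; i <= n; i++) sum += i;
    parentPort.postMessage(sum);
  });
`;

class WorkerPool {
  constructor(size) {
    this.queue = [];   // tasks waiting for a free worker
    this.workers = [];
    for (let i = 0; i < size; i++) {
      this.workers.push(new Worker(workerSource, { eval: true }));
    }
    this.idle = this.workers.slice(); // workers created once, then reused
  }

  // Queue a task; the promise settles when some worker has handled it.
  run(input) {
    return new Promise((resolve, reject) => {
      this.queue.push({ input, resolve, reject });
      this.dispatch();
    });
  }

  // Hand queued tasks to idle workers, one task per worker at a time.
  dispatch() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { input, resolve, reject } = this.queue.shift();
      const onMessage = (result) => {
        worker.off('error', onError);
        this.idle.push(worker);
        resolve(result);
        this.dispatch();
      };
      const onError = (error) => {
        worker.off('message', onMessage);
        reject(error); // a crashed worker is not replaced in this sketch
      };
      worker.once('message', onMessage);
      worker.once('error', onError);
      worker.postMessage(input);
    }
  }

  // Stop all threads so that the process can exit.
  close() {
    return Promise.all(this.workers.map(worker => worker.terminate()));
  }
}

const pool = new WorkerPool(2);
Promise.all([10, 100, 1000].map(n => pool.run(n))).then(results => {
  console.log(results); // [ 55, 5050, 500500 ]
  return pool.close();
});
```

Three tasks are served by two threads here, so the cost of starting a thread is paid only twice, no matter how many tasks follow.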

You can find the code on my GitHub. I hope you enjoyed the article and learned something from it. If so, click on the recommend button below and share it. Cheers!

UPDATE

On 9th September 2019, the worker_threads module was marked stable. So the API is not expected to receive any breaking changes.