Process-based vs Thread-based concurrency (Node.js)

This topic has been getting some interest lately, so I am writing this post to share my point of view. Just to be clear: when I say concurrency, I am talking about a web server’s ability to run on multiple CPU cores in parallel in order to service a large number of simultaneous users.

It seems that a lot of developers believe that process-based concurrency is doomed to fail. I have seen posts here and there trying to reinforce this idea, and then I saw this video: https://www.youtube.com/watch?v=C5fa1LZYodQ&t=35m53s – the whole talk is basically a Ruby developer’s attack on Node.js (and it was delivered at JSConf… of all places). The speaker makes some valid points, but I strongly disagree with the way he attacks Node.js’ approach to concurrency.

In case you weren’t aware of it, Node.js is single-threaded. Well, actually that’s not quite true: only your code is single-threaded, hence the saying “In Node.js, everything is parallel except your code”. Basically, Node.js doesn’t expose the concept of threads to developers. As a Node.js developer, if you want to scale your server to run across multiple CPU cores, you will have to deploy it as multiple processes.

Node.js offers two excellent core modules for dealing with multiple processes: the child_process module ( http://nodejs.org/api/child_process.html ) and the poorly understood (but very useful when used properly) cluster module ( http://nodejs.org/api/cluster.html ).

There have been some complaints about the cluster module in the Node.js community, but the important thing to remember about the cluster module is that it is NOT a load balancer. All it does is allow you to evenly and efficiently distribute traffic between multiple Node.js processes. If you’re dealing with only HTTP traffic, then maybe that’s all you need. However, if you need to handle persistent WebSocket connections, then the cluster module should only serve as the entry point of your sticky load-balancing armada.
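To make the cluster module’s role concrete, here is a minimal sketch of one primary process forking a worker per CPU core, with every worker listening on the same port. The helper workerCount, the function start and the DEMO_PORT environment variable are names I made up for illustration; they are not part of Node.js. The sketch assumes a Node.js version that exposes cluster.isPrimary (older releases call it cluster.isMaster).

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Hypothetical helper: one worker per core, optionally capped, never fewer than one.
function workerCount(cpuCount, cap = Infinity) {
  return Math.max(1, Math.min(cpuCount, cap));
}

function start(port) {
  // Older Node.js releases expose this flag as cluster.isMaster.
  if (cluster.isPrimary) {
    const n = workerCount(os.cpus().length);
    for (let i = 0; i < n; i++) cluster.fork();

    // Replace a worker that dies so the number of processes stays constant.
    cluster.on('exit', (worker) => {
      console.log(`worker ${worker.process.pid} exited; starting a replacement`);
      cluster.fork();
    });
  } else {
    // Every worker calls listen() on the same port; the cluster module
    // takes care of sharing incoming connections between them.
    http.createServer((req, res) => {
      res.end(`handled by process ${process.pid}\n`);
    }).listen(port);
  }
}

// DEMO_PORT is a hypothetical variable used only by this sketch. Workers
// inherit the environment, so they take the worker branch of start().
if (process.env.DEMO_PORT) start(Number(process.env.DEMO_PORT));
```

Running it as something like DEMO_PORT=8000 node server.js (the file name is arbitrary) and requesting the port a few times should show different process IDs answering. Note that nothing here gives you sticky sessions, which is why WebSocket-heavy setups need more than this.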

OK, so aside from rookie misunderstandings, the main argument which developers make against the multi-process approach is that processes carry a crippling ‘overhead’ which makes them unsuitable for concurrency, and that threads do not have any such overhead. Process-haters claim that this cost involves a great deal of both memory and CPU.

While there are specific architectures for which this belief might hold true, it is not the case for Node.js. For example, some server architectures have a separate process/thread to handle every request. Now, a typical Node.js process (with a bunch of libraries bundled in) could take up about 6 MB of memory, so if you were thinking of launching such a process for EVERY request, then yes, process concurrency would be a HORRIBLE idea! The reality, though, is that Node.js is not meant to be used like that. Node.js leverages threads “behind the scenes” to allow you to service requests asynchronously (allowing your code to run within a single thread of logic); this means that a single process is capable of servicing tens of thousands of concurrent users. OK, so now the 6 MB overhead doesn’t seem like a lot of memory at all! 6291456 bytes / 10000 users is a memory overhead of less than 630 bytes per user, which is negligible!
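The arithmetic above can be checked with a few lines of JavaScript. This is only a back-of-the-envelope sketch using the figures from the paragraph (a 6 MB process and 10,000 concurrent users); bytesPerUser is a helper name I introduced for illustration.

```javascript
// Figures from the paragraph above: a 6 MB process serving 10,000 users.
const BYTES_PER_MB = 1024 * 1024;

// Share of one process's memory footprint attributable to each user.
function bytesPerUser(processMegabytes, concurrentUsers) {
  return (processMegabytes * BYTES_PER_MB) / concurrentUsers;
}

const perUser = bytesPerUser(6, 10000);
console.log(`${perUser.toFixed(1)} bytes per user when one process serves everyone`);

// For contrast: a dedicated 6 MB process for each of those 10,000 users.
const totalMb = 6 * 10000;
console.log(`${totalMb} MB in total if every user needed a dedicated process`);
```

The contrast between the two numbers is the whole point: the per-process cost only hurts if you pay it per request.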

The second argument against process concurrency is that context-switching between processes uses many more CPU cycles than switching between threads and is highly wasteful. While this might be true if you’re running your server on Windows (in which case your server will probably suffer anyway!), the general consensus in the Linux community is that threads and processes cost essentially the same when it comes to context switching. To put the final nail in the coffin: even if context-switching between processes did use a bit more CPU, you will find that the more CPU cores you have, the less of a problem context switching becomes. Context switching is only ever a factor if you have two or more processes trying to share the same core.

From a software development perspective, I can only speak from my personal experience. I’ve worked with thread-based concurrency in C/C++ and I’ve also worked with process-based concurrency in Node.js (see SocketCluster). All things considered, I have found working with Node.js processes much more pleasant than dealing with C threads. I guess I prefer cross-process message-passing (loose coupling) over mutexes and semaphores.

 
