Full stack pub/sub with SocketCluster

I have been working on a module called SocketCluster (https://github.com/topcloud/socketcluster) which allows developers to build robust and highly scalable realtime systems. It lets you run your Node.js code on multiple CPU cores in parallel and it also handles a lot of tedious things like multi-process error logging, respawning dead workers, reconnecting clients whose connections drop out, etc… SC's realtime features allow you to extend pub/sub all the way to the browser.

[Diagram: SocketCluster architecture]

This diagram is a bit generic, but you could assume that channel 1, channel 2, channel 3… represent different chat rooms which clients can join – Each room could focus on a particular topic, for example 'math', 'physics', 'chemistry'… To listen to all messages posted in the 'physics' room, all you have to do on the client is call socket.on('physics', function (data) {...}). To send messages to the physics room, you would just call socket.publish('physics', messageData). SocketCluster lets you specify middleware on the server to allow or block specific users from subscribing, publishing or emitting certain events; some examples follow below.
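First, here is a minimal client-side sketch of the chat-room scenario above; it assumes that socket is an already-connected SocketCluster client socket, and the channel name and message shape are just placeholders:

socket.on('physics', function (data) {
    // Runs for every message published to the 'physics' channel
    console.log(data.from + ' says: ' + data.message);
});

// Publish a message to everyone subscribed to the 'physics' channel
socket.publish('physics', {from: 'alice', message: 'Does anyone understand entropy?'});

And here are the server-side middleware examples: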

scServer.addMiddleware(scServer.MIDDLEWARE_SUBSCRIBE, function (socket, event, next) {
    // ...
    if (...) {
        next(); // Allow
    } else {
        next(socket.id + ' is not allowed to subscribe to ' + event); // Block
    }
});


scServer.addMiddleware(scServer.MIDDLEWARE_PUBLISH, function (socket, channel, data, next) {
    // ...
    if (...) {
        next(); // Allow
    } else {
        next(socket.id + ' is not allowed to publish to the ' + channel + ' channel'); // Block
    }
});


scServer.addMiddleware(scServer.MIDDLEWARE_EMIT, function (socket, event, data, next) {
    // ...
    if (...) {
        next(); // Allow
    } else {
        next(socket.id + ' is not allowed to emit the ' + event + ' event'); // Block
    }
});

SocketCluster also exposes a Global object on the server-side (scServer.global) through which you can publish events to clients:

scServer.global.publish(eventName, data, cb);

An emitted event is only sent to the other side of the socket (i.e. server => client or client => server); a published event, on the other hand, will be sent to all subscribed client sockets (and nobody else!).
To subscribe a client socket to an event, you just use:

socket.on(eventName, listenerFn);

^ SocketCluster will automatically send a subscription request to the server (it will have to pass through the SUBSCRIBE middleware first though!).
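For completeness, here is a sketch of broadcasting to a channel from the server side using the global object mentioned above; the channel name and payload are placeholders, and it is assumed that the optional callback receives an error argument on failure:

// Server side: send an event to every client subscribed to 'physics'
scServer.global.publish('physics', {from: 'server', message: 'Class is starting'}, function (err) {
    if (err) {
        console.log('Failed to publish: ' + err);
    }
});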

Process-based vs Thread-based concurrency (Node.js)

This is a topic which has been getting some interest lately, so I am writing this post to share my point of view on the subject. Just to be clear: when I say concurrency, I am talking about a web server's ability to run on multiple CPU cores in parallel in order to service a large number of simultaneous users.

It seems that a lot of developers believe that process-based concurrency is doomed to fail. I have seen posts here and there trying to reinforce this idea, and then I saw this video: https://www.youtube.com/watch?v=C5fa1LZYodQ&t=35m53s – The whole talk is basically a Ruby developer’s attack on Node.js (this talk was delivered at JSConf… of all places). The speaker makes some valid points but I strongly disagree with the way he attacks Node.js’ approach to concurrency.

In case you weren't aware of it, Node.js is single-threaded – Well, actually that's not entirely true – Only your code is single-threaded; hence the saying: "In Node.js, everything is parallel except your code" – Basically, Node.js doesn't expose the concept of threads to developers. As a Node.js developer, if you want to scale your Node.js server to run across multiple CPU cores, you will have to deploy it as multiple processes.

Node.js offers two excellent core modules for dealing with multiple processes: the child_process module ( http://nodejs.org/api/child_process.html ) and the poorly understood (but very useful when used properly) cluster module ( http://nodejs.org/api/cluster.html ).

There have been some complaints about the cluster module in the Node.js community – But the important thing to remember about the cluster module is that it is NOT a fully-fledged load balancer – All it does is evenly and efficiently distribute incoming connections between multiple Node.js processes – If you're dealing with only HTTP traffic, then maybe that's all you need – However, if you need to handle persistent WebSocket connections, then the cluster module should only serve as the entry point of your sticky load-balancing armada.
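Here is a minimal sketch of the master/worker pattern that the cluster module enables (the port number and response body are arbitrary):

var cluster = require('cluster');
var http = require('http');
var os = require('os');

if (cluster.isMaster) {
    // Fork one worker process per CPU core
    var numWorkers = os.cpus().length;
    for (var i = 0; i < numWorkers; i++) {
        cluster.fork();
    }
    // Respawn any worker that dies
    cluster.on('exit', function (worker) {
        console.log('Worker ' + worker.process.pid + ' died, respawning');
        cluster.fork();
    });
} else {
    // Workers share the same listening port; incoming connections
    // are distributed between them
    http.createServer(function (req, res) {
        res.end('Handled by worker ' + process.pid);
    }).listen(8000);
}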

OK, so aside from rookie misunderstandings – The main argument which developers make against the multi-process approach is that processes carry a crippling 'overhead' which makes them unsuitable for concurrency and that threads do not have any such overhead. Process-haters claim that this cost involves both a great deal of memory and CPU.

While there are specific architectures for which this belief might hold true, it is not the case for Node.js. For example, some server architectures spawn a separate process/thread to handle every request – A typical Node.js process (with a bunch of libraries bundled in) might take up about 6MB of memory – So if you were thinking of launching such a process for EVERY request, then yes, process concurrency would be a HORRIBLE idea! The reality though is that Node.js is not meant to be used like that. Node.js leverages threads "behind the scenes" to service requests asynchronously (while your code runs within a single thread of logic); this means that a single process is capable of servicing tens of thousands of concurrent users. Suddenly that 6MB overhead doesn't seem like a lot of memory at all: 6291456 bytes / 10000 users comes out to less than 630 bytes per user, which is negligible!

The second argument against process concurrency is that context switching between processes consumes far more CPU cycles than context switching between threads and is therefore highly wasteful. While this might be true if you're running your server on Windows (in which case your server will probably suffer anyway!), the general consensus in the Linux community is that the two are essentially the same when it comes to context switching. To put the final nail in the coffin – Even if context switching between processes did use a bit more CPU – you will find that the more CPU cores you have, the less of a problem context switching becomes. Context switching is only ever a factor when two or more processes are trying to share the same core.

From a software development perspective, I can only speak from my personal experience – I’ve worked with thread-based concurrency in C/C++ and I’ve also worked with process-based concurrency in Node.js ( see SocketCluster ) – All things considered, I have found working with Node.js processes much more pleasant than dealing with C threads. I guess I prefer cross-process message-passing (loose coupling) over mutexes and semaphores.

 

Eval is not evil. Not thinking is evil.

I am sick of hearing the phrase ‘eval is evil’ – It is one of the most overused and inaccurate statements concerning JavaScript. The phrase has become so widespread that people tend to not question it – The sheer number of blogs which just stupidly repeat the phrase is staggering.

So I’m going to say it here; eval is NOT evil. Not thinking is what’s evil. While you should never pass user input to eval, there are safe, useful things that you can do with it.

For example, one of the coolest things that you can do with eval is execute scripts which you loaded dynamically using AJAX – This gives you a great deal of control over how scripts are loaded and when they are executed, and it even lets you track a script's download progress. Because the eval'd content is not user input, this use case is as safe as directly embedding the script into the DOM using a script tag – Just make sure that you use //@ sourceURL=… within the eval'd code so that it is easy to debug (this can be done programmatically by appending //@ sourceURL=… to the source code string before passing it to eval).
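Here is a rough sketch of that technique (the script URL and file name are just placeholders):

var xhr = new XMLHttpRequest();
xhr.open('GET', '/scripts/module.js', true);

// Track the script's download progress
xhr.onprogress = function (e) {
    if (e.lengthComputable) {
        console.log('Loaded ' + Math.round(e.loaded / e.total * 100) + '%');
    }
};

xhr.onload = function () {
    if (xhr.status === 200) {
        // Append a sourceURL comment so that the eval'd code shows up
        // under a sensible name in the debugger
        eval(xhr.responseText + '\n//@ sourceURL=module.js');
    }
};

xhr.send();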

Why JavaScript is the future of software

I have been using JavaScript for a long time but the full extent of its potential didn’t really occur to me until recently and now I am convinced that JavaScript, of all languages, is the one which will carry us into the next epoch of software development.

Not so long ago, like many other software developers, I worked on web projects which were mostly written in PHP (often using a PHP framework like Zend, CodeIgniter or a custom one) – JavaScript was just an optional add-on meant to 'enrich' the user experience. I have now been coding almost exclusively in JavaScript for over a year (both for my personal projects and at work) and I am hooked. I have a pretty broad software background (C/C++, Java, Python, ActionScript 3, C#, AVR assembly and maybe a few others) so I hope that the following analysis of JavaScript is fair and accurate.

There are several aspects of JavaScript which make it a really great language to work with – One of the most important for me is its conciseness and versatility – Of all the languages that I've used, JavaScript is the best when it comes to turning thoughts into code – Everything is either an Object, a Function (which is also an Object, technically speaking) or a primitive, and once you understand the relationship between these three basic types (which can take a little bit of time), JavaScript becomes really easy and fun to write. Another cool aspect is that JavaScript is ideal for dealing with asynchronous logic: JavaScript allows you to define functions wherever you want – This lets you execute code asynchronously without losing the scope of the current function (using an inline callback) – This is a HUGE advantage and can save you hours (dealing with multiple parallel operations is ridiculously simple in Node.js – Especially when compared to C++).
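To illustrate that inline-callback point, here is a tiny sketch; the file-reading task is arbitrary (any asynchronous operation works the same way) and the loadConfig helper is purely illustrative:

var fs = require('fs');

function loadConfig(fileName, done) {
    var startedAt = Date.now();
    // The inline callback still sees fileName and startedAt through its closure,
    // even though it runs later, after the file has been read asynchronously
    fs.readFile(fileName, 'utf8', function (err, contents) {
        if (err) {
            return done(err);
        }
        done(null, {file: fileName, loadTimeMs: Date.now() - startedAt, contents: contents});
    });
}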

The other, and possibly the most awesome thing about JavaScript today (which I alluded to in the previous paragraph) is the fact that you can run it either on the client side (in just about any browser) or on the server side using engines such as Node.js – And I think that this is what makes JavaScript the language of the future – No other language has been standardized so formally across so many different companies, browsers and platforms. As an example, for the past couple of months I've been doing some contract work building a framework for a major set-top box company (who make those devices people plug into their TVs) and guess what the framework is written in!
It’s not actually Node.js (it’s using a custom engine based on WebKit) but yes, JavaScript! A couple of years ago, JavaScript was only for the browser on a PC, then it moved to mobile phones, then it got traction as a server-side language and now it’s moving to embedded systems!

Several decades ago, Mark Weiser ( https://en.wikipedia.org/wiki/Mark_Weiser ), who is often referred to as the father of ubiquitous computing, theorized that computers would become the fabric of our everyday life – And today this sounds a lot like reality. We've had the internet, iPods, smartphones, tablet computers, internet TV, and now we are about to get smart watches and Google Glass! Let's face it, all of these devices will run on completely different firmware/operating systems and will be produced by different companies.
Here is the selling point of this article: JavaScript is currently the only language which is in a position to bring all of these devices together to allow them to interoperate.

JavaScript Inheritance Done Right

One of the worst (and possibly also one of the best) things about JavaScript is that it offers you many ways to do anything. Some tricks are simple and neat while others can be complex and confusing.

If you have done some research, you will likely have come across many ways of implementing inheritance in JavaScript. I have personally used a lot of variations over time in various projects but it’s not until recently that I have settled on a particular technique.

I have chosen this particular technique because I feel that it most closely resembles how most OO programming languages implement inheritance. To get it to work properly, you should use the prototype-based method of class definition. Here is an example of the technique in action:

function BaseClass(name) {
    this.name = name;
}

BaseClass.prototype.foo = function (arg) {
    return 'This is the foo method with argument: ' + arg;
};

function SubClass(name) {
    // Call the base class' constructor.
    BaseClass.call(this, name);
}

// SubClass' prototype is based on BaseClass' prototype
SubClass.prototype = Object.create(BaseClass.prototype);

SubClass.prototype.sub = function () {
    return 'This is the sub method.';
};

SubClass.prototype.foo = function (arg) {
    // Call the base class foo method on the current instance
    return BaseClass.prototype.foo.call(this, arg) + ' SUB';
};

Ok, so what stands out the most in this code is this line:

SubClass.prototype = Object.create(BaseClass.prototype);

When you define a function in JavaScript, its prototype is initially an empty object. What we're doing here is setting SubClass' prototype to a new object whose own prototype is BaseClass.prototype. Effectively, this allows us to freely extend SubClass' prototype without messing with BaseClass' prototype – The advantage of this technique over simply cloning the prototype is that changes to the BaseClass prototype will still be reflected in the SubClass (but not the other way around). The best way to think about Object.create() is that it creates an object based on a given prototype without actually going through any constructor. Object.create() may not be supported by older browsers, but thankfully there is a simple polyfill which you can get from this page:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Object/create

The second weird thing about the code above is this line:

BaseClass.call(this, name);

Now if you don’t know what call or apply do, you should check this:
http://stackoverflow.com/questions/1986896/what-is-the-difference-between-call-and-apply

OO languages usually offer a way to explicitly call a superclass' constructor or method, and JavaScript is no exception. With JavaScript, you just call the constructor/method using either call() or apply() and pass a reference to the current instance as the first argument. This is a neat feature which allows you to effectively borrow methods from any other class to use on the current instance.
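A classic illustration of this kind of method borrowing is converting the array-like arguments object into a real array by borrowing Array.prototype.slice:

function argsToArray() {
    // Borrow Array's slice method and apply it to the arguments object
    return Array.prototype.slice.call(arguments);
}

argsToArray(1, 2, 3); // [1, 2, 3]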

If you're used to languages like Java, C# and Python, you might find the class definitions above unusual – In those languages you have an explicit 'class' keyword and you define the methods INSIDE the class block. If, however, you have written C++ before, this might seem less confusing because C++ allows you to define methods of a class outside of the class definition block. While JavaScript does let you define methods inside the constructor's function block, doing so makes inheritance more difficult to achieve, so I would strongly recommend that you follow the technique above (I swear you won't regret it).
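To wrap up, here is a quick usage sketch of the BaseClass/SubClass pair defined above (the argument values are arbitrary):

var base = new BaseClass('base');
var sub = new SubClass('sub');

base.foo('A'); // 'This is the foo method with argument: A'
sub.foo('B');  // 'This is the foo method with argument: B SUB'
sub.sub();     // 'This is the sub method.'

// The shared prototype chain makes sub an instance of both classes
sub instanceof SubClass;  // true
sub instanceof BaseClass; // true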

GIT: recovering from git reset --hard

Today I accidentally deleted my changes in git with the git reset --hard <commit> command and I read in a few places that you cannot recover from this. Well, that's a LIE. Recovering from a hard reset turns out to be quite simple:

Just run git reflog

This should list a bunch of recent versions which git still knows about. The output should look something like this:

50dbb7c HEAD@{0}: checkout: moving from master to 50dbb7c
f29e4a3 HEAD@{1}: HEAD~1: updating HEAD
79c95b2 HEAD@{2}: HEAD~1: updating HEAD
50dbb7c HEAD@{3}: HEAD~1: updating HEAD

My issue was that I had run 'git reset --hard HEAD~1' a few times and I foolishly went back one commit too many, thus deleting my changes.

To get your changes back, simply run git checkout <version_hash>, e.g. git checkout 79c95b2 in my case. If it gives you trouble, just use the -f flag, i.e. git checkout <version_hash> -f

 

JavaScript Prototype Instantiation

In JavaScript, functions can be used in two ways; they can simply be called or they can be instantiated. When you call a function, it will simply run the code inside it and possibly return some value – When you instantiate a function (using the new keyword), the JavaScript engine will first create a new object whose prototype is that function's prototype object and will then call the function using the newly created object as the context for the 'this' keyword.

Function prototypes are essentially empty objects by default – You can add properties to the prototype object via the function's prototype property. Here is an example of how to add properties to a function's prototype:

var Fun = function () {
    this.prop = 'This is a property of a Fun instance';
};

// Adding a 'foo' property to the 'Fun' function's prototype
Fun.prototype.foo = function () {
    return 'foo';
};

var fun = new Fun(); // fun is an object with a 'foo' property which is a function

When new Fun() is called, the JavaScript engine first creates a new object whose prototype is the function's prototype object (so it effectively has access to {foo: function() {return 'foo';}} through its prototype chain); it then executes the Fun function using the newly created object as the 'this' reference – Because of that, properties added to 'this' within the function body shadow (take precedence over) those defined on the prototype.
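A small sketch of this shadowing behavior, continuing with the Fun function defined above (it assumes an ES5 environment for Object.getPrototypeOf):

var fun = new Fun();

fun.hasOwnProperty('prop'); // true, set directly on the instance by the constructor
fun.hasOwnProperty('foo');  // false, found via the prototype chain instead
Object.getPrototypeOf(fun) === Fun.prototype; // true

// Assigning foo directly on the instance shadows the prototype's version
fun.foo = function () { return 'overridden foo'; };
fun.foo(); // 'overridden foo'

delete fun.foo; // removes only the instance's own property
fun.foo(); // 'foo' again; the prototype's method was never touched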