Friday, March 21, 2014

Creating processes in node.js

In some of my previous posts I started comparing some of the features of node.js with java and since then my interest has started to grow for node.js.

One thing I just learned is that node runs in a single thread mode, it uses an event-driven paradigm to handle concurrency.

Well that's fine but what about parallel processes how can you take advantage of this when you are in a single thread program, well node.js has some mechanisms to create processes and take advantage of parallel computing, and I will tell you how.

First we will need a process to run as a worker, there could be many instances running of this process at the same time.

For the purpose of this blog this process will have just a log to console, also any kind of process can be created not just node process, but also for the blog I will use just this.

For creating a child process the child_process module is required, this module has different ways to create process.

The exec method of the child_process runs a command in a shell and buffers the output. It has the following signature:

               child_process.exec(command, [options], callback)


In the example the command "node worker.js i"  is executed where i is a parameter passed to the process.
Also a callback function is passed and it is called when the process terminates and allows to read and process the output from it.

After running the example the following should be printed in console for each process created.

stdout: Process 0 at work
stderr:

Child process exited with exit code 0

The spawn method of the child_process module launches a new process with a given command, it has the following signature:

               child_process.spawn(command, [args], [options])


After running the example the following should be printed in console for each process created.

stdout: Process 0 at work

child process exited with code 0

Exec() and spawn() both create processes, but they differ in what they return, spawn returns streams (stdout & stderr), while exec returns a buffer with a max size. Spawn() should be used when the process returns large amount of data.

Another difference is that spawn() starts receiving the response as soon as the process starts executing, and exec() otherwise waits for the process to end and tries to return all the buffered data at once

The fork method is a special case of the spawn() functionality, it only creates Node processes. Its signature is the following:


              child_process.fork(modulePath, [args], [options])

After running the example the following should be printed in console for each process created.

Process 0 at work
child process exited with code 0

Node.js also has the cluster module and also allows to create processes, the definition of the cluster module says: "A single instance of Node runs in a single thread. To take advantage of multi-core systems the user will sometimes want to launch a cluster of Node processes to handle the load".

From the example you can notice that the cluster module is imported, once imported you can call fork() method to create a new child process, after calling fork() the same process will run but as a child, to distinguish if the process is running as a Master or child the cluster module has the isMaster() and isWorker() methods.

After running the example above, the following should be printed.

Master Forking!!!!
Worker 4984 is online.
....
Child process running!!!
....


You can find more information about these modules in the following links:
http://nodejs.org/api/child_process.html
http://nodejs.org/api/cluster.html

Node.js says that runs in a single thread mode, but as with this brief example, I show that node.js has different mechanisms to create processes and to take advantage of multiple processors.