Last Updated: February 25, 2016 · andrewvc

Async M:N. What node.js Users Should Know

Here's the description of node you haven't heard before. The goal of node.js, or any long-polling/websocket/whatever network server really, is to map M connections to N physical cores on your box. Understanding the details of this will help make it clear why using async everywhere is insane.

Ever wonder why JS is async? It's async because it was built to be a UI scripting language. UIs happen to be very similar to long-poll servers in that they both have a large number of event sources (UI elements in the case of a browser, sockets in the case of a server), that should be multiplexed onto a non-blocking thread. This is M:1, where M is the number of connections (or UI elements), and 1 is a JS platform's single thread.

Writing single-threaded async code that works well is hard. Firstly, async code is flat out more verbose and harder to read than equivalent blocking code. Additionally, you have a single thread that you cannot block. Try processing 3000 DB rows in node and watch your QoS suffer. You can always work around it, but the workarounds suck. Using nextTick adds unnecessary complexity, and a web worker is fine, but if you're really spinning off threads, why aren't you using another language that at least gives you the convenience of non-async code? Or, failing that, why aren't you using a language like Java that comes bundled with amazing threading support via its executors framework and atomic classes?
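To make the cost of that kind of workaround concrete, here is a minimal sketch of chunked processing on the single thread. The names `processInChunks` and `handleRow` and the default chunk size are hypothetical, not from any library, and the sketch assumes a Node.js version that provides `setImmediate`. It uses `setImmediate` rather than `nextTick` because in current Node.js versions `nextTick` callbacks run before pending IO, so they would not give other connections a turn.

```javascript
// Sketch only: process rows in small batches and yield to the event loop
// between batches so other connections can be served in the meantime.
// `processInChunks`, `handleRow`, and `chunkSize` are hypothetical names.
function processInChunks(rows, handleRow, chunkSize = 100) {
  return new Promise((resolve, reject) => {
    const results = [];
    let index = 0;

    function runChunk() {
      try {
        // Handle at most `chunkSize` rows before giving up the thread.
        const end = Math.min(index + chunkSize, rows.length);
        for (; index < end; index++) {
          results.push(handleRow(rows[index]));
        }
      } catch (err) {
        reject(err);
        return;
      }
      if (index < rows.length) {
        // Queue the next batch behind any IO callbacks that are ready.
        setImmediate(runChunk);
      } else {
        resolve(results);
      }
    }

    runChunk();
  });
}
```

Something like `processInChunks(rows, transformRow, 100).then(sendResponse)` keeps the server responsive, but notice how much scaffolding a plain `for` loop needed just to avoid blocking.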

Thread scheduling is awesome, and works well, contrary to what many would have you believe. It works especially well for web-apps where threads generally don't contend over resources on the app server itself. Web apps generally are independent processes. The problem of shared memory is usually minimal in a properly architected web app.

The thing is, an async reactor like node is a fantastic pattern when dealing with IO and a terrible pattern for dealing with app logic. It's fantastic for IO because a single thread can indeed be faster due to a lack of concurrency interactions (locks, thread scheduling, etc.) when doing many fast, non-blocking operations.

The reality is that app logic does block when you try to run it in parallel, because CPU operations are, in a sense, blocking. You are still contending over a limited number of cores. In that case what you want is M:N, where M is the number of connections and N is the number of active threads handling app logic. I say let async do what it does well (parallel IO), and let threads do what they do well (schedule CPU).

Some people may wonder how this works in practice, thinking that 1000 websocket connections need 1000 threads. What you should have is 1 async reactor handling the IO and handing off discrete messages to N threads. It's a good pattern, and works well. The state can be encapsulated in either a closure or an object; that's up to you (and your programming language).
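As a rough illustration of the reactor-plus-N-threads shape, here is a sketch in node itself, assuming a Node.js version that ships the `worker_threads` module. The `WorkerPool` class, its method names, and the sum-of-squares "app logic" are all hypothetical stand-ins. The event loop thread plays the reactor, each message is a discrete unit of work, and the per-message state lives in a closure (the promise's `resolve` function).

```javascript
// Sketch only: one event-loop "reactor" hands discrete messages to N threads.
// Assumes a Node.js version with worker_threads; WorkerPool is hypothetical.
const { Worker } = require('worker_threads');
const os = require('os');

// Worker body as a string so the sketch fits in one file.
// The CPU-bound work (sum of squares) is a stand-in for real app logic.
const workerSource = `
  const { parentPort } = require('worker_threads');
  parentPort.on('message', ({ id, payload }) => {
    let sum = 0;
    for (const n of payload) sum += n * n;
    parentPort.postMessage({ id, result: sum });
  });
`;

class WorkerPool {
  constructor(size = Math.max(1, os.cpus().length)) {
    this.workers = [];
    this.idle = [];           // workers with nothing to do
    this.queue = [];          // messages waiting for a free worker
    this.pending = new Map(); // message id -> resolve closure
    this.nextId = 0;
    for (let i = 0; i < size; i++) {
      const worker = new Worker(workerSource, { eval: true });
      // Error handling is omitted here; a real pool would listen for 'error'.
      worker.on('message', ({ id, result }) => {
        const resolve = this.pending.get(id);
        this.pending.delete(id);
        this.idle.push(worker);
        this.drain();
        resolve(result);
      });
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }

  // Called from the reactor thread; returns a promise for the result.
  submit(payload) {
    return new Promise((resolve) => {
      this.queue.push({ id: this.nextId++, payload, resolve });
      this.drain();
    });
  }

  // Pair queued messages with idle workers until one side runs out.
  drain() {
    while (this.idle.length > 0 && this.queue.length > 0) {
      const worker = this.idle.pop();
      const { id, payload, resolve } = this.queue.shift();
      this.pending.set(id, resolve);
      worker.postMessage({ id, payload });
    }
  }

  // Stop all worker threads so the process can exit.
  close() {
    return Promise.all(this.workers.map((w) => w.terminate()));
  }
}
```

A socket handler on the reactor side would call something like `pool.submit(message).then(reply)`, so 1000 connections share N threads instead of each needing their own.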

In the case of highly parallel IO, yes, a thousand times yes, async is great. But the great lie about node is that people need to carry over the async from the IO layer to the app logic layer. In node it's all just mashed together.

The trick about async is that your server can handle 10,000 connections that are idle, but only a handful active at a given time since you only have a few cores. Thread scheduling works just fine.

None of these ideas are novel; in fact, they are decades old. I can only hope that people are willing to open up their minds to the idea that the future of concurrent web programming has many possibilities.

3 Responses

In any reasonable application, the app logic is intermixed with IO, and you cannot separate them easily. What if your app logic is to poll 10 other servers (or DBs), combine the results, do some recursive operations on DBs based on those, and finally spit out some response? Once you have such intertwined IO and app logic, blocking is going to kill your performance, and you have to employ some async framework in your threaded Java environment.

The real problem is that blocking services and modules do not compose, while async ones do (there is the problem of throttling of shared resources, but that can be handled).

over 1 year ago ·

The main, original goal of node was and is to quickly develop specialized web servers that can handle amazingly many concurrent connections. It so happens that the concept that solves this problem is also a good one for solving many other problems, but never all of them. No serious node.js developer would promote node as the solution for all problems.

JS is NOT async. You can write pretty much sync code in JS. In the browser, there are 2 main APIs that are async: DOM events and AJAX. The DOM is built to be a UI framework, that's why. Node.js is built to solve I/O-heavy problems on a server, that's why it is async too.

Writing multi-threaded sync code that works well is hard. Firstly, sync code tends to use shared memory in combination with locking mechanisms, which leads to unpredictable behavior. You can overcome it with proper frameworks. The same is valid for the async style. You don't like it? There are tools to help you flatten and prettify your code, if you can manage it yourself in the first place. Want to process 3k DB rows? Use Streams, not nextTick, not a web worker or anything else. Processing 3k DB rows is an I/O-heavy task. Why the heck are you loading all 3k rows into memory anyway? To avoid I/O and its blocking nature?

If you really find yourself wanting to spin up many threads, maybe your application really is a CPU eater. Then go and use another platform, it's fine.

Yeah, typical web apps seldom suffer from typical concurrency problems. But yeah, typical web apps also seldom use much CPU, because most of them are I/O-heavy: DB access, file reading, access to other web services, etc. Oh yeah, they even access computing web services written on platforms better suited to computing things. Instead of spinning up more and more threads and eating up memory, a single-threaded event loop handles all the I/O gracefully.

"I say let async do what it does well (parallel IO), and let threads do what they do well (schedule CPU)."

No one really doubts that. You COULD do app logic async; with practice it's not hard. You also COULD do I/O with threads and sync code (and it is done widely); with practice it's not hard.

There is no lie about node. Node is and always was a platform for network-heavy stuff, and it does that pretty well. Nothing is mashed together in node. It's users like you and me who mash up things that belong to separate layers. It's very specific to your application. If CPU becomes a bottleneck, change your system architecture; there are at least threading libs (with message passing, btw) for node. Or move your app logic to another platform.

People are willing to open up their minds to the idea that the future of concurrent web programming has many possibilities. That's why Node.js exists.

over 1 year ago ·

This post seems more like a biased rant than a logical conclusion. There are many incorrect statements, as mentioned above. Why not actually use it for an application or two before getting biased and writing something?

over 1 year ago ·