Hunting leaking connections in node.js apps
You've written a fancy node.js app with thousands of long-running connections, but, alas, it looks like some of them are leaking: your app is supposed to have released them, but they are still open somehow. Here's how I hunt them.
node.js is great at handling lots of concurrent connections. Its async model means that you don't need a thread behind each connection, and running your code in a single thread makes everything more predictable.
However, one unfortunate side-effect is that tasks you think have ended can restart spontaneously if you still have a listener registered somewhere. Leaking connections will slowly accumulate until your app finally crashes or you get complaints from the APIs you're using.
Tracking open connections
lsof is a very helpful tool that lists open files on your system. Since sockets are files, it can help us. We'll start with:
lsof -i tcp:993 -n -P
-i tcp:993 lists only TCP connections involving port 993 (my app deals with IMAP connections). -n -P tells lsof not to resolve hostnames or port names, so the output stays numeric.
You'll get something like:
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
staging.p 6548 node 18u IPv4 112416266 0t0 TCP 94.23.XX.YYY:35685->74.125.XXX.YYY:993 (ESTABLISHED)
staging.p 6548 node 19u IPv4 112416006 0t0 TCP 94.23.XX.YYY:35641->74.125.XXX.YYY:993 (ESTABLISHED)
staging.p 6548 node 20u IPv4 112415956 0t0 TCP 94.23.XX.YYY:40954->74.125.XXX.YYY:993 (ESTABLISHED)
staging.p 6548 node 21u IPv4 112415928 0t0 TCP 94.23.XX.YYY:40926->74.125.XXX.YYY:993 (ESTABLISHED)
…
You can already check if you get more connections than expected. The second step is to tie these connections to actual sockets in your app.
Parsing the lsof output
lsof actually makes this very easy: the -F switch selects fields and outputs them in a machine-readable format. Then you can parse them with:
var cp = require('child_process');

function lsofImap(done) {
  // -F cpnT asks lsof for machine-readable output: command, pid, name and TCP info
  cp.exec('lsof -i tcp:993 -n -P -F cpnT', function (err, data) {
    if (err)
      return done(err);

    var lines = data.split('\n'),
        res = [];

    var pid = 0,
        command = null,
        sockHash = null,
        tcpState = null,
        prefix = null;

    lines.forEach(function (line) {
      // the first character identifies the field, the rest is its value
      var s = line.slice(0, 1);
      line = line.slice(1);

      switch (s) {
      case 'p': // process id
        pid = line;
        break;
      case 'c': // command name
        command = line;
        break;
      case 'n': // connection name: "src_ip:src_port->dst_ip:dst_port"
        sockHash = line;
        break;
      case 'T': // TCP info: we only keep the state ("TST=ESTABLISHED", etc.)
        prefix = line.slice(0, 2);
        if (prefix === 'ST') {
          tcpState = line.slice(3);
          res.push({pid: pid, command: command, addr: sockHash, tcp_state: tcpState});
        }
        break;
      }
    });

    done(null, res);
  });
}
This will give you a set of objects like:
{
  pid: '28250',
  command: 'node',
  addr: '192.168.1.140:58479->74.125.XX.YYY:993',
  tcp_state: 'ESTABLISHED'
}
Linking lsof open connections with node.js sockets
Even though sockets don't have nice, clean IDs, they all have something truly unique: the [src_ip, src_port, dst_ip, dst_port] tuple. We already retrieve it from lsof's output, and it's very easy to get the same thing for node.js sockets:
function getSocketHash(socket) {
  if (!socket || !socket.address)
    return null;

  var addr = socket.address();
  if (!addr)
    return null;

  // same "src_ip:src_port->dst_ip:dst_port" format as lsof's name field
  return addr.address + ':' + addr.port + '->' + socket.remoteAddress + ':' + socket.remotePort;
}
Finally, you just need to add a socket audit to your app, and you're done:
var _ = require('underscore'),
    socketAudit = {};

exports.reportSocket = function (inboxId, socket) {
  var hash = getSocketHash(socket);
  if (!hash)
    return;

  // mark any previously active socket for this inbox as leaking
  _(socketAudit).chain().where({inbox_id: inboxId, leaking: false}).each(function (info) {
    info.leaking = true;
    info.until = new Date();
  });

  socketAudit[hash] = {
    inbox_id: inboxId,
    leaking: false,
    since: new Date()
  };
};
Whenever I open a new connection for an IMAP inbox, I pass it to reportSocket. I keep a log of all previous connections.
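For context, here is a minimal sketch of where that call could live, assuming a made-up openInbox helper and an inbox object with host and id fields (a real app would go through its IMAP library rather than a raw TLS socket):

var tls = require('tls');

// hypothetical helper: open an IMAPS connection for an inbox and register it
function openInbox(inbox) {
  var socket = tls.connect({host: inbox.host, port: 993}, function () {
    // the local address/port is known once the connection is established
    exports.reportSocket(inbox.id, socket);
  });
  socket.on('error', function (err) {
    console.error('connection to inbox %s failed:', inbox.id, err);
  });
  return socket;
}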
Finally, I can just do:
lsofImap(function (err, sockets) {
  if (err)
    return done(err);

  _(sockets).each(function (socket) {
    var audit = socketAudit[socket.addr];

    if (audit) {
      _(['inbox_id', 'leaking', 'since', 'until']).each(function (field) {
        socket[field] = audit[field];
      });
    } else
      socket.orphan = true;
  });

  done(null, sockets);
});
It will give me a list of all open connections, with information pulled from the internal state of my node.js app. It tells me about leaking connections, which inbox they involve and when they were supposed to terminate. Then I can scour the logs to investigate what happened at that time to make the connection leak.
Feel free to ask for more code in the comments :)
Written by Laurent Perrin
5 Responses
Did you find what was the cause of your leaking sockets?
I'm currently experiencing similar behavior: over time I end up with live TCP connections that don't close for an unknown reason.
They stack up and make the server drop connections after about a day. Restarting the Node.js cluster helps, but it isn't a solution.
The most common reason is if you have a missing callback: you return from an async function without ever calling the callback and your connection just stays open "forever" (until a TCP timeout eventually occurs).
In my case, I had lots of persistent connections and the reconnection algorithm sometimes didn't close the previous connection.
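To make the first case concrete, here is a contrived sketch (listMessages is a made-up method on a made-up connection object):

// the early return on error never calls the callback, so the caller
// never learns the operation ended and never closes the connection
function fetchMailbox(connection, callback) {
  connection.listMessages(function (err, messages) {
    if (err)
      return; // bug: should be `return callback(err);`
    callback(null, messages);
  });
}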
Thanks for the response. But even if there is a callback somewhere that is never called, there is a 2-minute timeout:
http://nodejs.org/api/all.html#all_server_settimeout_msecs_callback
The connections I'm facing, though, seem to live much longer than that.
If they are managed by a driver, it might keep them open without your knowledge. What I do is wrap each socket in a domain (http://nodejs.org/api/domain.html) so I have some context (why and when it was opened, when it was supposed to have closed, etc.). Then, if I get leaked sockets, I can always pull the domain and see the history of the socket.
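For what it's worth, a minimal sketch of that pattern (the shape of the attached context is just an example):

var domain = require('domain');

// wrap a socket in a domain and attach some context so that a leaked
// socket can later be traced back to why and when it was opened
function trackSocket(socket, context) {
  var d = domain.create();
  d.socketContext = {
    openedAt: new Date(),
    reason: context // e.g. which inbox or request triggered the connection
  };
  d.add(socket); // errors emitted by the socket are now routed to this domain
  return d;
}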
In case you're interested: the problem was that the https server doesn't inherit the socket timeout logic from the http server.
There is a patch that was merged back in Apr 2013:
https://github.com/joyent/node/issues/5361
but it still hasn't been merged into the stable 0.10 branch.
The solutions:
- avoid using https directly and put a proxy server (nginx) in front instead
- handle the timeout yourself (see the sketch below)
- use a node version higher than 0.10
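For the second option, a minimal sketch of what the manual timeout could look like (tlsOptions and handler are placeholders, and the 2-minute value simply mirrors the http default):

var https = require('https');

var server = https.createServer(tlsOptions, handler);

// node 0.10's https server doesn't apply the default idle timeout itself,
// so destroy any socket that stays idle for more than 2 minutes
server.on('connection', function (socket) {
  socket.setTimeout(2 * 60 * 1000, function () {
    socket.destroy();
  });
});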
Related serverfault topic:
http://serverfault.com/questions/660248/nodejs-server-doesnt-close-tcp-connections