Last Updated: February 25, 2016 · lperrin

Hunting leaking connections in node.js apps

You've written a fancy node.js app with thousands of long-running connections, but, alas, it looks like some of them are leaking: your app is supposed to have released them, but they are still open somehow. Here's how I hunt them.

node.js is great at handling lots of concurrent connections. Its async model means that you don't need a thread behind each connection, and running your code in a single thread makes everything more predictable.

However, one unfortunate side effect is that tasks you think have ended can restart spontaneously if you still have a listener registered somewhere. Leaking connections will slowly accumulate until your app finally crashes or you get complaints from the APIs you're using.
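
As an illustration (this is a sketch, not code from my app; fetchGreeting is a made-up helper), here is the kind of async helper that produces such leaks: it "finishes" without releasing its connection, or bails out without ever calling its callback.

var net = require('net');

function fetchGreeting(host, port, done) {
  var socket = net.connect(port, host);

  socket.once('data', function (chunk) {
    // We report success but never call socket.end():
    // the connection stays ESTABLISHED and leaks.
    done(null, chunk.toString());
  });

  socket.once('error', function (err) {
    // We forget to call done(err): the caller thinks the task is still
    // pending and may open another connection on top of this one.
  });
}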

Tracking open connections

lsof is a very helpful tool that lists the open files on your system. Since sockets are files, it can help us. We'll start with:

lsof -i tcp:993 -n -P

-i tcp:993 lists only TCP connections involving port 993 (my app deals with IMAP connections). -n -P tells lsof not to attempt to resolve domain names or port numbers.

You'll get something like:

COMMAND    PID USER   FD   TYPE    DEVICE SIZE/OFF NODE NAME
staging.p 6548 node   18u  IPv4 112416266      0t0  TCP 94.23.XX.YYY:35685->74.125.XXX.YYY:993 (ESTABLISHED)
staging.p 6548 node   19u  IPv4 112416006      0t0  TCP 94.23.XX.YYY:35641->74.125.XXX.YYY:993 (ESTABLISHED)
staging.p 6548 node   20u  IPv4 112415956      0t0  TCP 94.23.XX.YYY:40954->74.125.XXX.YYY:993 (ESTABLISHED)
staging.p 6548 node   21u  IPv4 112415928      0t0  TCP 94.23.XX.YYY:40926->74.125.XXX.YYY:993 (ESTABLISHED)
…

You can already check if you get more connections than expected. The second step is to tie these connections to actual sockets in your app.

Parsing lsof result

lsof actually makes it very easy: the -F switch selects fields and outputs them in a machine-readable format. Then you can parse them with:

var cp = require('child_process');

function lsofImap(done) {
  // -F cpnT asks for machine-readable output: p = PID, c = command name,
  // n = connection name (src->dst), T = TCP/TPI info (including the state)
  cp.exec('lsof -i tcp:993 -n -P -F cpnT', function (err, data) {
    if (err)
      return done(err);

    var lines = data.split('\n'),
        res = [];

    var pid = 0,
        command = null,
        sockHash = null,
        tcpState = null,
        prefix = null;

    lines.forEach(function (line) {
      // the first character of each output line identifies the field
      var s = line.slice(0, 1);
      line = line.slice(1);

      switch (s) {
        case 'p':
          pid = line;
          break;

        case 'c':
          command = line;
          break;

        case 'n':
          sockHash = line;
          break;

        case 'T':
          // TCP info lines look like "ST=ESTABLISHED"; keep only the state
          prefix = line.slice(0, 2);

          if (prefix === 'ST') {
            tcpState = line.slice(3);
            res.push({pid: pid, command: command, addr: sockHash, tcp_state: tcpState});
          }
          break;
      }
    });

    done(null, res);
  });
}

This will give you a set of objects like:

{
    pid: '28250',
    command: 'node',
    addr: '192.168.1.140:58479->74.125.XX.YYY:993',
    tcp_state: 'ESTABLISHED'
}
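
To sanity-check the parser, you can call it directly and dump what it finds (a minimal sketch reusing the lsofImap function above):

lsofImap(function (err, sockets) {
  if (err)
    throw err;

  console.log('%d connections involving port 993:', sockets.length);

  sockets.forEach(function (socket) {
    console.log('%s (pid %s): %s [%s]',
      socket.command, socket.pid, socket.addr, socket.tcp_state);
  });
});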

Linking lsof open connections with node.js sockets

Even though sockets don't have nice, clean IDs, they all have something truly unique: the [src_ip, src_port, dst_ip, dst_port] tuple. We already extract it from lsof's output, and it's very easy to get the same thing for node.js sockets:

function getSocketHash(socket) {
  if (!socket || !socket.address)
    return null;

  var addr = socket.address();

  if (!addr)
    return null;

  return addr.address + ':' + addr.port + '->' + socket.remoteAddress + ':' + socket.remotePort;
}

Finally, you just need to add a socket audit to your app, and you're done:

var _ = require('underscore'),
    socketAudit = {};

exports.reportSocket = function (inboxId, socket) {
  var hash = getSocketHash(socket);

  if (!hash)
    return;

  // mark any socket previously active for this inbox as leaking
  // and record when it should have been closed
  _(socketAudit).chain().where({inbox_id: inboxId, leaking: false}).each(function (info) {
    info.leaking = true;
    info.until = new Date();
  });

  socketAudit[hash] = {
    inbox_id: inboxId,
    leaking: false,
    since: new Date()
  };
};

Whenever I open a new connection for an IMAP inbox, I pass it to reportSocket. I keep a log of all previous connections.
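
For context, the call site looks roughly like this. openInbox, the ./socket_audit module path and the direct use of tls are illustrative assumptions, not actual code from my app:

var tls = require('tls'),
    audit = require('./socket_audit'); // the module exporting reportSocket above

// Open a TLS connection to an IMAP server and register it with the audit
// once the connection is established and the local port is known.
function openInbox(inboxId, host, done) {
  var socket = tls.connect({host: host, port: 993}, function () {
    audit.reportSocket(inboxId, socket);
    done(null, socket);
  });

  socket.once('error', done);
}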

Finally, from any function that takes a done callback, I can just do:

lsofImap(function (err, sockets) {
  if (err)
    return done(err);

  _(sockets).each(function (socket) {
    var audit = socketAudit[socket.addr];

    if (audit) {
      _(['inbox_id', 'leaking', 'since', 'until']).each(function (field) {
        socket[field] = audit[field];
      });
    } else
      socket.orphan = true;
  });

  done(null, sockets);
});

It will give me a list of all open connections, with information pulled from the internal state of my node.js app. It will tell me about leaking connections, which inbox they involve and when they were supposed to terminate. Then I can scour the logs to investigate what happened around that time to make the connection leak.
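
If you'd rather run this continuously than on demand, a small periodic check that only logs the suspicious entries is enough. This is a sketch that assumes lsofImap and socketAudit from above are in scope:

// Every 5 minutes, log connections the audit flagged as leaking, plus
// connections lsof sees but the app never reported (orphans).
setInterval(function () {
  lsofImap(function (err, sockets) {
    if (err)
      return console.error('lsof failed:', err);

    sockets.forEach(function (socket) {
      var audit = socketAudit[socket.addr];

      if (!audit)
        console.warn('orphan connection: %s', socket.addr);
      else if (audit.leaking)
        console.warn('inbox %s: %s should have closed at %s',
          audit.inbox_id, socket.addr, audit.until);
    });
  });
}, 5 * 60 * 1000);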

Feel free to ask for more code in the comments :)

5 Responses

Did you find out what was the cause of your leaking sockets?

I'm currently seeing similar behavior: over time I end up with live TCP connections that never close, for no apparent reason.

So they stack up and make the server drop connections after about a day. Restarting the Node.js cluster helps, but it isn't a solution.

over 1 year ago ·

@felikz

The most common reason is a missing callback: you return from an async function without ever calling the callback, and your connection just stays open "forever" (until a TCP timeout eventually occurs).

In my case, I had lots of persistent connections and the reconnection algorithm sometimes didn't close the previous connection.

over 1 year ago ·

@lperrin

Thanks for the response. But even if there's a callback somewhere that never gets called, there is a 2-minute timeout:
http://nodejs.org/api/all.html#all_server_settimeout_msecs_callback

But the connections I'm seeing seem to live much longer than that.

over 1 year ago ·

@felikz

If they are managed by a driver, it might keep them open without your knowledge. What I do is wrap each socket in a domain (http://nodejs.org/api/domain.html) so I have some context (why and when it was opened, when it was supposed to close, etc.). Then, if I get leaked sockets, I can always pull up the domain and see the history of the socket.
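
For what it's worth, a minimal version of that pattern looks roughly like this (a sketch; trackSocket and the history field are just illustrative, not an established API):

var domain = require('domain');

// Wrap a socket in its own domain and attach some bookkeeping to it.
function trackSocket(socket, why) {
  var d = domain.create();

  d.add(socket);
  d.history = [{at: new Date(), event: 'opened', why: why}];

  d.on('error', function (err) {
    d.history.push({at: new Date(), event: 'error', message: err.message});
  });

  return d;
}

// A leaked socket can later be traced back via its domain:
// console.log(socket.domain.history);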

over 1 year ago ·

In case you're interested: the problem was that the HTTPS server doesn't inherit the socket timeout logic from the HTTP server.

There has been a patch for this since April 2013:
https://github.com/joyent/node/issues/5361

But it still hasn't been merged into the stable 0.10 branch.

The solutions:
- avoid HTTPS in node and put a proxy server (nginx) in front
- handle the timeout yourself
- use a node version higher than 0.10

Related serverfault topic:
http://serverfault.com/questions/660248/nodejs-server-doesnt-close-tcp-connections

over 1 year ago ·