Running a large WebSocket server

This is a high-volume WebSocket server for my employer’s financial websites. It implements a publish/subscribe hub for news, blog comments, chart-pattern notifications, and so on:

The setup

  • A single Linux virtual machine
  • Node.js (event-driven I/O)
  • Micheil Smith’s node-websocket-server module
  • No “wrapping library” like Socket.IO (to keep the server usable by pure-WebSocket clients)
  • A simple HTTP JSONP-polling fallback for clients without WebSocket support
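At its core, the hub is just a map from channel names to subscriber callbacks, independent of whether a given subscriber is a WebSocket connection or a JSONP poller. A minimal sketch (the names here are illustrative, not node-websocket-server’s actual API):

```javascript
// Minimal publish/subscribe hub sketch, independent of the transport.
// All names here are illustrative, not the real server's code.
function Hub() {
  this.channels = {}; // channel name -> array of subscriber callbacks
}

Hub.prototype.subscribe = function (channel, send) {
  (this.channels[channel] = this.channels[channel] || []).push(send);
};

Hub.prototype.publish = function (channel, message) {
  var subs = this.channels[channel] || [];
  subs.forEach(function (send) { send(message); });
  return subs.length; // number of clients the message was delivered to
};
```

In the real server each `send` callback would either write a frame to a WebSocket connection or buffer the message until the client’s next JSONP poll.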

Currently the server handles between 10,000 and 15,000 concurrent WebSocket connections and roughly 350,000 clients per day, without any problems, using 100-300 MB of RAM.

Challenges so far

  • High CPU usage for message routing (subscriptions are MongoDB-like JSON query strings supplied by clients, so for every published event the server has to iterate over every subscription)
  • Bugs in Node (sudden crashes due to HTTP parsing bugs) that had to be patched on every new Node release (this seems to be fixed as of Node 0.4.9)
  • Bugs in the WebSocket module (leaked connections, which lead to memory leaks)
  • Linux network optimizations: some TCP settings needed to be tweaked
  • No working daemon system for Node (forever is nice, but I did not manage to create working init.d scripts for Debian)
  • Broken clients (WebSockets or the Flash fallback do not work, broken proxies, etc.)
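To see why the routing is CPU-heavy: because subscriptions are arbitrary query objects rather than fixed channel names, every published event requires a full scan over all subscriptions. A sketch of the idea, with a hypothetical matcher that supports only equality and `$in` (the real query language is richer):

```javascript
// Illustrative sketch, not the actual server code: each published event
// is matched against every subscription, so routing one event costs
// O(number of subscriptions) matcher calls.

// Hypothetical matcher supporting equality and $in, for illustration only.
function matches(query, event) {
  return Object.keys(query).every(function (key) {
    var cond = query[key];
    if (cond && typeof cond === 'object' && '$in' in cond) {
      return cond.$in.indexOf(event[key]) !== -1;
    }
    return event[key] === cond;
  });
}

var subscriptions = [
  { client: 'a', query: { type: 'news' } },
  { client: 'b', query: { type: 'chart', symbol: { $in: ['AAPL', 'GOOG'] } } }
];

// Routing a single event means a full scan over all subscriptions.
function route(event) {
  return subscriptions
    .filter(function (s) { return matches(s.query, event); })
    .map(function (s) { return s.client; });
}
```

With 10,000+ connections each holding one or more such subscriptions, every published event triggers thousands of matcher calls, which is where the CPU goes.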

Node seems to be an excellent platform for this kind of server, as it’s event-based, fast, and easy to deploy. Let’s hope browsers move to the new generation of WebSocket protocols, so that at least the protocol-version mess comes to an end soon…

On the other hand, Socket.IO or SockJS look quite promising and both provide nice and transparent fallback options.

Does anyone have experience running a Socket.IO or SockJS server at this scale?