Load balancing / auto-scaling a bokeh server: Is there server-side user data?

Hello,
I have a question I couldn’t clarify from the documentation: https://docs.bokeh.org/en/latest/docs/user_guide/server.html#load-balancing-with-nginx

I’d like to set up auto-scaling for our bokeh application and run it behind an AWS load balancer. Can the traffic be routed to any random bokeh server? Or does each user/session/document create server-side user data which would be missing when sending the next request to a different server?

https://docs.bokeh.org/en/latest/docs/user_guide/concepts.html : “Documents contain all the Bokeh Models and data needed to render an interactive visualization or application in the browser.” Where are documents stored? I have to admit I don’t yet understand the difference between documents and sessions.

Judging from this diagram: https://docs.bokeh.org/en/latest/docs/user_guide/server.html#building-bokeh-applications, does each user get their own document on the server, which is then synced to the browser?

Thanks!

@Johannes a Document is the actual data structure that is synchronized between Python and the browser. A Session is basically just a name for the machinery that handles synchronizing one particular Document instance. As long as a Session is open, its Document will continue to be synchronized. Once it is closed, no changes from the browser will affect the server (or vice versa). Bokeh does not currently have any capability to re-open closed sessions.
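
To make the Document/Session distinction concrete, here is a toy model in plain Python. It is not Bokeh's API: `ToyServer`, `Session` and `Document` are hypothetical stand-ins. It only mirrors the relationship described above: one Document per Session, held in the memory of the server process that owns the session, with no way to reopen a session once it is closed.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a Bokeh Document: the synced model state."""
    models: dict = field(default_factory=dict)


@dataclass
class Session:
    """Hypothetical stand-in for the machinery that syncs one Document."""
    session_id: str
    document: Document
    open: bool = True


class ToyServer:
    """One server process; sessions (and their Documents) live only in its memory."""

    def __init__(self, name: str):
        self.name = name
        self.sessions: dict[str, Session] = {}

    def open_session(self) -> Session:
        # Each new session gets its own, independent Document instance.
        session = Session(session_id=uuid.uuid4().hex, document=Document())
        self.sessions[session.session_id] = session
        return session

    def apply_browser_change(self, session_id: str, key: str, value) -> bool:
        # A change is only synced while the session is open on *this* server.
        session = self.sessions.get(session_id)
        if session is None or not session.open:
            return False
        session.document.models[key] = value
        return True

    def close_session(self, session_id: str) -> None:
        # Closing is final in this toy model: the session is dropped, not parked.
        session = self.sessions.pop(session_id, None)
        if session is not None:
            session.open = False


a, b = ToyServer("a"), ToyServer("b")
s = a.open_session()
print(a.apply_browser_change(s.session_id, "slider", 3))  # True
print(b.apply_browser_change(s.session_id, "slider", 3))  # False: b never saw this session
a.close_session(s.session_id)
print(a.apply_browser_change(s.session_id, "slider", 4))  # False: closed sessions stay closed
```

The point of the model is that "where is the Document stored?" has a simple answer: in the memory of whichever server process holds the open session.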

As far as routing/load balancing is concerned, after the initial HTTP request, all subsequent traffic is handled only over a websocket. Bokeh itself can already accommodate the case where the HTTP request lands on one server and the websocket lands on a different one. (Though if you can configure your proxy to send them to the same server, e.g. I think nginx has a notion of “sticky” sessions, then that is advised.)
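
As a rough sketch of what "sticky" means here, assume the balancer keys on the client's IP address (similar in spirit to nginx's `ip_hash`; AWS Application Load Balancers use a cookie instead). The backend addresses and the `pick_backend` / `pick_backend_random` helpers are hypothetical. Hashing a stable client key means the initial `http://` request and the follow-up `ws://` connection pick the same backend, as long as the backend list does not change in between.

```python
import hashlib
import random

# Hypothetical pool of Bokeh server processes behind the balancer.
BACKENDS = ["10.0.0.11:5006", "10.0.0.12:5006", "10.0.0.13:5006"]


def pick_backend(client_key: str, backends: list[str]) -> str:
    """Sticky choice: the same client key always maps to the same backend."""
    digest = hashlib.sha256(client_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


def pick_backend_random(backends: list[str]) -> str:
    """Non-sticky choice: every request may land on any backend."""
    return random.choice(backends)


client_ip = "203.0.113.7"
http_target = pick_backend(client_ip, BACKENDS)  # initial page request
ws_target = pick_backend(client_ip, BACKENDS)    # follow-up websocket connection
print(http_target == ws_target)  # True
```

With plain modulo hashing, adding or removing a backend (as auto-scaling does) remaps many clients, but that only affects new connections; a websocket that is already open stays on the server it connected to.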


Thank you for your reply.

I can definitely enable ‘sticky sessions’ on the AWS LB. So if I understand you correctly (and my conclusions):

  1. The HTTP request can in principle go to any target; it could even go to a random target for every new request, but using sticky behavior is advised.
  2. For the WebSocket connection, one has to ensure it always goes to the same target, because that’s where the Document instance lives (I don’t know WebSocket connections well; this might be an implicit requirement of the implementation?).
  3. If a target is removed because it’s unhealthy or is scaled down (I’m not sure what happens when there are still active WebSocket connections; probably it would only be stopped once all connections are drained), BokehJS would display the ‘connection disconnected’ entry in the browser log.
  4. Sticky sessions are advisable because they would ensure one user can only ‘block’ one target, which simplifies scaling. Also, if a target becomes unhealthy, the user will be routed to a healthy target and a new document/session will be created.

@Johannes not quite: there is one HTTP request (and response) and then one websocket connection. When I say stickiness, I mean it’s ideal if the http:// request and the subsequent ws:// request land on the same server. [1]


  1. Bokeh will make things work even if that is not the case, but if the HTTP and websocket requests land on different servers, the penalty is that the app code gets executed twice (once on the HTTP request to generate the page template, and again on the websocket connection, because the live synced document has to live where the websocket connection is).
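
The penalty described in the footnote can be illustrated with a small simulation. This is a hypothetical model, not Bokeh internals: `run_app_code` stands in for executing the application script, and `session-1` is a made-up session id. Each toy server runs the app code on a page request, and runs it again on a websocket connection only if it did not already build the document for that session itself.

```python
class ToyBokehServer:
    """Hypothetical server that counts how often the app code is executed."""

    def __init__(self, name: str):
        self.name = name
        self.documents: dict[str, dict] = {}  # session id -> document built here
        self.app_runs = 0

    def run_app_code(self, session_id: str) -> None:
        # Stand-in for executing the user's Bokeh application script.
        self.app_runs += 1
        self.documents[session_id] = {"built_by": self.name}

    def handle_http(self, session_id: str) -> None:
        # Page request: the app code runs to produce the HTML template.
        self.run_app_code(session_id)

    def handle_websocket(self, session_id: str) -> None:
        # The live document must exist where the websocket is connected.
        if session_id not in self.documents:
            self.run_app_code(session_id)


def total_app_runs(same_server: bool) -> int:
    a, b = ToyBokehServer("a"), ToyBokehServer("b")
    a.handle_http("session-1")
    (a if same_server else b).handle_websocket("session-1")
    return a.app_runs + b.app_runs


print(total_app_runs(same_server=True))   # 1
print(total_app_runs(same_server=False))  # 2
```

In the split case the app still works; the cost is the extra run on the second server, plus a document on the first server that is never used.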