Nginx / bokeh load balancing issue

Hi,

First of all, I would like to say that I am quite new to Bokeh and Nginx.

I have created a simple Bokeh app that reads in data, performs a simple analysis, and plots the results. However, the data import takes a few seconds, and when multiple users try to access the web app at the same time they have to wait until the Bokeh server processes their requests.

My user base has grown and I have received quite a few requests to address this. One approach I have been trying is using Nginx as a load balancer.

I have used a setup similar to the one recommended at https://bokeh.pydata.org/en/latest/docs/user_guide/server.html#load-balancing-with-nginx

My docker-compose file looks as follows:

version: '3'

services:

  upstream:
    build: nginx
    networks:
      webapp-net:
    ports:
      - "80:80"

  webapp0:
    build: ../.
    networks:
      webapp-net:
    ports:
      - "5100:5100"
    command: [
      "/webapp/server.sh", "5100"
    ]

  webapp1:
    build: ../.
    networks:
      webapp-net:
    ports:
      - "5101:5101"
    command: [
      "/webapp/server.sh", "5101"
    ]

networks:
  webapp-net:

My nginx configuration looks as follows:

nginx: nginx.conf

user nginx;
worker_processes auto;

events {
    worker_connections 768;
}

http {
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    proxy_read_timeout 200;
    types_hash_max_size 2048;

    include /etc/nginx/mime.types;
    default_type application/octet-stream;

    access_log /dev/stdout;
    error_log  /dev/stdout;

    # Increase POST body size
    client_max_body_size 100M;

    # Include additional configuration files
    include /etc/nginx/sites-enabled/*;
}

nginx: sites-enabled/default

upstream backend_servers {
    #least_conn;               # Use Least Connections strategy
    server webapp0:5100;      # Bokeh Server 0
    server webapp1:5101;      # Bokeh Server 1
}

server {
    #listen 80 default_server;
    server_name _;

    access_log  /tmp/bokeh.access.log;
    error_log   /tmp/bokeh.error.log debug;

    location / {
        proxy_pass http://backend_servers;

        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_http_version 1.1;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host:$server_port;

        proxy_buffering off;
    }
}

The server.sh script is simply:

#!/usr/bin/env bash

bokeh serve --port "$1" src/bokeh_server.py \
    --allow-websocket-origin='*'

When I try to simulate simultaneous access to the web app, I can see that the requests are processed one by one:

Creating network "docker-compose_webapp-net" with the default driver
Creating docker-compose_webapp1_1  ... done
Creating docker-compose_webapp0_1  ... done
Creating docker-compose_upstream_1 ... done
Attaching to docker-compose_webapp1_1, docker-compose_webapp0_1, docker-compose_upstream_1
webapp1_1   | 2019-08-27 15:37:21,787 Starting Bokeh server version 1.3.4 (running on Tornado 6.0.3)
webapp1_1   | 2019-08-27 15:37:21,788 Host wildcard '*' will allow connections originating from multiple (or possibly all) hostnames or IPs. Use non-wildcard values to restrict access explicitly
webapp1_1   | 2019-08-27 15:37:21,790 Bokeh app running at: http://localhost:5101/bokeh_server
webapp1_1   | 2019-08-27 15:37:21,791 Starting Bokeh server with process id: 7
webapp0_1   | 2019-08-27 15:37:21,805 Starting Bokeh server version 1.3.4 (running on Tornado 6.0.3)
webapp0_1   | 2019-08-27 15:37:21,806 Host wildcard '*' will allow connections originating from multiple (or possibly all) hostnames or IPs. Use non-wildcard values to restrict access explicitly
webapp0_1   | 2019-08-27 15:37:21,809 Bokeh app running at: http://localhost:5100/bokeh_server
webapp0_1   | 2019-08-27 15:37:21,809 Starting Bokeh server with process id: 8
webapp0_1   | 2019-08-27 15:37:31,425 302 GET / (172.21.0.3) 0.93ms
webapp0_1   | 2019-08-27 15:37:31,498 302 GET / (172.21.0.3) 0.63ms
webapp1_1   | 2019-08-27 15:37:33,632 200 GET /bokeh_server (172.21.0.3) 2174.15ms
webapp1_1   | 2019-08-27 15:37:33,634 302 GET / (172.21.0.3) 0.71ms
webapp1_1   | 2019-08-27 15:37:33,642 302 GET / (172.21.0.3) 1.11ms
webapp0_1   | 2019-08-27 15:37:35,062 200 GET /bokeh_server (172.21.0.3) 1425.93ms
webapp0_1   | 2019-08-27 15:37:35,064 302 GET / (172.21.0.3) 0.63ms
webapp0_1   | 2019-08-27 15:37:35,068 302 GET / (172.21.0.3) 0.61ms
webapp1_1   | 2019-08-27 15:37:36,270 200 GET /bokeh_server (172.21.0.3) 1204.42ms
webapp1_1   | 2019-08-27 15:37:36,273 302 GET / (172.21.0.3) 1.17ms
webapp1_1   | 2019-08-27 15:37:36,278 302 GET / (172.21.0.3) 0.95ms
webapp0_1   | 2019-08-27 15:37:37,443 200 GET /bokeh_server (172.21.0.3) 1170.69ms
webapp0_1   | 2019-08-27 15:37:37,444 302 GET / (172.21.0.3) 0.55ms
webapp1_1   | 2019-08-27 15:37:37,450 302 GET / (172.21.0.3) 1.64ms
webapp1_1   | 2019-08-27 15:37:37,454 302 GET / (172.21.0.3) 0.71ms
webapp0_1   | 2019-08-27 15:37:38,651 200 GET /bokeh_server (172.21.0.3) 1200.55ms
webapp0_1   | 2019-08-27 15:37:38,653 302 GET / (172.21.0.3) 0.56ms
webapp0_1   | 2019-08-27 15:37:38,659 302 GET / (172.21.0.3) 0.64ms
webapp1_1   | 2019-08-27 15:37:39,821 200 GET /bokeh_server (172.21.0.3) 1167.23ms
webapp1_1   | 2019-08-27 15:37:39,823 302 GET / (172.21.0.3) 0.64ms
webapp0_1   | 2019-08-27 15:37:39,828 302 GET / (172.21.0.3) 0.59ms
webapp0_1   | 2019-08-27 15:37:39,835 302 GET / (172.21.0.3) 1.10ms
webapp1_1   | 2019-08-27 15:37:41,038 200 GET /bokeh_server (172.21.0.3) 1208.31ms
webapp1_1   | 2019-08-27 15:37:41,040 302 GET / (172.21.0.3) 0.61ms
webapp1_1   | 2019-08-27 15:37:41,046 302 GET / (172.21.0.3) 0.83ms
webapp0_1   | 2019-08-27 15:37:42,265 200 GET /bokeh_server (172.21.0.3) 1223.21ms
webapp0_1   | 2019-08-27 15:37:42,267 302 GET / (172.21.0.3) 0.58ms
webapp1_1   | 2019-08-27 15:37:42,271 302 GET / (172.21.0.3) 0.57ms
webapp1_1   | 2019-08-27 15:37:42,277 302 GET / (172.21.0.3) 0.67ms
webapp0_1   | 2019-08-27 15:37:43,418 200 GET /bokeh_server (172.21.0.3) 1143.51ms
webapp0_1   | 2019-08-27 15:37:43,421 302 GET / (172.21.0.3) 0.85ms
webapp0_1   | 2019-08-27 15:37:43,427 302 GET / (172.21.0.3) 0.76ms
webapp1_1   | 2019-08-27 15:37:44,605 200 GET /bokeh_server (172.21.0.3) 1183.26ms
webapp1_1   | 2019-08-27 15:37:44,607 302 GET / (172.21.0.3) 0.58ms
webapp0_1   | 2019-08-27 15:37:45,797 200 GET /bokeh_server (172.21.0.3) 1189.02ms
webapp1_1   | 2019-08-27 15:37:47,017 200 GET /bokeh_server (172.21.0.3) 1212.78ms
webapp0_1   | 2019-08-27 15:37:48,200 200 GET /bokeh_server (172.21.0.3) 1178.95ms
webapp1_1   | 2019-08-27 15:37:49,380 200 GET /bokeh_server (172.21.0.3) 1172.22ms
webapp0_1   | 2019-08-27 15:37:50,553 200 GET /bokeh_server (172.21.0.3) 1170.79ms
webapp1_1   | 2019-08-27 15:37:51,710 200 GET /bokeh_server (172.21.0.3) 1154.01ms
webapp0_1   | 2019-08-27 15:37:52,815 200 GET /bokeh_server (172.21.0.3) 1099.60ms
webapp1_1   | 2019-08-27 15:37:53,935 200 GET /bokeh_server (172.21.0.3) 1117.51ms
webapp0_1   | 2019-08-27 15:37:55,137 200 GET /bokeh_server (172.21.0.3) 1194.86ms
webapp1_1   | 2019-08-27 15:37:56,344 200 GET /bokeh_server (172.21.0.3) 1203.75ms
webapp0_1   | 2019-08-27 15:37:57,565 200 GET /bokeh_server (172.21.0.3) 1218.49ms
webapp1_1   | 2019-08-27 15:37:58,756 200 GET /bokeh_server (172.21.0.3) 1184.73ms
webapp0_1   | 2019-08-27 15:37:59,943 200 GET /bokeh_server (172.21.0.3) 1185.49ms
webapp1_1   | 2019-08-27 15:38:01,129 200 GET /bokeh_server (172.21.0.3) 1180.08ms

Does anyone have any suggestions as to why the load balancing (asynchronous processing of requests) does not work?

Are you certain it is not working? It looks like the /bokeh_server replies take ~1 second, but you have sent ~30 requests at once with only two backend servers running. Things will pile up in that situation.

What you should look at to confirm is the logs from the Bokeh servers themselves. Are they both reporting connections? Then load balancing is working. Whether only two backend servers is enough to help much in your situation is a different question, and depends on the specifics of your app and your traffic.

If you are not on Windows, you might also look at --num-procs, which uses Tornado’s built-in capability to run multiple processes. This could be used together with Nginx, or potentially instead of it.
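For example, the server.sh script shown above could be adapted along these lines (a sketch; the --num-procs flag is real, but the worker count and paths are just placeholders to adjust for your setup):

```shell
#!/usr/bin/env bash
# Sketch: variant of server.sh that lets Tornado fork worker processes.
# --num-procs 0 would fork one process per CPU core; here we pick 4.
bokeh serve --port "$1" --num-procs 4 src/bokeh_server.py \
    --allow-websocket-origin='*'
```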

In general, to improve responsiveness for multiple users at a time, you want to avoid blocking work on the main app loop. See e.g. Updating from Threads for information about how to use threads to handle blocking work.
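To illustrate why this helps, here is a minimal stdlib-only sketch (not Bokeh-specific; `blocking_import` is a hypothetical stand-in for your slow data load). Two blocking loads run in worker threads and overlap in wall-clock time, instead of queuing one after another on a single loop. In a real Bokeh app you would hand each result back to the document via `doc.add_next_tick_callback()`, as the Updating from Threads guide describes:

```python
import queue
import threading
import time

def blocking_import():
    """Stand-in for a slow data import (hypothetical)."""
    time.sleep(0.2)
    return list(range(5))

results = queue.Queue()

def worker():
    # The slow work runs off the main thread; in a Bokeh app you would
    # schedule a UI update here with doc.add_next_tick_callback().
    results.put(blocking_import())

start = time.monotonic()
threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# The two 0.2 s imports overlap, so total wall time stays near 0.2 s,
# not the ~0.4 s a single blocked loop would need.
print(results.qsize(), elapsed)
```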

Thank you for your reply Bryan.

It does look as though the balancing sort of works, but not in the way I am expecting. It seems the requests are all dealt with one by one, routed to different servers; however, I would expect at least two requests to be processed simultaneously, since I have two servers running.

@tomaslaz The requests come in to Nginx one by one, is that what you are concerned about? Nginx is very fast at what it does, it should not be any sort of bottleneck. The requests are very quickly routed to the backend servers to handle. If both Bokeh servers are reporting activity, then they should be handling their individual requests simultaneously. What makes you think they are not?

Like I said before, two servers may simply not be enough. If you are doing lots of blocking work on Bokeh app session startup, that will block other session creations while it runs. If each startup takes ~1 second and you spam 30 session requests across two servers, then some requests are still going to take tens of seconds to start. The only solutions are:

  • more backend servers
  • move the blocking work to a thread.

There is actually an open issue about having session creation automatically happen on another thread, but I am not sure when it will land.

@Bryan thank you for the reply.

If both Bokeh servers are reporting activity, then they should be handling their individual requests simultaneously. What makes you think they are not?

When I look at the log and if I ignore the 302 messages, I can see:

webapp1_1   | 2019-08-27 15:37:33,632 200 GET /bokeh_server (172.21.0.3) 2174.15ms
webapp0_1   | 2019-08-27 15:37:35,062 200 GET /bokeh_server (172.21.0.3) 1425.93ms
webapp1_1   | 2019-08-27 15:37:36,270 200 GET /bokeh_server (172.21.0.3) 1204.42ms
webapp0_1   | 2019-08-27 15:37:37,443 200 GET /bokeh_server (172.21.0.3) 1170.69ms
webapp0_1   | 2019-08-27 15:37:38,651 200 GET /bokeh_server (172.21.0.3) 1200.55ms
webapp1_1   | 2019-08-27 15:37:39,821 200 GET /bokeh_server (172.21.0.3) 1167.23ms
webapp1_1   | 2019-08-27 15:37:41,038 200 GET /bokeh_server (172.21.0.3) 1208.31ms
webapp0_1   | 2019-08-27 15:37:42,265 200 GET /bokeh_server (172.21.0.3) 1223.21ms
webapp0_1   | 2019-08-27 15:37:43,418 200 GET /bokeh_server (172.21.0.3) 1143.51ms
webapp1_1   | 2019-08-27 15:37:44,605 200 GET /bokeh_server (172.21.0.3) 1183.26ms
webapp0_1   | 2019-08-27 15:37:45,797 200 GET /bokeh_server (172.21.0.3) 1189.02ms
webapp1_1   | 2019-08-27 15:37:47,017 200 GET /bokeh_server (172.21.0.3) 1212.78ms
webapp0_1   | 2019-08-27 15:37:48,200 200 GET /bokeh_server (172.21.0.3) 1178.95ms
webapp1_1   | 2019-08-27 15:37:49,380 200 GET /bokeh_server (172.21.0.3) 1172.22ms
webapp0_1   | 2019-08-27 15:37:50,553 200 GET /bokeh_server (172.21.0.3) 1170.79ms
webapp1_1   | 2019-08-27 15:37:51,710 200 GET /bokeh_server (172.21.0.3) 1154.01ms
webapp0_1   | 2019-08-27 15:37:52,815 200 GET /bokeh_server (172.21.0.3) 1099.60ms
webapp1_1   | 2019-08-27 15:37:53,935 200 GET /bokeh_server (172.21.0.3) 1117.51ms
webapp0_1   | 2019-08-27 15:37:55,137 200 GET /bokeh_server (172.21.0.3) 1194.86ms
webapp1_1   | 2019-08-27 15:37:56,344 200 GET /bokeh_server (172.21.0.3) 1203.75ms
webapp0_1   | 2019-08-27 15:37:57,565 200 GET /bokeh_server (172.21.0.3) 1218.49ms
webapp1_1   | 2019-08-27 15:37:58,756 200 GET /bokeh_server (172.21.0.3) 1184.73ms
webapp0_1   | 2019-08-27 15:37:59,943 200 GET /bokeh_server (172.21.0.3) 1185.49ms
webapp1_1   | 2019-08-27 15:38:01,129 200 GET /bokeh_server (172.21.0.3) 1180.08ms

The time difference between the messages is almost identical (~1.2 s), and they are received one after another. What I expected to see is something like this:

webapp1_1 | 2019-08-27 15:37:33,000 200 GET /bokeh_server (172.21.0.3) 2000ms
webapp0_1 | 2019-08-27 15:37:33,000 200 GET /bokeh_server (172.21.0.3) 2000ms
webapp1_1 | 2019-08-27 15:37:34,000 200 GET /bokeh_server (172.21.0.3) 2000ms
webapp0_1 | 2019-08-27 15:37:34,000 200 GET /bokeh_server (172.21.0.3) 2000ms

where the two servers create sessions simultaneously.

Like I said before, two servers may simply not be enough.

I have tried using 8 servers, but I got almost identical performance to one or two servers.

As I said initially, I am quite new to Bokeh and Nginx, but it feels as though the blocking work done by one of the servers blocks the work of the other servers. Is this possible?

I don’t personally see how it is possible for the HTTP requests to block each other in any way. This might be a better question for an Nginx forum at this point, though.

@tomaslaz Perhaps the number of Nginx worker processes needs to be explicitly raised from the default of one (the `worker_processes` directive in nginx.conf).
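A minimal fragment of that kind might look like the following (note that the nginx.conf posted above already sets `worker_processes auto;`, so it may be worth confirming the running container actually picks up that file):

```nginx
# Run one worker per CPU core instead of a single worker.
worker_processes auto;

events {
    # Maximum simultaneous connections per worker.
    worker_connections 1024;
}
```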

@Bryan I did try increasing the number of nginx workers but it didn’t help.

OK, I am afraid that exhausts my Nginx knowledge, then. I would definitely ask on a more Nginx-centered forum to see if someone with more expertise might know what could cause this situation, even with the workers increased. Please report back if you hear something, and we can update our guidance appropriately.