Reverse proxy configuration for flask_gunicorn_embed.py example

I’ve been having a bit of trouble adapting the flask_gunicorn_embed.py example from the Bokeh GitHub repository to work behind an nginx reverse proxy. I am currently running two Amazon EC2 instances, one for the nginx reverse proxy and another for the backend.

As I understand it, a key principle in the example is that the ports are randomly designated to each of the gunicorn workers that are initiated. As such the server_document function will return differing script blocks based on the port number.

@app.route('/', methods=['GET'])
def bkapp_page():
    script = server_document('http://localhost:%d/bkapp' % port)
    return render_template("embed.html", script=script, template="Flask")
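To illustrate the random port assignment described above, here is a minimal sketch using only the standard library. It is a stand-in for what the example does, not Bokeh's own helper: binding to port 0 asks the operating system for any free port, so each worker ends up with a different port and therefore a different embed URL. The function name bind_ephemeral is made up for this illustration.

```python
import socket


def bind_ephemeral(host="127.0.0.1"):
    """Bind a listening socket to port 0 and return (socket, assigned_port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind((host, 0))  # port 0 lets the OS pick any free port
    sock.listen()
    return sock, sock.getsockname()[1]


# Two simulated workers: each holds its own socket, so the ports differ.
sock_a, port_a = bind_ephemeral()
sock_b, port_b = bind_ephemeral()

# The URL handed to server_document would therefore differ per worker.
url_a = "http://localhost:%d/bkapp" % port_a
url_b = "http://localhost:%d/bkapp" % port_b

sock_a.close()
sock_b.close()
```

Because the port is only known at runtime, a proxy in front of the workers cannot hardcode it, which is the problem the rest of this post runs into.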

As I understand it, to access the plot on my browser outside of the EC2 instances, the server_document call must use the address of the nginx server. As I want to limit the number of ports open on this server I want this traffic to go through port 80 but I also need to pass the corresponding gunicorn worker port.

I thought I could do this by passing the port in a request header and having nginx forward the request to that port, so I modified the example as follows.

@app.route('/bk', methods=['GET'])
def bkapp_page():
    script = server_document('http://XXX-webserverIP-XXX/forward/bkapp', relative_urls=True, headers={'bokeh-port': port})
    return render_template("embed.html", script=script, template="Flask")

def bk_worker():
    asyncio.set_event_loop(asyncio.new_event_loop())

    bokeh_tornado = BokehTornado({'/bkapp': bkapp}, prefix='forward', extra_websocket_origins=["XXX-webserverIP-XXX:80"])
    bokeh_http = HTTPServer(bokeh_tornado)
    bokeh_http.add_sockets(sockets)

    server = BaseServer(IOLoop.current(), bokeh_tornado, bokeh_http)
    server.start()
    server.io_loop.start()

nginx config block

        location /forward/ {
                resolver 10.0.0.2;
                proxy_pass http://XXX-backendIP-XXX:$http_bokeh_port$request_uri;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection "upgrade";
                proxy_http_version 1.1;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header Host $host:$server_port;
                proxy_buffering off;
        }

The js call returns status 200, which suggests that putting the port in the header and forwarding with nginx works for that request. The websocket connection then fails, however, because it doesn’t carry the port header, which I confirmed in the nginx access log.
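A toy model of the substitution in the first nginx block may make the failure easier to see. This is only a sketch of the behaviour described in this post, not nginx itself: nginx exposes the bokeh-port request header as $http_bokeh_port, and a request without that header leaves the variable empty. The function name upstream_for and the port 46123 are made up for illustration.

```python
def upstream_for(headers, uri, backend="XXX-backendIP-XXX"):
    """Mimic proxy_pass http://backend:$http_bokeh_port$request_uri."""
    # A missing header expands to an empty string, as it does in nginx.
    port = headers.get("bokeh-port", "")
    return "http://%s:%s%s" % (backend, port, uri)


# The embed script request carries the custom header set via server_document.
script_request = {"bokeh-port": "46123"}  # hypothetical worker port

# The websocket handshake comes from the browser without the custom header.
websocket_request = {"Upgrade": "websocket", "Connection": "upgrade"}

script_target = upstream_for(script_request, "/forward/bkapp")
websocket_target = upstream_for(websocket_request, "/forward/bkapp/ws")
```

Here script_target contains a usable port, while websocket_target has nothing after the colon, so the proxy has no valid upstream to forward the websocket to.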

What I’ve noticed is that if I set the bind_sockets call in flask_gunicorn_embed.py to fixed port 5006 and hardcode the port in the nginx conf, then everything works perfectly.

        location /forward/ {
                resolver 10.0.0.2;
                proxy_pass http://XXX-backendIP-XXX:5006;
                proxy_set_header Upgrade $http_upgrade;
                proxy_set_header Connection "upgrade";
                proxy_http_version 1.1;
                proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
                proxy_set_header Host $host:$server_port;
                proxy_buffering off;
        }

At this point I would like to know if there is any way I can set a custom header in the ws://xxxxxx/ws connection call so that the websocket connection can also be forwarded.

As I’m rather new to all of this, I’m well aware that this might be a silly way of accomplishing this. If there are better ways to do this or best practices that I’m unaware of, I’m all ears.

Unless I am misunderstanding the question, then no. AFAIK the only “user-configurable” header that the websocket protocol supports is Sec-WebSocket-Protocol and Bokeh already uses that for transmitting session tokens.

Is there a reason you want to use the gunicorn approach specifically? More typically people probably run multiple server instances behind a load balancing reverse proxy like nginx or Apache.

Thanks for the prompt response Bryan.

No reason specifically, I don’t really know better so I appreciate your suggestion of what people generally run.

Just to confirm that I understand you correctly, that would mean that each backend server instance would run only one Bokeh server, which I would expose on a fixed port?

To improve performance, would there still be any benefit of running multiple gunicorn workers and/or using num_proc > 1 for the bokeh server? I admit that I don’t have a deep understanding of how this all ties together and would appreciate any links on appropriate resources.

You could also run several processes on different ports on a single machine, nginx could be configured to round-robin them.
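As a rough sketch of that setup, the snippet below renders an nginx upstream block for several server processes on fixed ports. The upstream name bokeh_servers, the host, and the ports 5006 to 5008 are assumptions chosen for illustration; nginx distributes requests across the servers of an upstream group in round-robin order by default.

```python
def render_upstream(name, host, ports):
    """Return the text of an nginx upstream block listing one server per port."""
    lines = ["upstream %s {" % name]
    lines += ["    server %s:%d;" % (host, port) for port in ports]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical layout: three Bokeh server processes on one machine.
config = render_upstream("bokeh_servers", "127.0.0.1", [5006, 5007, 5008])
print(config)
```

The location block would then point at the group with proxy_pass http://bokeh_servers; instead of a single hardcoded address, keeping the other proxy settings as they are.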

To improve performance, would there still be any benefit of running multiple gunicorn workers and/or using num_proc > 1 for the bokeh server?

I don’t really have any experience or knowledge about gunicorn beyond the barest basics, so I can’t really comment. Using --num-procs might be a way to have a single invocation without the hassle of a proxy in front, but you’ll have to try it out to see if it suits your needs.
