I’m considering using Bokeh for some of our more complex visualizations within a non-Python web application. As I understand it, there are two options for embedding:
standalone documents
Bokeh applications running on a Bokeh Server
I think I prefer Bokeh Server because, as far as I understand, it would let our Python data scientists also build the data-querying interactions in Bokeh themselves. But I wonder whether it also meets our other requirements:
It’s important that access is restricted to authenticated users, and these users are allowed to see only their own data, so we need a mechanism to pass authorization through. Authorization within our platform is enforced by OpenID JWT tokens. It’s not clear to me how to implement authorization in combination with Bokeh server websocket sessions. Is this what bokeh.client is for?
Bokeh Server serves a document for every client, and the source data for our Bokeh app is around 100MB. What does this mean for our resources when we have, for instance, between 100 and 500 concurrent users?
Any help or hints for making a good choice between Bokeh server and standalone documents are highly appreciated!
No idea how to work with JWT here, but for the generic case of regular sessions and a cookie with a session ID, I would generate a Bokeh session ID on the backend when a user is authenticated and save this ID in a cookie and somewhere on the backend. On the frontend, I’d call add_document_from_session with that session ID. Then, the backend could access that session ID via an instance of the Document and check the user’s authorization against it.
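A minimal sketch of the backend side of that idea. It assumes a hypothetical in-memory `SessionRegistry`; a real setup would need a store that both the web backend and the Bokeh server process can reach (database, Redis, ...). As far as I understand, in Bokeh’s default unsigned session-id mode a client can pick any session ID it likes, so the lookup inside the app is what actually enforces access:

```python
import secrets


class SessionRegistry:
    """Hypothetical map from Bokeh session IDs to authenticated user IDs."""

    def __init__(self):
        self._users_by_session = {}

    def issue(self, user_id: str) -> str:
        # Called by the web backend right after the user has authenticated.
        session_id = secrets.token_urlsafe(32)
        self._users_by_session[session_id] = user_id
        return session_id  # also stored in the user's cookie

    def user_for(self, session_id: str):
        # None for unknown (forged, revoked or expired) session IDs.
        return self._users_by_session.get(session_id)

    def revoke(self, session_id: str) -> None:
        self._users_by_session.pop(session_id, None)


def rows_visible_to(registry, session_id, rows):
    # Inside the Bokeh app the session ID would come from the Document,
    # e.g. curdoc().session_context.id (check against your Bokeh version).
    user_id = registry.user_for(session_id)
    if user_id is None:
        raise PermissionError("unknown session")
    # Hypothetical schema: each row records its owner under "owner".
    return [row for row in rows if row["owner"] == user_id]
```

Only the filtered rows would then go into the app’s data sources, so other users’ data never reaches that browser.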
A relevant bokeh serve option, if you want to research this further:
--session-ids MODE One of: unsigned, signed or external-signed
Data size
Well, it doesn’t matter what you use if you really send out a 100MB document to each user. You will have problems with both standalone and served documents if you have 100-500 concurrent users.
But do you really send that much data? I understand that you may have that much data, but Bokeh doesn’t send all of it unless you give all of it to Bokeh.
If you really do that, how does your UI look? How responsive is it?
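To make “Bokeh only sends what you give it” concrete, here is a minimal sketch that filters the full dataset down to one user’s rows and aggregates them into time buckets before anything is handed to Bokeh. The row keys (`owner`, `t`, `value`) and the bucketing choice are hypothetical, and in practice you’d likely do this with pandas. The returned dict has the column shape that `ColumnDataSource(data=...)` accepts:

```python
from collections import defaultdict


def column_data_for_user(rows, user_id, bucket_size):
    """Return a small dict of columns for one user instead of the full dataset.

    rows: iterable of dicts with hypothetical keys "owner", "t" (numeric) and "value".
    bucket_size: width of the time buckets; values are averaged per bucket.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row["owner"] != user_id:
            continue  # other users' rows are never sent to this browser
        bucket = int(row["t"] // bucket_size)
        sums[bucket] += row["value"]
        counts[bucket] += 1
    buckets = sorted(sums)
    return {
        "t": [b * bucket_size for b in buckets],
        "value": [sums[b] / counts[b] for b in buckets],
    }
```

The browser then receives one point per bucket for one user, rather than every row of the 100MB source.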
Unfortunately things really always come down to specifics. I’ll try to offer a few comments, though.
Auth
bokeh.client is a way for Bokeh server sessions to be connected to or initiated from a Python process, as opposed to being initiated by an HTTP connection from a browser. This can be useful for auth in the context of embedding a Bokeh app in another (e.g. Flask) web app. The Flask process and the Bokeh server process share a secret. The Bokeh server can be configured to only open sessions for session ids signed with the secret. Since only the Flask app has the secret, only the Flask app can successfully create and embed sessions. You can put the Flask code behind whatever Flask auth scheme you need/want.
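To illustrate the shared-secret idea, here is a stdlib-only sketch of signing and checking session IDs with HMAC. The function names and the ID format are hypothetical and are not Bokeh’s own. Bokeh implements this itself when session signing is enabled, so in practice you’d rely on that rather than on this code:

```python
import hashlib
import hmac
import secrets

# Hypothetical secret known only to the Flask app and the Bokeh server, never to browsers.
SHARED_SECRET = b"hypothetical-shared-secret"


def _sign(base: str, secret: bytes) -> str:
    return hmac.new(secret, base.encode("utf-8"), hashlib.sha256).hexdigest()


def make_signed_session_id(secret: bytes = SHARED_SECRET) -> str:
    """Flask side: only code holding the secret can mint an ID that will verify."""
    base = secrets.token_urlsafe(24)  # random part; contains no "." characters
    return f"{base}.{_sign(base, secret)}"


def is_valid_session_id(session_id: str, secret: bytes = SHARED_SECRET) -> bool:
    """Server side: refuse to open a session whose signature does not match."""
    base, sep, signature = session_id.rpartition(".")
    if not sep or not base:
        return False
    expected = _sign(base, secret)
    # Constant-time comparison avoids leaking how much of the signature matched.
    return hmac.compare_digest(expected.encode("ascii"), signature.encode("utf-8"))
```

Because a browser never sees the secret, it can replay an ID the Flask app gave it but cannot invent a new valid one.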
A newer option, if you just want to run a Bokeh server by itself, is “auth hooks”. These allow you to accept or reject incoming HTTP(S) requests to the Bokeh server, for example based on a basic bearer auth header.
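A minimal sketch of such an auth module, assuming it is passed to the server with `bokeh serve --auth-module=auth.py app.py`. Bokeh calls `get_user` with the Tornado request handler and treats a `None` return as “not authenticated”; as far as I know, a `login_url` (or `get_login_url`) must also be defined. The environment variable, expected token and user name are hypothetical placeholders:

```python
import hmac
import os

# Hypothetical: the token the server expects, read from an environment variable.
EXPECTED_TOKEN = os.environ.get("APP_BEARER_TOKEN", "change-me")


def get_user(request_handler):
    """Return a user name for accepted requests, or None to reject them.

    Bokeh passes in the Tornado RequestHandler for the incoming request.
    """
    header = request_handler.request.headers.get("Authorization", "")
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return None
    # Constant-time comparison, so timing does not reveal partial matches.
    if hmac.compare_digest(token.encode("utf-8"), EXPECTED_TOKEN.encode("utf-8")):
        return "demo-user"  # hypothetical fixed user name
    return None


# Where rejected requests are redirected (hypothetical page, not defined here).
login_url = "/login"
```

Since no handler for `/login` is defined in this sketch, rejected browsers simply land on a missing page, which is acceptable for a locked-down embed but not for a real login flow.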
In practice you’d want to implement something more robust, e.g. OAuth. The auth hooks give you the place to do that.
That said, you could also use the auth hooks in conjunction with a Flask embed, and in fact the auth hooks are probably exactly where you would want to intercept and inspect a JWT token. In that case, I think you could potentially use the simpler server_document instead of creating a session manually with bokeh.client and embedding it with server_session. (Some exploration around this might be needed.)
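A sketch of that interception point: an auth-module `get_user` that reads a bearer JWT and returns its `sub` claim. To stay self-contained, it verifies HS256 with the standard library and a hypothetical shared secret. OpenID providers usually sign with RS256, in which case you’d verify against the provider’s public keys with a maintained library such as PyJWT rather than hand-rolled code. It also skips the `aud` and `iss` checks that a real setup should perform:

```python
import base64
import hashlib
import hmac
import json
import time

# Hypothetical shared secret (HS256). For RS256 tokens, use the provider's public keys instead.
JWT_SECRET = b"hypothetical-shared-secret"


def _b64url_decode(segment: str) -> bytes:
    # JWT segments are unpadded base64url; restore the padding before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def make_hs256_token(claims: dict, secret: bytes = JWT_SECRET) -> str:
    """Create a test token; in production the identity provider issues tokens."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    signature = hmac.new(secret, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(signature)}"


def verify_hs256_token(token: str, secret: bytes = JWT_SECRET):
    """Return the claims if the signature and expiry check out, otherwise None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        if header.get("alg") != "HS256":  # never let the token choose the algorithm
            return None
        signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
        expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None
        claims = json.loads(_b64url_decode(payload_b64))
    except (ValueError, AttributeError):
        return None  # malformed token
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or time.time() >= exp:
        return None  # missing or expired "exp" claim
    return claims


def get_user(request_handler):
    # Called by Bokeh (via --auth-module) with the Tornado request handler.
    header = request_handler.request.headers.get("Authorization", "")
    scheme, _, token = header.partition(" ")
    if scheme.lower() != "bearer" or not token:
        return None
    claims = verify_hs256_token(token)
    return claims.get("sub") if claims else None


login_url = "/login"  # hypothetical page that rejected requests are redirected to
```

The returned `sub` is the user identity the app would then use to restrict each user to their own data.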
Also to note tangentially: Bokeh 2.0 is coming out soon and there are some notable improvements you will probably want:
SRI hashes for loading CDN resources
better support for making headers/cookies directly available to app code
Bokeh will use internal JWT tokens for authorizing the websocket connection
these can be signed, like session ids
can have an expiry
will not be sent via query arguments (so won’t end up in logs)
Data
Do you mean all clients need read access to the same 100MB data store? Do they all need all of it all the time? Or do they all need access to different parts? Or do you mean each client potentially has its own unique 100MB data set? There are lots of possibilities, and the details matter here.
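A rough sketch of the resource side for the “everyone reads the same data” case. As I understand it, Bokeh server runs the app code once per session, so data loaded at the top of the app script is loaded for every session. Data cached in an imported module, or set up in a directory app’s `server_lifecycle.py` via `on_server_loaded`, is shared by all sessions in one process. Each process started with `--num-procs` holds its own copy. `load_shared_dataset` and the per-session cost are hypothetical:

```python
import functools


@functools.lru_cache(maxsize=None)
def load_shared_dataset():
    """Load the read-only source data once per server process.

    Hypothetical placeholder: a real app would read its ~100MB source here.
    Because the result is cached, every session in the same process reuses the
    same object instead of building its own copy, so sessions must not mutate it.
    """
    return tuple(range(1_000))  # stand-in for the real data


def estimate_data_memory_mb(dataset_mb, sessions, num_procs=1, shared=True, per_session_mb=0.0):
    """Back-of-the-envelope estimate of memory held for data, in MB.

    shared=True: one copy per server process (e.g. per --num-procs worker).
    shared=False: every session loads its own copy.
    per_session_mb: whatever each session adds on top (e.g. its filtered
    ColumnDataSource); an assumption you would need to measure.
    """
    copies = num_procs if shared else sessions
    return dataset_mb * copies + per_session_mb * sessions
```

For 500 sessions and 100MB of data, per-session copies come to 50,000MB (roughly 50GB), versus 400MB for four processes sharing one copy each. That difference is why it matters which of the cases above you are in.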
@Bryan we (Mathijs and I are colleagues) are considering implementing a temporary solution, aiming to use the JWT functionality in 2.0 for the final release. Could you give any indication of what ‘soon’ means? Looking at GH, I see the milestone is 95% complete, with 7 open issues. Even ‘sometime this year’ would already give us some guidance. Of course, no strings attached, open source being best effort and all. I’ll have a look at whether I could pick up one of the open tickets myself (if it’s not out of my league skills-wise).