Embedding Bokeh server in a Django application with access to the ORM

Hi!

I have been following the example provided at bokeh/examples/howto/server_embed/django_embed at branch-3.0 · bokeh/bokeh · GitHub to embed a Bokeh server within a Django app. The plots produced in the example pull their data from other modules as a DataFrame. I would like to get the data for my plots via the Django ORM, but I’m having trouble getting this to work.

My Django code mostly follows the example above. The plotting code I have in the handler function begins:

from bokeh.document import Document
from app.models import Model

def plots_handler(doc: Document) -> None:
    obj = Model.objects.first()
    ...

The server appears to hang on this line. For some reason, no errors are printed to the console (I am running this with python manage.py runserver). When I wrap it in a try/except block and print the exception, I get “You cannot call this from an async context - use a thread or sync_to_async.” Following the advice in the exception message, I came across something in the channels docs indicating that I may need to do something like the following:

from bokeh.document import Document
from channels.db import database_sync_to_async
from app.models import Model

@database_sync_to_async
def get_obj():
    return Model.objects.first()

async def plots_handler(doc: Document) -> None:
    obj = await get_obj()

But now the handler doesn’t seem to get called at all.

I’ve waded into some pretty unfamiliar territory here so any help would be greatly appreciated! Thanks.

@marxide This appears to be a bit more of a Django-related question. You might have better luck on Stack Overflow (for instance, I have no appreciable Django experience at all to offer any suggestions, tho others here may)

Hey,

I’ve been looking into this as well. bokeh + django ORM seems like a great combination to me so it seems worth the time to spend getting it working, as well as finding a maintainable way to have the two working together.

In your case @marxide , when you say

But now the handler doesn’t seem to get called at all.

The crude fix for this is to revert to your original example of non-async code (don’t try to use async functions), and then either:

  1. Set os.environ["DJANGO_ALLOW_ASYNC_UNSAFE"] = "true"

or

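  2. Do django ORM stuff in a thread, as in this ugly example:

from threading import Thread
from typing import Any

from bokeh.document import Document
from bokeh.models import ColumnDataSource, Div, Slider
from bokeh.plotting import figure
from bokeh.sampledata.sea_surface_temperature import sea_surface_temperature
# Question is a Django model; import it from your own app's models module

def handler(doc: Document) -> None: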
    df = sea_surface_temperature.copy()
    source = ColumnDataSource(data=df)

    plot = figure(x_axis_type="datetime", y_range=(0, 25), y_axis_label="Temperature (Celsius)",
                  title="Sea Surface Temperature at 43.18, -70.43")
    plot.line("time", "temperature", source=source)

    div = Div(text="- -")

    def callback(attr: str, old: Any, new: Any) -> None:
        # still able to do bokeh stuff from the thread
        if new == 0:
            data = df
        else:
            data = df.rolling("{0}D".format(new)).mean()
        source.data = dict(ColumnDataSource(data=data).data)

        # able to do django ORM stuff without async-unsafe warning
        q=Question.objects.all().first()
        q.question_text = q.question_text + "--"
        q.save()
        # more bokeh interaction
        div.text = str(q.question_text)

    def callback_thread(attr, old, new):
        thread = Thread(target=callback, args=(attr, old, new))
        thread.start()
        thread.join()
    slider = Slider(start=0, end=30, value=0, step=1, title="Smoothing by N Days")
    slider.on_change("value", callback_thread)

That’s the “short” answer.


To go into a bit more justification: first of all, you probably don’t want to use (1). The django ORM code is marked “async unsafe.” Since the bokeh django embed runs under ASGI, i.e. in an async setting, as far as I can tell calling the async-unsafe code is unsafe for two reasons: a) if it does any blocking you will be stalling the async event loop, and b) the async-unsafe code may use thread-local variables under the assumption of one request per thread. In the single-threaded async/ASGI setting, sync accesses to the ORM cannot preempt one another arbitrarily, but they may still be able to corrupt or clobber each other’s data during the life of a request. This can’t be solved by adding locks or db transactions on your own; it lies pretty deep within the django code. So, as it says in the django docs, don’t use solution (1) unless you are sure you only have one user at a time.
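To make the “async unsafe” check a bit more concrete, here is a rough stdlib-only imitation of that kind of guard. It is a sketch of the idea, not Django’s actual source: async_unsafe_guard, fake_query, and the local SynchronousOnlyOperation class are stand-ins made up for illustration. It shows the guarded call being refused on the event-loop thread but going through from a worker thread, which is why “use a thread” helps.

```python
import asyncio
import functools
import os


class SynchronousOnlyOperation(Exception):
    """Local stand-in for the Django exception of the same name."""


def async_unsafe_guard(func):
    """Refuse to run func on a thread that has a running event loop."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            asyncio.get_running_loop()
        except RuntimeError:
            pass  # no event loop is running in this thread: safe to proceed
        else:
            if not os.environ.get("DJANGO_ALLOW_ASYNC_UNSAFE"):
                raise SynchronousOnlyOperation(
                    "You cannot call this from an async context"
                )
        return func(*args, **kwargs)
    return wrapper


@async_unsafe_guard
def fake_query():
    # Hypothetical placeholder for something like Model.objects.first()
    return "first row"


async def handler():
    results = {}
    try:
        fake_query()  # called directly on the event-loop thread
        results["direct"] = "allowed"
    except SynchronousOnlyOperation:
        results["direct"] = "refused"
    # The same call from a worker thread sees no running loop there.
    loop = asyncio.get_running_loop()
    results["in_thread"] = await loop.run_in_executor(None, fake_query)
    return results


results = asyncio.run(handler())
```

Setting DJANGO_ALLOW_ASYNC_UNSAFE only switches the guard off; it does nothing about the two underlying problems described above.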

For (2) and in general… The problem (as far as I can see) with the attempt at switching to async and using sync_to_async is that the bokeh server code has async support, but then jumps back into calling sync code here:

Your handler is called in a sync context, i.e. it is not awaited, and so calling it just returns a coroutine object that is never consumed. That’s why it looks like it’s not getting called at all. (I have tested this; not with your code, but I tried the same thing as you.)
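The effect is easy to reproduce outside of bokeh. The following is a minimal stdlib-only sketch; sync_caller is a hypothetical stand-in for a sync call site, not bokeh’s real code path. Calling an async def function without awaiting it only builds a coroutine object, and the body does not run until something drives it.

```python
import asyncio

calls = []


async def async_handler(doc):
    # The body only runs when the coroutine is awaited or otherwise driven.
    calls.append(doc)


def sync_caller(handler, doc):
    # Stand-in for a sync call site: it calls the handler but never awaits it.
    return handler(doc)


pending = sync_caller(async_handler, "doc")
ran_without_await = bool(calls)              # the handler body has not run yet
is_coroutine = asyncio.iscoroutine(pending)  # we only got a coroutine object

# Driving the coroutine on an event loop finally executes the body.
asyncio.run(pending)
ran_after_await = bool(calls)
```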

This is why I arrived at solution (2), which is rather crude. What’s bad about it is that I want in the callback to first do some bokeh stuff, then do some django ORM stuff, then some more bokeh stuff. So I can’t just throw my ORM stuff into some background queue or worker (eg celery) to get it off of the main async thread. But since I am forced to write sync code, I can’t do the ideal thing which would be more like:

    async def callback(attr: str, old: Any, new: Any) -> None:
        # do bokeh stuff from within the main async thread
        if new == 0:
            data = df
        else:
            data = df.rolling("{0}D".format(new)).mean()
        source.data = dict(ColumnDataSource(data=data).data)

        # await ORM stuff which will happen in a thread
        @database_sync_to_async
        def orm_stuff():
            q=Question.objects.all().first()
            q.question_text = q.question_text + "--"
            q.save()
            return q.question_text
        qt = await orm_stuff()

        # now i am back in the main async thread
        div.text = str(qt)

    slider = Slider(start=0, end=30, value=0, step=1, title="Smoothing by N Days")
    slider.on_change("value", callback)

I’m not sure how much of an effort this would be (or if it is really possible) in the bokeh server code. When I have a chance I might tinker with naively trying to convert the necessary bokeh methods to async, add the necessary awaits, and see if it just works. Then optimistically one could figure out how to upstream this without duplicating code (so the sync call-path can still live alongside the async one).

This is all a bit speculative since I haven’t looked deeply into the code yet; maybe someone from bokeh can weigh in. But the thread solution I propose in (2) works, and I believe it is safe if you have concurrent users. It could, however, result in concurrent (multi-threaded) accesses to bokeh models (within a bokeh session / Document), which may be safe for concurrent users on the server but still might not be safe for rapid-fire or concurrent events originating from the client (say, just moving the slider around in this example). This is why it would be nice to get the full async approach working, because then you’d have the usual single-thread async guarantee, and would just have to refrain from doing long CPU-bound or blocking operations in your handler – just await a task on a thread which reads a pandas dataframe or whatever you need to do.

Chris


Have to revise what I said about my above solution (2) being “safe.” It does still have the problem of doing a blocking thread.join() on the async main thread.
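A small stdlib-only sketch of the difference, with no bokeh or django involved: blocking_work and count_ticks are hypothetical names, and time.sleep stands in for a blocking ORM call. A background ticker task gets no chance to run while the loop thread sits in thread.join(), but keeps ticking when the same work is awaited through run_in_executor.

```python
import asyncio
import threading
import time


def blocking_work():
    time.sleep(0.2)  # stand-in for a blocking database call
    return "row"


async def count_ticks(counter):
    # Background task: increments whenever the event loop gets to run it.
    while True:
        counter["ticks"] += 1
        await asyncio.sleep(0.01)


async def ticks_during_join():
    counter = {"ticks": 0}
    ticker = asyncio.create_task(count_ticks(counter))
    await asyncio.sleep(0)  # let the ticker start
    before = counter["ticks"]
    worker = threading.Thread(target=blocking_work)
    worker.start()
    worker.join()  # blocks the event-loop thread itself
    ticked = counter["ticks"] - before
    ticker.cancel()
    return ticked


async def ticks_during_executor():
    counter = {"ticks": 0}
    ticker = asyncio.create_task(count_ticks(counter))
    await asyncio.sleep(0)
    before = counter["ticks"]
    loop = asyncio.get_running_loop()
    result = await loop.run_in_executor(None, blocking_work)  # loop stays free
    ticked = counter["ticks"] - before
    ticker.cancel()
    return ticked, result


join_ticks = asyncio.run(ticks_during_join())
executor_ticks, executor_result = asyncio.run(ticks_during_executor())
```

With join the ticker makes no progress at all while the worker runs; with the executor it keeps ticking, which is the behaviour you want on a server with other sessions to service.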

I did a quick test of marking some bokeh methods async and adding awaits, but there is at least one sync call path that then calls those methods and so the request ends up hanging.

As a sanity check-- I assume it is not possible to do the bokeh server embed into WSGI django. Since bokeh is written for tornado / async in general, right?

I don’t quite understand why there are so many non async methods in the bokeh code. I am more familiar with twisted than asyncio… so I don’t know maybe it is normal in an asyncio codebase to only mark methods as async when they have to do an await in their body… In that case maybe it wouldn’t be too invasive to async-ify bokeh all the way through to where the django views are called. I think this would also mean that attaching of callbacks to bokeh models/events would have to accept async callables; not sure of the implications there.

Having native async methods at all is literally less than a month old. Python 2 support was only dropped one month ago. FWIW I would very much like to elevate the Bokeh protocol to be generally usable on its own, e.g. in an ASGI context, or bare asyncio app, or whatever. And at the same time, move to having “the tornado Bokeh server” become more of “a reference Bokeh server”. This is an area where some new interested contributors who bring their own expertise and POV could have a huge outsized effect on Bokeh development in the short term.

Just to make sure anyone visiting this thread down the line isn’t led astray, I think this is the executive summary. I just was doing a lot of self-experimentation and don’t want to give the impression that this isn’t covered in the docs.

  1. For the embedding of bokeh into django, you can’t make your handler async. (this isn’t in the docs, but it should be clear that deviating significantly from the django_embed howto might not work) (edit: and if you’re following django development it should be clear since async views did not make it into the recent django 3 release)

  2. Strategies for dealing with the django ORM warning You cannot call this from an async context - use a thread or sync_to_async.:

    a. Use os.environ["DJANGO_ALLOW_ASYNC_UNSAFE"] = "true". Only recommended say if you are just doing single-user stuff from a Jupyter notebook. See the django page on Asynchronous Support (sorry just google it-- discourse is only letting me put 2 links in a post because of being a “new user”)

    b. Do blocking stuff in a thread and any follow-up bokeh operations in a next tick callback as described in Updating from Threads.

    c. Even though as I say in (1) I don’t think you can have an async handler, you can get back into doing async stuff if you want, since bokeh lets you write either sync or async callbacks. Then you can do your blocking work by yielding your task to a threadpool. This is all covered in Updating from Unlocked Callbacks.
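The shape of strategy (b) can be sketched with the standard library alone. This is an analogy, not bokeh code: loop.call_soon_threadsafe stands in for scheduling a next tick callback on the document, the model dict stands in for a bokeh model, and blocking_query is a hypothetical placeholder for an ORM call. The worker thread does the blocking part and only hands the result back; the update itself runs on the event-loop thread.

```python
import asyncio
import threading
import time


def blocking_query():
    time.sleep(0.05)  # stand-in for a blocking ORM call
    return "question text"


async def main():
    loop = asyncio.get_running_loop()
    loop_thread_id = threading.get_ident()
    model = {"text": "- -"}  # stand-in for a bokeh model such as a Div
    done = asyncio.Event()

    def apply_update(value):
        # Runs on the event-loop thread, so it never races with other callbacks.
        model["text"] = value
        model["on_loop_thread"] = threading.get_ident() == loop_thread_id
        done.set()

    def worker():
        value = blocking_query()  # blocking work stays off the loop thread
        loop.call_soon_threadsafe(apply_update, value)  # hand the result back

    threading.Thread(target=worker).start()
    await done.wait()  # the loop stays free to do other work while waiting
    return model


model = asyncio.run(main())
```

Unlike the thread.join() variant, nothing here blocks the loop thread, and the model is only ever touched from one thread.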

@Bryan I’m pretty pleased with the async experience now that I have figured it out.

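from asgiref.sync import sync_to_async
from bokeh.document import without_document_lock
from bokeh.models import Button
from tornado.gen import coroutine

# doc is the Document passed to the enclosing handler
@without_document_lock
@coroutine
def callback():
    @sync_to_async
    def blocking_function():
        # ...
        pass
    yield blocking_function()

    def doc_modifying_function(doc):
        # ...
        pass
    yield doc.session_context.with_locked_document(doc_modifying_function)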

    # repeat either of the above as-needed

button = Button(label="Button")
button.on_click(callback)

A bit verbose but the need for @sync_to_async’d functions can be reduced by using full-async or async compatibility bindings for whatever blocking stuff you are doing.

Then the document lock situation contributes the other bit of verbosity; not a big deal in my view but of course it would be nice if it were gone.

As long as it works, which it definitely does!

Then also the option of doing

def my_django_view(doc: Document):
    @coroutine
    def async_from_view():
        # ...
    doc.add_next_tick_callback(async_from_view)

in order to get into async-land from the view itself. edit: I don’t know why I thought it would be a good idea to write this without testing it first. This example directly above (async_from_view) doesn’t actually work on my first test.

Re:

FWIW I would very much like to elevate the Bokeh protocol to be generally usable on its own, e.g. in an ASGI context, or bare asyncio app, or whatever. And at the same time, move to having “the tornado Bokeh server” become more of “a reference Bokeh server”. This is an area where some new interested contributors who bring their own expertise and POV could have a huge outsized effect on Bokeh development in the short term.

My current standing is that the project I’m using bokeh+django for is relatively new, so I’m still at the stage where I monkeypatch anything critical for my use case and then hopefully submit it upstream if it is not too crude.

That said do you have somewhere I could follow along on the development efforts? Maybe I could keep an eye out for some work that would complement the work I am doing day-to-day and eventually jump in.