[bokeh] Bokeh server app slow

Hi John,

There are a couple of features in the pipeline that may help with apps like the one you describe:

* protocol messages and features for noting "busy" (e.g. a spinner, etc.)
* a binary array protocol for more efficient transfer of array data

But I suspect that you might simply be bumping up against (or pushing past) the current limits of the server. It's possible that there is a resource leak (in either the server, OR the client --- you might check memory consumption over time for both) and that the scale of your app is uncovering it quickly.
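One way to check Python-side memory growth over time is the standard-library `tracemalloc` module. The following is only a minimal sketch: the `retained` list is a made-up stand-in for whatever a long-running app might be holding on to, and `traced_megabytes` is a hypothetical helper name. It only sees allocations made by the Python process (the server side); browser memory has to be checked with the browser's own tools.

```python
import tracemalloc


def traced_megabytes():
    """Return the memory currently traced by tracemalloc, in MB."""
    current, _peak = tracemalloc.get_traced_memory()
    return current / 1_000_000


tracemalloc.start()
baseline = traced_megabytes()

# Hypothetical stand-in for objects that accumulate while the app runs.
retained = [bytearray(1_000_000) for _ in range(5)]

growth = traced_megabytes() - baseline
print(f"traced growth since baseline: {growth:.1f} MB")
```

Logging this value at intervals while the app runs should show whether Python-side memory keeps climbing or stays flat.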

Can you share example code to reproduce the issue (perhaps with synthesized or fake data)? It is possible that a review of the code will suggest some change in usage or optimization that will help immediately, but more likely it would simply be a great help for investigating any server problems that need addressing.

Thanks,

Bryan

On Jun 30, 2016, at 6:41 AM, John.B <[email protected]> wrote:

Hi,

I have a Bokeh app that is running very slowly. The app consists of several plots (20-50k points in total), a couple of tables (5k rows x 10 columns), some charts, and some widgets to help interact with them. Currently the app is quite responsive at startup, but the longer the server has been running the slower it gets, taking up to several minutes for a user interaction to run its callback and for the corresponding output to appear in the app. Curiously, if the app is restarted the speed improves again, even though the amount of data is the same.

I am not that experienced in profiling, but from what I can tell my end of the code appears to be quite fast, and it's whatever Bokeh is doing on the backend that appears to take most of the time. Are there any good tools that I can use to profile a Bokeh app? Unfortunately I can't share the code, but if I can isolate the problem I can try to provide a test case that reproduces it.

Also, I tried upgrading to 0.12 and this seems to have accentuated the issue.

Thanks
John

--
You received this message because you are subscribed to the Google Groups "Bokeh Discussion - Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/bokeh/5767c0f1-cc88-4139-87d3-294ba269797c%40continuum.io.
For more options, visit https://groups.google.com/a/continuum.io/d/optout.

However, regarding profiling, this tool was suggested:

  https://github.com/rkern/line_profiler

You might first try decorating some of the "json patch" related methods on Document and sharing the results:

  https://github.com/bokeh/bokeh/blob/master/bokeh/document.py
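If line_profiler cannot be used, the standard-library `cProfile` module gives a coarser, per-function view. This is a swapped-in alternative, not the tool suggested above. The sketch below assumes a hypothetical `profiled` decorator and a dummy `update` function standing in for a real callback; neither name comes from Bokeh.

```python
import cProfile
import io
import pstats


def profiled(func):
    """Hypothetical helper: profile each call of func and print the top entries."""
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        try:
            # runcall executes func under the profiler and returns its result
            return profiler.runcall(func, *args, **kwargs)
        finally:
            out = io.StringIO()
            stats = pstats.Stats(profiler, stream=out)
            stats.sort_stats("cumulative").print_stats(10)
            print(out.getvalue())
    return wrapper


@profiled
def update():
    # Dummy workload standing in for a real callback body.
    return sum(i * i for i in range(100_000))


result = update()
```

Sorting by cumulative time shows which calls dominate, including time spent inside library code invoked from the callback.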

Thanks,

Bryan

Hi Bryan,

Memory consumption definitely doesn't seem to be an issue, with the server and client only consuming ~200MB each.

As far as I can tell the app seems to lock up over time, so responsiveness to user interactions is slow, e.g. changing plot tools can take a while. The execution of my callback is quite fast, though, but again pushing the result back to the client takes ages. This is all run on a local machine running Windows 7.

I've also noticed that periodic callbacks seem reasonably fast, but I can't really tell without being able to profile it.

Is there any way to determine what is taking up most of the time? If not, I'll try to make a simple version that reproduces the problem.

Thanks

John

John,

Did you happen to see my other reply with information about kernprof?

Thanks,

Bryan

Hi Bryan,

I tried it out but it doesn't seem to work; I seem to be hitting this issue: https://github.com/rkern/line_profiler/issues/24

Has anyone had any success with line_profiler?

And I can't seem to reproduce the problem if I try to simplify the app.

When you say that I "might simply be bumping up against (or pushing past) the current limits of the server", is this due to the amount of data? It doesn't seem like that much to me. Given that the app slows down the longer it runs, is there any way to reset the document to speed it back up?

thanks again,

John

> Hi Bryan,
>
> I tried it out but it doesn't seem to work; I seem to be hitting this issue: https://github.com/rkern/line_profiler/issues/24
> Has anyone had any success with line_profiler?

That profiler was suggested to me by someone else; I am afraid I have no personal experience with it.

> And I can't seem to reproduce the problem if I try to simplify the app.

By simplify, do you mean roughly the same number of plots / data / update rate, but with some fake generated data?

> When you say that I "might simply be bumping up against (or pushing past) the current limits of the server", is this due to the amount of data? It doesn't seem like that much to me. Given that the app slows down the longer it runs, is there any way to reset the document to speed it back up?

Amount of data is always relative to something. Drawing 50k points on HTML canvas will be somewhat slow. If you are updating faster than the canvas can draw, that will be a problem. There's no mechanism to handle or report back-pressure right now. It's possible that trying your app with WebGL enabled might help. Beyond that, the current serialization strategy is "JSON all the things", which is fine as far as it goes but certainly has lots of headroom for improvement, which is why the binary array protocol is an important near-term feature. I can't tell from your description what the data update rates are, so I am just mentioning this as a general comment.
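As a rough illustration of why a binary encoding leaves headroom over JSON for array data, the sketch below compares the byte size of 4000 random floats serialized both ways. It uses only the standard library and is not Bokeh's actual wire format; the variable names are made up for the example.

```python
import json
import random
from array import array

# 4000 random floats, comparable to one column of a modest scatter plot.
values = [random.random() for _ in range(4000)]

# JSON writes every float out as decimal text.
json_bytes = json.dumps(values).encode("utf-8")

# A packed float64 buffer uses exactly 8 bytes per value.
binary_bytes = array("d", values).tobytes()

print(f"JSON:   {len(json_bytes)} bytes")
print(f"binary: {len(binary_bytes)} bytes")
```

The binary buffer is always 32000 bytes here, while the JSON text is typically more than twice that, before counting the cost of parsing it on the other end.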

Another thing you could easily try is to see how long things are taking in the browser. If you set the environment variable BOKEH_LOG_LEVEL=debug before you run your app, then the JS console in the browser will be a lot more chatty about rendering, etc. (There's also a "trace" level, but be prepared for volumes of output.) It might be instructive to see if there are differences across browsers. I think this points to needing to add more instrumentation to the server itself to report callback time and network traffic stats as well.
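One portable way to set that variable (including on Windows) is to launch the server from a small Python script. This is only a sketch: `debug_env` and `serve` are hypothetical helper names, and `myapp.py` is a placeholder for the real app script.

```python
import os
import subprocess


def debug_env(level="debug"):
    """Return a copy of the current environment with BOKEH_LOG_LEVEL set."""
    env = dict(os.environ)
    env["BOKEH_LOG_LEVEL"] = level
    return env


def serve(script, level="debug"):
    """Run `bokeh serve <script>` with the chosen log level (blocks until exit)."""
    return subprocess.run(["bokeh", "serve", script], env=debug_env(level))


# Example call (placeholder file name, not run here):
# serve("myapp.py")
```

Passing `level="trace"` selects the more verbose level mentioned above.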

It's also possible there is an inefficiency or resource leak. I actually think from your description that this may be the case. The goal of the server was to be able to turn dead-simple scripts into apps, so there is a lot of auto-magic scaffolding around automatically detecting model property changes and computing and sending incremental document diffs to the client. It seems that the document might be accumulating lots of cruft, causing progressively worsening performance. This is an area that needs some thorough investigation, and it is also why seeing the code is so important. It's possible that this is something that different usage could immediately fix, or maybe it's a real bug. I can't speculate without details of what's going on, though.

Maybe you can compare what you are doing against this code?

    import numpy as np

    from bokeh.io import curdoc
    from bokeh.layouts import column, gridplot
    from bokeh.models import ColumnDataSource
    from bokeh.models.widgets import Button
    from bokeh.plotting import figure

    plots = []
    sources = []
    for i in range(9):
        # each source starts empty; update() fills it in below
        source = ColumnDataSource(data=dict(x=[], y=[]))
        plot = figure(plot_width=300, plot_height=300) #, webgl=True) # webgl flaky on safari
        plot.circle(x='x', y='y', source=source)
        plots.append(plot)
        sources.append(source)

    def update():
        # replace the data of every source with 4000 fresh random points
        for source in sources:
            source.data = dict(x=np.random.random(4000), y=np.random.random(4000))

    button = Button(label="update")
    button.on_click(update)

    plots = gridplot(plots, ncols=3)

    # populate the plots once at startup
    update()

    curdoc().add_root(column(button, plots))

This creates 9 plots, each with four thousand (x,y) points at random. I definitely found WebGL to be faster, so maybe that is an option for you (but note: WebGL is broken on Safari; the dev who maintains WebGL support lives abroad, but we will finally be able to get them a real MacBook to test and fix on at SciPy next week). But even without WebGL, the button caused the plots to update in ~1 second on my laptop, even after many dozens of button presses. Perhaps a comparison will suggest a different usage pattern that might help out in your case.

Thanks,

Bryan

As a concrete example of why details matter: I tried changing from .circle to .line in the code I sent before. This works fine with WebGL, but it was basically unusable without WebGL (~10 seconds to load or update). The reason is that a line with 4000 randomly crossed segments is a huge burden for HTML canvas rasterizers. So the glyph type matters. But I expect it would behave much better with more "timeseries"-like data as opposed to random data. So the data matters, too. There's basically nothing we can do to control the performance characteristics of browser canvas implementations, though.

If it turned out to be the case that browser canvas performance was an issue here, another option to consider is DataShader:

  https://github.com/bokeh/datashader

That takes the rasterization out of the browser and also reduces bandwidth by sending only fixed-size images to Bokeh. It might help in your case as well.

Bryan
