bokeh serve exception handling

Hello,

I am running a bokeh server instance which is managed directly by systemd[1]. Occasionally, an exception is thrown which leaves the server in a broken state until it is restarted. This is an undesirable failure mode as it requires external monitoring of either websocket responses or logs in order to detect a fault. Ideally, unhandled app exceptions would cause the server instance to die, which in my case would be immediately restarted by systemd. A definitive failure mode would also be preferable when load-balancing between multiple instances.

I am clueless about the workings of tornado but after a quick skim of `bokeh/server/server.py`, I suspect that exceptions are being caught by default in the tornado event loop? Is there a reasonably straightforward path to making app generation exceptions a fatal error? Any pointers would be greatly appreciated.

Cheers,

-Josh
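
The fail-fast behaviour asked for above can be approximated from user code without touching the server. The following is only a minimal sketch, not part of Bokeh's API: `fatal_on_error`, `exit_fn` and `exit_code` are hypothetical names, and it assumes the exception originates in a callback you control and that the systemd unit is configured to restart the service on a non-zero exit.

```python
import functools
import logging
import os

log = logging.getLogger(__name__)


def fatal_on_error(func, exit_fn=os._exit, exit_code=1):
    """Wrap a callback so that any unhandled exception ends the process.

    Hypothetical helper, not a Bokeh or Tornado function. `exit_fn` is
    injectable so the behaviour can be exercised without killing the
    interpreter.
    """
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # Log the full traceback first, since os._exit skips normal
            # interpreter cleanup (atexit handlers, buffered output).
            log.exception("unhandled exception in %s; exiting", func.__name__)
            exit_fn(exit_code)
    return wrapper
```

`os._exit` is the default here because it terminates the process immediately rather than relying on an exception propagating out of the event loop. In an app, a wrapped function could be handed to something like `curdoc().add_periodic_callback(...)` in place of the bare callback. Note that this ends every session served by that process, not just the one that failed.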

···

Hi,

You should probably open a GH issue to discuss this. In the case of a single long-running app your request might be a reasonable one. But when serving the same app to a large audience, making such a change unilaterally would mean that an error in any one user's session destroys every other existing session and prevents any new ones. That's probably not desirable default behavior.

If you are saying that the exception already leaves the server unable to serve new sessions, then that is unexpected, in which case more information is needed (e.g., the exact exception being raised).

Thanks,

Bryan
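
For reporting the exact exception in an issue, something along these lines could be used to capture the type, message and full traceback as text. It is a rough sketch using only the standard library; `describe_exception` is a hypothetical helper name, not something Bokeh provides.

```python
import traceback


def describe_exception(exc):
    """Return the exception type, message and traceback as one string."""
    return "".join(
        traceback.format_exception(type(exc), exc, exc.__traceback__)
    )
```

Calling it inside an `except Exception as exc:` block and writing the result to the log gives the complete traceback to paste into a report.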

···

On Mar 6, 2017, at 13:04, [email protected] wrote:

Hello,

I am running a bokeh server instance which is managed directly by systemd[1]. Occasionally, an exception is thrown which leaves the server in a broken state until it is restarted. This is an undesirable failure mode as it requires external monitoring of either websocket responses or logs in order to detect a fault. Ideally, unhandled app exceptions would cause the server instance to die, which in my case would be immediately restarted by systemd. A definitive failure mode would also be preferable when load-balancing between multiple instances.

I am clueless about the workings of tornado but after a quick skim of `bokeh/server/server.py`, I suspect that exceptions are being caught by default in the tornado event loop? Is there a reasonably straightforward path to making app generation exceptions a fatal error? Any pointers would be greatly appreciated.

Cheers,

-Josh

--
[1]: https://github.com/lsst-sqre/sandbox-jenkins-demo/blob/master/jenkins_demo/templates/squash/squash-bokeh%40.service.epp


Running into exactly this. Did you get anywhere with it?
