FileInput fails on large files

Hello,
I am experiencing a problem with FileInput when trying to load large files (e.g. 600 MB). I’ve already increased websocket-max-message-size, but it did not help.

Details

Manjaro Linux 20.2, Python 3.7.7, bokeh 2.2.3, tornado 6.1

Example

# app.py
from bokeh.io import curdoc
from bokeh.models import FileInput
file_input = FileInput()
curdoc().add_root(file_input)

I run:
bokeh serve --websocket-max-message-size 2000000000 app.py
and see:

2020-11-23 11:52:33,795 Starting Bokeh server version 2.2.3 (running on Tornado 6.1)
2020-11-23 11:52:33,795 Torndado websocket_max_message_size set to 2000000000 bytes (1907.35 MB)
2020-11-23 11:52:33,795 User authentication hooks NOT provided (default user enabled)
2020-11-23 11:52:33,798 Bokeh app running at: http://localhost:5006/app
2020-11-23 11:52:33,798 Starting Bokeh server with process id: 219735

as expected. Then I go to the browser and try to upload a file. When a small file is uploaded, nothing happens (as expected), but when I try a 600 MB file, I see the following in the Python console:

2020-11-23 11:54:20,813 error handling message
 message: Message 'PATCH-DOC' content: {'events': [{'kind': 'ModelChanged', 'model': {'id': '1001'}, 'attr': 'mime_type'}], 'references': []} 
 error: KeyError('new')
Traceback (most recent call last):
  File "/home/rafal/venvs/upload_bokeh_problem/lib/python3.7/site-packages/bokeh/server/protocol_handler.py", line 90, in handle
    work = await handler(message, connection)
  File "/home/rafal/venvs/upload_bokeh_problem/lib/python3.7/site-packages/bokeh/server/session.py", line 67, in _needs_document_lock_wrapper
    result = func(self, *args, **kwargs)
  File "/home/rafal/venvs/upload_bokeh_problem/lib/python3.7/site-packages/bokeh/server/session.py", line 261, in _handle_patch
    message.apply_to_document(self.document, self)
  File "/home/rafal/venvs/upload_bokeh_problem/lib/python3.7/site-packages/bokeh/protocol/messages/patch_doc.py", line 100, in apply_to_document
    doc._with_self_as_curdoc(lambda: doc.apply_json_patch(self.content, setter))
  File "/home/rafal/venvs/upload_bokeh_problem/lib/python3.7/site-packages/bokeh/document/document.py", line 1169, in _with_self_as_curdoc
    return f()
  File "/home/rafal/venvs/upload_bokeh_problem/lib/python3.7/site-packages/bokeh/protocol/messages/patch_doc.py", line 100, in <lambda>
    doc._with_self_as_curdoc(lambda: doc.apply_json_patch(self.content, setter))
  File "/home/rafal/venvs/upload_bokeh_problem/lib/python3.7/site-packages/bokeh/document/document.py", line 409, in apply_json_patch
    value = event_json['new']
KeyError: 'new'
2020-11-23 11:54:20,815 error handling message
 message: Message 'PATCH-DOC' content: {'events': [{'kind': 'ModelChanged', 'model': {'id': '1001'}, 'attr': 'value'}], 'references': []} 
 error: KeyError('new')
Traceback (most recent call last):
  File "/home/rafal/venvs/upload_bokeh_problem/lib/python3.7/site-packages/bokeh/server/protocol_handler.py", line 90, in handle
    work = await handler(message, connection)
  File "/home/rafal/venvs/upload_bokeh_problem/lib/python3.7/site-packages/bokeh/server/session.py", line 67, in _needs_document_lock_wrapper
    result = func(self, *args, **kwargs)
  File "/home/rafal/venvs/upload_bokeh_problem/lib/python3.7/site-packages/bokeh/server/session.py", line 261, in _handle_patch
    message.apply_to_document(self.document, self)
  File "/home/rafal/venvs/upload_bokeh_problem/lib/python3.7/site-packages/bokeh/protocol/messages/patch_doc.py", line 100, in apply_to_document
    doc._with_self_as_curdoc(lambda: doc.apply_json_patch(self.content, setter))
  File "/home/rafal/venvs/upload_bokeh_problem/lib/python3.7/site-packages/bokeh/document/document.py", line 1169, in _with_self_as_curdoc
    return f()
  File "/home/rafal/venvs/upload_bokeh_problem/lib/python3.7/site-packages/bokeh/protocol/messages/patch_doc.py", line 100, in <lambda>
    doc._with_self_as_curdoc(lambda: doc.apply_json_patch(self.content, setter))
  File "/home/rafal/venvs/upload_bokeh_problem/lib/python3.7/site-packages/bokeh/document/document.py", line 409, in apply_json_patch
    value = event_json['new']
KeyError: 'new'

Could you help me with this problem? I need to upload large files, potentially a couple of GB each.

Thanks in advance,
Rafał

Unfortunately, I don’t think there is currently any easy solution. Bokeh server is built on Tornado, and the default Tornado max_buffer_size is 100 MB, which I believe is what you are running into. Perhaps Bokeh could expose Tornado’s max_buffer_size as a user-configurable option, but that’s not currently the case (and without testing it’s not clear that it would be a good solution, or even a solution at all).
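To make the size mismatch concrete, here is a rough back-of-the-envelope check. It assumes the file contents travel base64-encoded in FileInput's value property (as the Bokeh documentation describes) and ignores any JSON or framing overhead. The helper name base64_size is made up for illustration.

```python
# Tornado's default max_buffer_size (100 MB), as mentioned in the reply above.
TORNADO_DEFAULT_MAX_BUFFER = 100 * 1024 * 1024


def base64_size(n_bytes: int) -> int:
    """Length in bytes of the padded base64 encoding of n_bytes of data."""
    # Every group of 3 input bytes (rounded up) becomes 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


file_size = 600 * 1024 * 1024      # the 600 MB file from the question
encoded = base64_size(file_size)   # assumption: payload is roughly this size

print(f"raw file:       {file_size / 1024**2:.0f} MB")
print(f"base64 payload: {encoded / 1024**2:.0f} MB")
print(f"exceeds default buffer: {encoded > TORNADO_DEFAULT_MAX_BUFFER}")
```

Under these assumptions the payload is about a third larger than the file itself, so it is several times over the default buffer size even before any other overhead.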

You could embed the Bokeh server as a library, which should give you access to Tornado-specific settings like max_buffer_size, in case you want to experiment.
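For anyone who wants to try that experiment, a minimal sketch might look like the following. It relies on bokeh.server.server.Server forwarding http_server_kwargs to Tornado's HTTPServer, which accepts max_buffer_size. The function names, the port, and the 2 GB limit are illustrative choices, and whether this actually gets a 600 MB upload through is untested.

```python
def build_server_kwargs(max_bytes):
    """Keyword arguments for bokeh.server.server.Server with both limits raised."""
    return {
        "port": 5006,
        # Same setting as the --websocket-max-message-size command-line flag.
        "websocket_max_message_size": max_bytes,
        # Forwarded to Tornado's HTTPServer, which accepts max_buffer_size.
        "http_server_kwargs": {"max_buffer_size": max_bytes},
    }


def make_document(doc):
    """Build the same document as app.py in the question."""
    from bokeh.models import FileInput

    doc.add_root(FileInput())


def run_server(max_bytes=2_000_000_000):
    """Start an embedded Bokeh server; blocks until interrupted."""
    # Imported here so the helper above can be inspected without Bokeh installed.
    from bokeh.server.server import Server

    server = Server({"/app": make_document}, **build_server_kwargs(max_bytes))
    server.start()
    server.io_loop.start()
```

Calling run_server() from a script started with plain python (rather than bokeh serve) should bring the app up at http://localhost:5006/app.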

Otherwise, I think handling large file transfers would require some non-trivial work to support. A real solution would involve chunking, but that means changes at the lowest levels of Bokeh, which in turn means elaborate testing and validation. Or perhaps there is some third-party widget like dropzone that could be leveraged, but it would have to send data over some separate channel besides the Bokeh websocket, and that is new territory. That doesn’t mean it can’t be done, but I would not expect it in the very near term. (There’s not even an issue for it AFAIK, and opening one is the first step for someone to take.)
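To illustrate what chunking means here, below is a pure-Python sketch of the general idea only. It is not Bokeh API and not how Bokeh works today: in a real implementation the splitting would happen in the browser and the reassembly on the server. All names (split_into_chunks, ChunkAssembler) and the chunk size are hypothetical.

```python
from typing import Dict, Iterator, Optional, Tuple


def split_into_chunks(data: bytes, chunk_size: int) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (index, total, chunk) tuples, each chunk at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division; an empty payload is still sent as one empty chunk.
    total = max(1, -(-len(data) // chunk_size))
    for index in range(total):
        start = index * chunk_size
        yield index, total, data[start:start + chunk_size]


class ChunkAssembler:
    """Collect chunks (in any order) and return the payload once complete."""

    def __init__(self) -> None:
        self._parts: Dict[int, bytes] = {}

    def add(self, index: int, total: int, chunk: bytes) -> Optional[bytes]:
        self._parts[index] = chunk
        if len(self._parts) < total:
            return None  # still waiting for more chunks
        return b"".join(self._parts[i] for i in range(total))


# Simulate sending a payload in small messages that arrive out of order.
payload = bytes(range(256)) * 1000
assembler = ChunkAssembler()
result = None
for index, total, chunk in reversed(list(split_into_chunks(payload, 64 * 1024))):
    result = assembler.add(index, total, chunk)

print("round trip ok:", result == payload)
```

Each message stays well under any buffer limit regardless of the total file size, which is the property a real chunked upload would need.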
