How many files can fileinput widget transfer to a server?

Hi all,

I’m trying to use the FileInput widget to transfer files to a server for further processing. The files are zipped and about 2 MB each, and at times I have 500+ of them to upload at once.
So far, even when forcing --websocket-max-message-size to over 3 GB, the transfer doesn’t work. The most files I can transfer is still 10; beyond that, the connection shuts down.
Is it possible to fix that?
My upload code is below (FileInput is set to multiple=True), but Bokeh never reaches it when more than 10 files are selected, so the problem must be either in FileInput or in the connection to the server:
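import base64
import io

# With multiple=True, filename and value are parallel lists (values are base64 strings)
for fn, f in zip(data_upload_scope_shots.filename, data_upload_scope_shots.value):
    decoded = base64.b64decode(f)
    file_content = io.BytesIO(decoded)
    # full_path is defined elsewhere in the app
    with open(full_path + '/scope_data/' + fn, 'wb') as file_out:
        file_out.write(file_content.read())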

Any help appreciated.

Fabio

Are there any messages in the browser console? Is the problem total size or total number of files (i.e. can you upload 30 very small files)?

Hi Bryan,

No clue: the only message I see is websocketconnection: closed, reason none… at least that’s what I see in the log on screen.

And for the problem, even worse: websocket-max-message-size is set to accept 4 GB.
Originally I set it to 2 GB, and at that point I could transfer only 5 files (the same 2 MB files).
Then I set it to 4 GB (at least that’s what the server says: 3900 MB), and now it seems only 10 can be transferred.
Like I said, the total number is around 570 in my current case and they’re all the same size, 2.05 MB, so… it should work.

Should I try to push the limit up to 40 GB?

Or is the websocket size measured in bits instead? (In which case, I apologize for the time wasted!)

Fabio

@drstein71 Bokeh does not directly do anything with --websocket-max-message-size. The value is simply passed on to the Tornado WebSocketHandler. It’s possible there are downstream limitations in Tornado, and if that is the case then we can’t really do anything about it, unfortunately.

However, I’m still trying to assess whether the problem is due to the total overall size or to the number of files. Can you try what I suggested above to see whether more than 10 small files can be sent successfully (by “small” I mean “a few kilobytes”)?

Just for the sake of transparency, I’ll also state that the usage you are attempting is pretty far afield of the original motivating use case for this feature, which was “I want to upload one (not-huge) CSV file to process in a callback”. It may simply be best to look at options outside Bokeh for the file transfer parts.

Hi Bryan,

sorry for the delay! I was able to run some tests. Here’s what I found:

  • If I upload small files (text or binary, it doesn’t seem to matter) of a few KB each, I have no problems (at least up to 50+ files).
  • If I upload a mix of zip, tar, PDF, and txt files (for the sake of it), with some of them on the order of a few MB (10), I still have no issues.
  • If I upload ~20 zipped files (2 MB each), it still works (though the for loop seems to take quite some time to go through them).
  • When I tried to upload 46 of them (same zipped files, same size), I got this:

2020-11-02 00:17:38,837 Starting Bokeh server version 2.2.3 (running on Tornado 6.0.3)
2020-11-02 00:17:38,840 Torndado websocket_max_message_size set to 4000000000 bytes (3814.70 MB)
2020-11-02 00:17:38,840 User authentication hooks NOT provided (default user enabled)
2020-11-02 00:17:38,843 Bokeh app running at: http://localhost:5006/DataAnalysis
2020-11-02 00:17:38,843 Starting Bokeh server with process id: 24650
2020-11-02 00:40:06,636 Reached maximum read buffer size
2020-11-02 00:40:06,636 error on read: Reached maximum read buffer size
2020-11-02 00:40:06,637 WebSocket connection closed: code=None, reason=None

So… I’m not sure how to read it: is it Tornado issuing the “Reached maximum read buffer size” message?

Thanks for looking into it!

Fabio

@drstein71 That message is coming from Tornado:

https://github.com/tornadoweb/tornado/blob/b3c63fbce0e97fd0428199ffdeddbcb237ef03e9/tornado/iostream.py#L894
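
A rough back-of-the-envelope check suggests why ~20 files get through and 46 do not. The sketch below assumes Tornado's default IOStream read buffer limit of 100 MB (104857600 bytes), assumes "2.05 MB" means 2,050,000 bytes, and counts only the base64 inflation of the file data (4 output bytes per 3 input bytes), ignoring everything else in the message. The function and constant names are made up for illustration.

```python
# Assumption: Tornado's IOStream falls back to a 100 MB read buffer limit
# when no max_buffer_size is given.
TORNADO_DEFAULT_MAX_BUFFER = 104857600

# Assumption: "2.05 MB" in the thread means 2,050,000 bytes per file.
FILE_BYTES = 2_050_000


def base64_size(raw_bytes: int) -> int:
    """Length of the padded base64 encoding of `raw_bytes` bytes."""
    return 4 * ((raw_bytes + 2) // 3)


def payload_estimate(n_files: int, file_bytes: int) -> int:
    """Lower bound on the message size: base64 data only, no JSON overhead."""
    return n_files * base64_size(file_bytes)


for n in (20, 46, 570):
    total = payload_estimate(n, FILE_BYTES)
    fits = total <= TORNADO_DEFAULT_MAX_BUFFER
    print(f"{n:>3} files: {total / 1e6:8.1f} MB, fits in read buffer: {fits}")
```

Under these assumptions, 20 files come to roughly 55 MB and fit, while 46 files (roughly 126 MB) and 570 files (roughly 1.56 GB) exceed the read buffer no matter how high --websocket-max-message-size is set.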

So there is not any immediate solution or workaround I can offer. It’s possible that max_buffer_size is something we can make configurable on the Bokeh side. You could probably tinker with that value if you embed the Bokeh server programmatically instead of using the bokeh serve command-line program. But there’s every possibility that raising that value very high to accommodate large file transfers would have adverse effects in other ways.
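
For anyone who wants to experiment with that, here is a minimal, untested sketch of embedding the server programmatically. It assumes that bokeh.server.server.Server forwards http_server_kwargs to Tornado's HTTPServer (which accepts max_buffer_size) and accepts websocket_max_message_size as a keyword argument; please check both against the docs for your Bokeh version. The helper names, the 2 GiB limit, and the bare FileInput document are illustrative only.

```python
def upload_server_kwargs(max_upload_bytes: int) -> dict:
    """Keyword arguments that raise both limits involved in a large upload."""
    return {
        # Largest single websocket message the server will accept.
        "websocket_max_message_size": max_upload_bytes,
        # Assumption: passed through to tornado.httpserver.HTTPServer, whose
        # max_buffer_size bounds the per-connection read buffer.
        "http_server_kwargs": {"max_buffer_size": max_upload_bytes},
    }


def make_document(doc):
    """Hypothetical app: a single multi-file FileInput widget."""
    from bokeh.models import FileInput  # imported lazily; requires Bokeh

    doc.add_root(FileInput(multiple=True))


def main():
    from bokeh.server.server import Server  # imported lazily; requires Bokeh

    server = Server(
        {"/DataAnalysis": make_document},
        port=5006,
        **upload_server_kwargs(2 * 1024**3),  # 2 GiB, an arbitrary example
    )
    server.start()
    server.io_loop.start()  # blocks until the process is stopped
```

Calling main() starts the server. Keep in mind that a buffer this large means a single client can make the server hold that much data in memory at once.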

The main issue is that for something like this, the file data should be sent chunked, but is currently going out as one giant websocket message, which is not efficient for a number of reasons. One idea would be to try to implement automatic chunking directly in Bokeh, but this would be a big change at the lowest levels, requiring extensive development and testing. I am doubtful the cost/benefit is there to undertake that kind of work.
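
To make the idea of chunking concrete, here is a small self-contained sketch in plain Python. It is not how Bokeh's protocol works today, and the function names and the (index, total, data) tuple layout are invented for illustration: the sender splits one large payload into bounded pieces, and the receiver puts them back together.

```python
from typing import Iterable, Iterator, Tuple

Chunk = Tuple[int, int, bytes]  # (index, total number of chunks, data)


def split_into_chunks(payload: bytes, chunk_size: int) -> Iterator[Chunk]:
    """Yield the payload as numbered pieces of at most chunk_size bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Ceiling division; an empty payload still produces one (empty) chunk.
    total = max(1, (len(payload) + chunk_size - 1) // chunk_size)
    for index in range(total):
        start = index * chunk_size
        yield index, total, payload[start:start + chunk_size]


def reassemble(chunks: Iterable[Chunk]) -> bytes:
    """Rebuild the payload, tolerating out-of-order arrival."""
    ordered = sorted(chunks, key=lambda chunk: chunk[0])
    if not ordered:
        raise ValueError("no chunks received")
    total = ordered[0][1]
    if [chunk[0] for chunk in ordered] != list(range(total)):
        raise ValueError("missing or duplicate chunk")
    return b"".join(chunk[2] for chunk in ordered)
```

With a scheme like this, no single message is larger than chunk_size, so the receiver's buffer limit only has to cover one chunk rather than the whole upload.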

Or maybe there is some way to at least break up things per-file at a higher level, where each file triggers a new message and event. But it’s unclear how that fits in with Bokeh’s current foundational “setting a property triggers one event” mode of operation.

Another idea might be a version of the file selector that did not sync the actual data to Python, instead only triggering a JS callback. Then the JS callback could POST the data to some API to save it, etc. If you only need to store off the files, and not access them in the Bokeh app code, that would provide an avenue.
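
As a sketch of what the receiving end of such an API could look like, here is a minimal upload endpoint using only the Python standard library. Everything about it is an assumption for illustration: the URL convention (POST the raw file bytes to /upload/<filename>), the make_upload_server name, and the port. A browser posting from the Bokeh page to a different origin would also need CORS handling, which is omitted here.

```python
import os
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from urllib.parse import unquote, urlparse


def make_upload_server(upload_dir: str, port: int = 8000) -> ThreadingHTTPServer:
    """Build a server that saves each POSTed body under upload_dir."""
    os.makedirs(upload_dir, exist_ok=True)

    class UploadHandler(BaseHTTPRequestHandler):
        def do_POST(self):
            # Hypothetical convention: the last path segment is the file name.
            # basename() strips any directory parts a client might sneak in.
            name = os.path.basename(unquote(urlparse(self.path).path))
            length = int(self.headers.get("Content-Length", 0))
            if name in ("", ".", "..") or length <= 0:
                self.send_error(400, "missing file name or body")
                return
            # Stream the body to disk in blocks instead of holding it in memory.
            with open(os.path.join(upload_dir, name), "wb") as out:
                remaining = length
                while remaining > 0:
                    block = self.rfile.read(min(remaining, 64 * 1024))
                    if not block:
                        break
                    out.write(block)
                    remaining -= len(block)
            self.send_response(201)
            self.send_header("Content-Length", "0")
            self.end_headers()

    return ThreadingHTTPServer(("127.0.0.1", port), UploadHandler)
```

Running make_upload_server("scope_data").serve_forever() starts it. Because each file arrives as its own HTTP request, the number of files no longer affects the size of any single message.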

In the immediate term, though, I do not think Bokeh file input will be able to satisfy your needs with this use case.

cc @Philipp_Rudiger in case you have thoughts about this.

Hi Bryan,

Thanks for the detailed reply! I feared it was not as easy as I thought it would be. I understand it won’t be an easy task to take on from your side either.

Can I ask another little question though: how would I embed pure JavaScript code like dropzone.js inside Bokeh? I read some documents on the bokeh.org website, but I’m not that JavaScript-savvy. Although I seem to be able to create the proper class so that Bokeh doesn’t complain when it reads it, the widget never appears on the screen. I think my “implementation” code is wrong, but I can’t find any example of how to display a third-party widget in Bokeh.

If this is not the right thread I’ll report it in the support area!

Thanks again for your help

Fabio