How to upload any kind of file to the server side with FileInput?

Johan_Monard · February 1, 2020, 3:40pm

I have been struggling with uploading files to the Bokeh server, as I did not understand very well how the base64 encoding/decoding works. I only managed to upload a .csv file by decoding it and then reconstructing it. This is probably not the simplest way to do it, and it can hardly be repeated for a more complex file structure (how to go from “message” back to a .shp file, for instance, in the code below). That makes me believe there is a better way that could upload any kind of file…

Here is what I did for the .csv file:

import base64
import csv

from bokeh.models import FileInput

file_input = FileInput(accept=".csv")

def upload_csv_to_server(attr, old, new):

    # decode the base64 payload (adapted from a Python base64 website)
    base64_message = file_input.value
    base64_bytes = base64_message.encode('ascii')
    message_bytes = base64.b64decode(base64_bytes)
    # note: decoding as ASCII only works for plain-ASCII text files
    message = message_bytes.decode('ascii')

    # convert the string to CSV rows and save them on the server side
    message_list = message.splitlines()
    with open('data/' + file_input.filename, 'w', newline='') as file:
        writer = csv.writer(file)
        for line in message_list:
            writer.writerow(line.split(','))

file_input.on_change('value', upload_csv_to_server)

Thanks!

Bryan · February 1, 2020, 7:07pm

@Johan_Monard FYI you had text-quoted the code above, not applied code formatting. I have edited the post to fix it up, but for the future, code formatting uses triple-backtick ``` fences around code blocks.

As for the question, we are limited in what we can do. The FileInput transmits the base64-encoded contents back to the Python process, and that’s it. It’s up to you to decode that string and interpret it according to your knowledge of the file type; that’s not something we can do automatically. (We can’t reliably know the file type, and even if we did, there’s not always only one appropriate course of action.)
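To make the "decode it yourself" step concrete, here is a minimal sketch that decodes the base64 string and writes the raw bytes to disk without interpreting them, so it does not depend on the file type. The helper name `save_upload` and the `data` target folder are assumptions for illustration, not part of Bokeh; the commented wiring assumes a `FileInput` widget named `file_input`, as in the first post.

```python
import base64
from pathlib import Path


def save_upload(b64_value: str, filename: str, target_dir: str = "data") -> Path:
    """Decode a base64 payload and write the raw bytes to target_dir/filename.

    Hypothetical helper: the name and the default folder are illustrative.
    """
    raw = base64.b64decode(b64_value)
    # Keep only the final path component so the name cannot point outside target_dir.
    safe_name = Path(filename).name
    out_dir = Path(target_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / safe_name
    # Binary write: the bytes are stored untouched, whatever the file type.
    out_path.write_bytes(raw)
    return out_path


# Possible wiring in a Bokeh app (assumes a FileInput widget named file_input):
# def on_upload(attr, old, new):
#     save_upload(new, file_input.filename)
# file_input.on_change("value", on_upload)
```

Because nothing is decoded to text, this should behave the same for a .csv, a .pkl, or a .shp; parsing the saved file is then a separate step done with whatever library understands that format.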

We could potentially consider adding some separate convenience functions that handle some of the more common cases, which people could choose to use as appropriate. Please feel free to open a GitHub issue to discuss (especially if you are interested in contributing/collaborating on the implementation).

Johan_Monard · February 11, 2020, 11:17am

Hi Bryan, I got confused by your answer as I am new to this kind of file manipulation. I am not used to GitHub issues but will definitely give it a try. It would be good to have a function that lets the user choose any file and puts it in a target folder on the server side, as an FTP client would do, without caring about the file extension.

Before closing this conversation, here is a much simpler version of my “upload_csv_to_server” function above:

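import io
from base64 import b64decode

import pandas as pd

def upload_csv_to_server(attr, old, new):
    decoded = b64decode(new)        # base64 string -> raw bytes
    file = io.BytesIO(decoded)      # wrap the bytes in a file-like object
    source_df = pd.read_csv(file)
    # forward slash avoids the invalid "\m" escape in the original path
    source_df.to_csv('path/my_file.csv')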

I just send the reconstructed ‘file’ variable to a function which knows how to handle a .csv file and that’s it, I can manipulate and save it.

So I thought ‘dude, let’s do the same for all types of files I need to upload’, why not shapefiles:

import io
from base64 import b64decode

import geopandas as gpd

def upload_shp_to_server(attr, old, new):
    decoded = b64decode(new)
    file = io.BytesIO(decoded)
    gpd_df = gpd.read_file(file) #Here it fails :-(
    ...

Or a simple pickle file:

import io
from base64 import b64decode

import pandas as pd

def upload_pkl_to_server(attr, old, new):
    decoded = b64decode(new)
    file = io.BytesIO(decoded)
    source_df = pd.read_pickle(file) #Here it fails again :-( (ValueError: Unrecognized compression type: infer)
    ...

The last 2 examples failed miserably. I feel I am close and the problem might come from the BytesIO, but I am lost at this point… any idea?

Bryan · February 11, 2020, 3:36pm

@Johan_Monard I don’t know about the shapefile case but the pickle case is a limitation of Pandas itself:

https://github.com/pandas-dev/pandas/issues/26237

There is nothing we can do about that. Offhand I’d suggest you try actually saving the file instead of using a BytesIO (or perhaps explicitly passing compression=None is possible and would help).

Johan_Monard · February 11, 2020, 6:51pm

Passing compression=None worked for the pkl file:

import io
from base64 import b64decode

import pandas as pd

def upload_pkl_to_server(attr, old, new):
    decoded = b64decode(new)  # 'new' is the widget's updated base64 value
    file = io.BytesIO(decoded)
    df = pd.read_pickle(file, compression=None)
    df.to_pickle('path/my_file.pkl')

But I really need to get the shapefile uploaded. What do you mean by “try saving the file instead of using a BytesIO”? How can I do that?

Bryan · February 11, 2020, 7:26pm

@Johan_Monard I mean literally write the bytes to a file on disk first using the Python standard library file I/O functions, and then call gpd.read_file on that real file that you wrote, instead of on a BytesIO. Depending on your situation, you might need to think about using the standard library functions that exist for getting unique temporary filenames, and/or cleaning up the files when you are done with them.
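A minimal sketch of that idea, using only the standard library: decode the upload, write it to a uniquely named temporary file, hand the path to whatever reader you need, and remove the file afterwards. The helper name `read_via_temp_file` and its `reader` and `suffix` parameters are assumptions for illustration. Whether `gpd.read_file` then succeeds on a lone .shp is untested here; a shapefile normally comes with companion files (such as .shx and .dbf) that a single upload would not include.

```python
import base64
import os
import tempfile


def read_via_temp_file(b64_value, reader, suffix=""):
    """Write decoded bytes to a temporary file, call reader(path), then clean up.

    Hypothetical helper: 'reader' is any callable that takes a file path.
    """
    raw = base64.b64decode(b64_value)
    # delete=False so the file can be closed and then reopened by path
    # (reopening an open NamedTemporaryFile is not possible on Windows).
    tmp = tempfile.NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        tmp.write(raw)
        tmp.close()
        return reader(tmp.name)
    finally:
        tmp.close()          # harmless if already closed
        os.remove(tmp.name)  # clean up the temporary file


# Possible use inside a FileInput callback (assumes geopandas is imported as gpd):
# gdf = read_via_temp_file(new, gpd.read_file, suffix=".shp")
```

The reader must finish loading the data before it returns, since the temporary file is deleted as soon as the call completes.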

As a gentle FYI, if you just say something like “#Here it fails :-(” without stating the actual details of the failure, then it’s not generally going to be possible for anyone to provide any kind of detailed help to you. There’s an infinite number of ways things can fail, and that statement does not contain any information to actually work from. As an alternative to my suggestion above, you might also take this question to a GeoPandas support forum, and perhaps they can help make it work with BytesIO. But I can be sure that they will 100% need full details in order to do so.

beder · July 14, 2020, 10:03pm

Hi, was the question which pertained to shapefiles ever resolved?