Problem ERROR:bokeh.server.protocol_handler:error handling message Message 'PATCH-DOC' (revision 1)

sebastiangonzalezv · October 1, 2019, 2:23am

Hi, I'm having a problem with my app: after uploading a file, changing the value of a Select raises an error. My code is:

```python
import pandas as pd
import bokeh
from bokeh.models import ColumnDataSource, Select, Slider
from bokeh.plotting import figure
import base64
from io import BytesIO
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn import cluster
from sklearn.neighbors import kneighbors_graph
from bokeh.palettes import Paired6
from bokeh.server.server import Server


def Cluster(doc):
    def update_variable(attrname, old, new):
        x = data[x_data.value].values
        y = data[y_data.value].values
        algorithm = algorithm_select.value
        n_clusters = int(clusters_slider.value)
        quantile = float(quantile_slider.value)
        n_neighbors = int(neighbors_slider.value)
        x, y, y_pred = update_cluster(x, y, n_clusters, quantile, n_neighbors, algorithm)
        colors = [spectral[i] for i in y_pred]
        source.data = dict(colors=colors, x=x, y=y)

    def update_dataframe(attrname, old, new):
        csv = base64.b64decode(select_data.value)
        data = pd.read_csv(BytesIO(csv))
        x_data.options = data.columns.to_list()
        y_data.options = data.columns.to_list()
        x_data.value = data.columns[0]
        y_data.value = data.columns[0]
        x = data[x_data.value].values
        y = data[y_data.value].values
        # x_data.update(options=data.columns.to_list(), value=data.columns[0])
        # y_data.update(options=data.columns.to_list(), value=data.columns[1])
        algorithm = algorithm_select.value
        n_clusters = int(clusters_slider.value)
        quantile = float(quantile_slider.value)
        n_neighbors = int(neighbors_slider.value)
        x, y, y_pred = update_cluster(x, y, n_clusters, quantile, n_neighbors, algorithm)
        colors = [spectral[i] for i in y_pred]
        source.data = dict(colors=colors, x=x, y=y)

    def update_cluster(x, y, n_clusters, quantile, n_neighbors, algorithm):
        xx = StandardScaler().fit_transform(np.column_stack((x, y)))
        bandwidth = cluster.estimate_bandwidth(xx, quantile=quantile)
        connectivity = kneighbors_graph(xx, n_neighbors=n_neighbors, include_self=False)
        connectivity = 0.5 * (connectivity + connectivity.T)
        if algorithm == 'MiniBatchKMeans':
            model = cluster.MiniBatchKMeans(n_clusters=n_clusters)
        elif algorithm == 'Birch':
            model = cluster.Birch(n_clusters=n_clusters)
        elif algorithm == 'DBSCAN':
            model = cluster.DBSCAN(eps=.2)
        elif algorithm == 'AffinityPropagation':
            model = cluster.AffinityPropagation(damping=.9, preference=-200)
        elif algorithm == 'MeanShift':
            model = cluster.MeanShift(bandwidth=bandwidth, bin_seeding=True)
        elif algorithm == 'SpectralClustering':
            model = cluster.SpectralClustering(n_clusters=n_clusters, eigen_solver='arpack',
                                               affinity="nearest_neighbors")
        elif algorithm == 'Ward':
            model = cluster.AgglomerativeClustering(n_clusters=n_clusters, linkage='ward', connectivity=connectivity)
        elif algorithm == 'AgglomerativeClustering':
            model = cluster.AgglomerativeClustering(linkage="average", affinity="cityblock", n_clusters=n_clusters,
                                                    connectivity=connectivity)
        model.fit(xx)
        if hasattr(model, 'labels_'):
            y_pred = model.labels_.astype(np.int)
        else:
            y_pred = model.predict(xx)
        return xx[:, 0], xx[:, 1], y_pred

    data = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')
    clustering_algorithms = ['MiniBatchKMeans', 'AffinityPropagation', 'MeanShift', 'SpectralClustering',
                             'Ward', 'AgglomerativeClustering', 'DBSCAN', 'Birch']
    algorithm_select = Select(value=clustering_algorithms[0], title='Select algorithm:', width=200,
                              options=clustering_algorithms)
    x_data = Select(value=data.columns[0], title='x:', width=200, options=data.columns.to_list())
    y_data = Select(value=data.columns[1], title='y:', width=200, options=data.columns.to_list())
    clusters_slider = Slider(title="Number of clusters", callback_policy="mouseup", value=2.0, start=2.0,
                             end=10.0, step=1, width=200)
    quantile_slider = Slider(title="quantile", callback_policy="mouseup", value=0.3, start=0.1,
                             end=1, step=0.1, width=200)
    neighbors_slider = Slider(title="Number of neighbors", callback_policy="mouseup", value=10.0, start=5.0,
                              end=15.0, step=1, width=200)

    x = data[x_data.value].values
    y = data[y_data.value].values
    algorithm = algorithm_select.value
    quantile = float(quantile_slider.value)
    n_neighbors = int(neighbors_slider.value)
    n_clusters = int(clusters_slider.value)
    spectral = np.hstack([Paired6] * 20)
    x, y, y_pred = update_cluster(x, y, n_clusters, quantile, n_neighbors, algorithm)
    colors = [spectral[i] for i in y_pred]
    plot = figure(plot_width=400, plot_height=400, title="my sine wave")
    source = ColumnDataSource(data=dict(x=x, y=y, colors=colors))
    plot.circle('x', 'y', fill_color='colors', line_color=None, source=source)
    select_data = bokeh.models.widgets.inputs.FileInput(accept='.csv')
    select_data.on_change('value', update_dataframe)
    algorithm_select.on_change('value', update_variable)
    x_data.on_change('value', update_variable)
    y_data.on_change('value', update_variable)
    clusters_slider.on_change('value_throttled', update_variable)
    neighbors_slider.on_change('value_throttled', update_variable)
    quantile_slider.on_change('value_throttled', update_variable)
    inputs = bokeh.layouts.column(select_data, algorithm_select, clusters_slider, neighbors_slider, quantile_slider,
                                  x_data, y_data)
    # inputs = bokeh.layouts.column(select_data)
    graphic = bokeh.layouts.row(inputs, plot)
    doc.add_root(graphic)


if __name__ == '__main__':
    print('Opening Bokeh application on http://localhost:5006/')
    server = Server({'/': Cluster}, num_procs=1)
    server.start()
    server.io_loop.add_callback(server.show, "/")
    server.io_loop.start()
```

I am currently using this environment:

```
bokeh==1.3.4
Jinja2==2.10.1
joblib==0.13.2
MarkupSafe==1.1.1
numpy==1.17.2
packaging==19.2
pandas==0.25.1
Pillow==6.1.0
pyparsing==2.4.2
python-dateutil==2.8.0
pytz==2019.2
PyYAML==5.1.2
scikit-learn==0.21.3
scipy==1.3.1
six==1.12.0
tornado==6.0.3
```

Complete error:

```
ERROR:bokeh.server.protocol_handler:error handling message Message 'PATCH-DOC' (revision 1) content: {'events': [{'kind': 'ModelChanged', 'model': {'type': 'Select', 'id': '1002'}, 'attr': 'value', 'new': 'sepal.width'}], 'references': []}: KeyError('sepal.width')
```

Thanks for your time!

Bryan · October 1, 2019, 2:39am

That error is saying that the model-changed event for a Select value is failing, in this case because a KeyError was raised in the callback that the change triggered. So somewhere in your code you are trying to access a key "sepal.width" in a dictionary (or maybe a DataFrame) where that key does not exist. I usually solve these kinds of issues with old-fashioned print statement debugging.
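To make that concrete, here is a minimal sketch that uses plain Python dicts as a stand-in for DataFrames (an assumption for illustration). A column name that exists in one copy of the iris data can be missing from another: the seaborn iris.csv loaded at startup names its columns with underscores (`sepal_width`), while the uploaded copy evidently uses dots (`sepal.width`, per the error). `get_column` is a hypothetical helper that turns the bare KeyError into a message listing the available keys.

```python
# Two variants of the iris column names (sample values, two rows only).
startup_data = {
    "sepal_length": [5.1, 4.9],
    "sepal_width": [3.5, 3.0],
}
uploaded_data = {
    "sepal.length": [5.1, 4.9],
    "sepal.width": [3.5, 3.0],
}


def get_column(table, name):
    """Hypothetical helper: index a column, raising a more informative KeyError."""
    if name not in table:
        raise KeyError(f"{name!r} not found; available columns: {sorted(table)}")
    return table[name]


print(get_column(uploaded_data, "sepal.width"))  # prints [3.5, 3.0]

try:
    get_column(startup_data, "sepal.width")  # same name, but looked up in the other table
except KeyError as exc:
    print("KeyError:", exc)
```

The extra message costs nothing when the lookup succeeds, and when it fails it shows immediately which table the callback was actually looking at.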

sebastiangonzalezv · October 1, 2019, 2:51am

It is the name of a column of the DataFrame. The CSV I am uploading is a copy of iris.csv hosted on GitHub.

Bryan · October 1, 2019, 3:38am

I certainly don’t doubt that it is a column in that CSV file, but somewhere along the way it isn’t a key in whatever you end up trying to index into inside your callback. Python (not me) is telling you this with 100% certainty:

KeyError('sepal.width')

This is a logic error in your code. As I mentioned, the simplest thing to do is to start sprinkling print statements in the offending callback, to see where/when the program state differs from your expectations.
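One plausible cause in this particular script (an inference from reading the code, not something confirmed in the thread): inside `update_dataframe`, the assignment `data = pd.read_csv(...)` creates a new *local* variable, so the `data` that `update_variable` closes over still refers to the iris DataFrame loaded at startup. When the browser then sends `sepal.width` for the Select, `update_variable` looks it up in the old table. The sketch below reproduces that pattern with plain dicts and functions (`make_app`, `load_buggy`, `load_fixed` and `select` are hypothetical stand-ins for the Bokeh callbacks), adds the kind of print statements suggested above, and shows the `nonlocal` fix.

```python
def make_app(initial_table):
    data = initial_table  # state that both "callbacks" are meant to share

    def load_buggy(new_table):
        # BUG: this assignment creates a new local `data`; the enclosing one is untouched.
        data = new_table
        print("load_buggy: local columns =", sorted(data))

    def load_fixed(new_table):
        nonlocal data  # rebind the enclosing variable instead of shadowing it
        data = new_table
        print("load_fixed: shared columns =", sorted(data))

    def select(column):
        # Print debugging: compare what was requested with what is actually available.
        print("select:", repr(column), "available:", sorted(data))
        return data[column]

    return load_buggy, load_fixed, select


startup = {"sepal_length": [5.1, 4.9], "sepal_width": [3.5, 3.0]}
uploaded = {"sepal.length": [5.1, 4.9], "sepal.width": [3.5, 3.0]}

load_buggy, load_fixed, select = make_app(startup)

load_buggy(uploaded)
try:
    select("sepal.width")  # still sees the startup columns
except KeyError as exc:
    print("KeyError:", exc)

load_fixed(uploaded)
print(select("sepal.width"))  # now sees the uploaded columns
```

In the original script, the analogous change would presumably be adding `nonlocal data` as the first line of `update_dataframe`; keeping the DataFrame in a mutable holder (for example a dict shared by both callbacks) would be another option.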

One other thing you can do is to raise the logging level to "debug". This can be done e.g. by setting the environment variable BOKEH_PY_LOG_LEVEL=debug. That will cause the entire stack trace to be printed. (In the next version of Bokeh, the entire stack trace will always be printed, regardless of log level.)
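If changing the log level is inconvenient, a version-independent alternative is to wrap each callback so that any exception's full stack trace is printed before it propagates. This is a swapped-in technique using only the standard library's `functools` and `traceback`, not something Bokeh provides; `log_traceback` is a hypothetical name, and the `update_variable` body below is a simplified stand-in.

```python
import functools
import traceback


def log_traceback(callback):
    """Hypothetical decorator: print the full stack trace of any exception, then re-raise."""
    @functools.wraps(callback)  # keeps the name and signature of the wrapped callback
    def wrapper(*args, **kwargs):
        try:
            return callback(*args, **kwargs)
        except Exception:
            traceback.print_exc()  # full trace to stderr, independent of logging config
            raise  # let the caller's own error handling run as before
    return wrapper


@log_traceback
def update_variable(attrname, old, new):
    data = {"sepal_width": [3.5, 3.0]}  # stand-in for the DataFrame in the thread
    return data[new]
```

Because `functools.wraps` preserves the `(attrname, old, new)` signature as seen by `inspect.signature`, the decorated function should be registrable with `on_change` exactly like the undecorated one.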

You can also try to use the standard python debugger, e.g. import pdb; pdb.set_trace() on a line in the callback, to step through your code and examine the program state that way.

Bryan · October 1, 2019, 3:48am

A few more comments:

This is a pretty neat example. It would need some cleaning up to make it more pedagogically clear and useful, but it could be a nice contribution if you are interested.
I can’t reproduce any error at all when I run this, no matter what I do. So you will need to be more explicit about exactly what steps are required to reproduce the error you are seeing.
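One way to pin down the reproduction steps is to build the exact upload payload outside the browser. This is a minimal sketch, assuming (as the script's own `base64.b64decode(select_data.value)` implies) that the FileInput value is the base64-encoded file contents. The headers matter: the startup data (seaborn's iris.csv) uses underscores, so a file with dotted names exercises the case in the error. `make_upload_payload` is a hypothetical helper, and the header names other than `sepal.width` are assumed.

```python
import base64
import csv
import io


def make_upload_payload(header, rows):
    """Hypothetical helper: base64-encode a CSV the way the script expects FileInput.value."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return base64.b64encode(buf.getvalue().encode("utf-8")).decode("ascii")


# Dotted headers, like the uploaded iris.csv named in the error (other names assumed).
payload = make_upload_payload(
    ["sepal.length", "sepal.width", "petal.length", "petal.width", "species"],
    [[5.1, 3.5, 1.4, 0.2, "setosa"], [4.9, 3.0, 1.4, 0.2, "setosa"]],
)

# Same decoding step as update_dataframe; pandas would read `raw` via pd.read_csv(BytesIO(raw)).
raw = base64.b64decode(payload)
header = next(csv.reader(io.StringIO(raw.decode("utf-8"))))
print(header)
```

Saving `raw` to a file gives a concrete CSV to attach to a bug report, together with the exact sequence: upload it, then pick `sepal.width` in the x or y Select.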