Varea_stack streaming different stackers/different number of stackers

crlot · June 9, 2022, 7:56pm

Hi all,

Is it possible to stream into a varea_stack with a new CDS that may contain more/less/only some of the same stackers?

Essentially, I’m stacking the per-minute rate of traffic from particular IP addresses. If traffic from an IP address disappears, or a new IP address appears, the updated source has a different set of columns from the original, and on Bokeh 2.4.3 this seems to give me the error: ValueError: Must stream updates to all existing columns.

A minimal example of an initial data set being plotted is as follows:

import pandas as pd
from bokeh.plotting import figure, show
from bokeh.models import ColumnDataSource
from bokeh.palettes import viridis

data = {
  "datetime": ["2022-06-08 12:00:00", "2022-06-08 12:01:00", "2022-06-08 12:02:00", "2022-06-08 12:00:00", "2022-06-08 12:01:00", "2022-06-08 12:02:00"],
  "BITS": [100, 0, 100, 200, 200, 200],
  "SRC_IP": ["192.168.1.1", "192.168.1.1", "192.168.1.1", "192.168.2.1", "192.168.2.1", "192.168.2.1"]
}

finalDf = pd.DataFrame.from_dict(data)
print(finalDf)

# One row per timestamp, one column per source IP
finalDf = finalDf.pivot(index="datetime", columns="SRC_IP", values="BITS").fillna(0)

timestamps = finalDf.index.tolist()
src_ips = finalDf.columns.tolist()
color = viridis(len(src_ips))

q = figure(x_range=timestamps)

source = ColumnDataSource(finalDf)

# Pass the CDS (not the DataFrame) so that later updates go through `source`
q.varea_stack(src_ips, x="datetime", source=source, color=color, legend_label=src_ips)

show(q)

I would then ideally want to stream in new data where there is also data from a new IP address:

data = {
  "datetime": ["2022-06-08 12:03:00", "2022-06-08 12:04:00", "2022-06-08 12:03:00", "2022-06-08 12:04:00", "2022-06-08 12:03:00", "2022-06-08 12:04:00"],
  "BITS": [300, 100, 400, 100, 600, 400],
  "SRC_IP": ["192.168.1.1", "192.168.1.1", "192.168.2.1", "192.168.2.1", "192.168.3.1", "192.168.3.1"]
}

or stream in data where for instance one or more of the IP addresses stops sending data.

If it’s possible I’ll keep trying but at present I can’t find a suitable simple working example to give me a hint in the right direction.

Kind Regards,
Carl

Bryan · June 10, 2022, 2:16am

@crlot What you are trying is not going to work. CDS streaming only appends data to the end of existing CDS columns. And, in order to ensure that all the CDS columns stay the same length (which must always be true), you have to stream to all the existing columns.

You are attempting to stream to the columns of the original pandas DataFrame, but those are not CDS columns:

In [2]: source.data
Out[2]:
{'datetime': array(['2022-06-08 12:00:00', '2022-06-08 12:01:00',
        '2022-06-08 12:02:00'], dtype=object),
 '192.168.1.1': array([100,   0, 100]),
 '192.168.2.1': array([200, 200, 200])}

Streaming together with varea_stack is only going to be useful to extend (all) the existing stacked areas further to the right. However, if you need to stack a new area somewhere, or add points in the middle, streaming will not be useful. You should really just recreate the plot from scratch with the new data. Stacking actually creates multiple separate glyphs that have stacking expression relationships set up between them. Unfortunately these relationships are set up on the Python side, and are not really amenable to editing/updating after the fact.
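
As a rough illustration of the "stream to all existing columns" rule above, here is a minimal pure-Python sketch. The helper name `align_batch` and its parameters are hypothetical, not part of Bokeh. It zero-fills any existing column that is missing from a new batch (an IP that went quiet) and reports columns the source has never seen (a new IP), which cannot be streamed.

```python
def align_batch(existing_columns, batch, x_key="datetime", fill=0):
    """Split a new batch into a stream-ready dict plus any unknown columns.

    existing_columns: column names already present in the ColumnDataSource
    batch: dict mapping column name -> list of new values
    """
    n_rows = len(batch[x_key])
    # Every existing column gets one value per new row; absent ones are filled.
    aligned = {
        name: list(batch.get(name, [fill] * n_rows))
        for name in existing_columns
    }
    # Columns the source has never seen cannot be appended by streaming.
    unknown = sorted(set(batch) - set(existing_columns))
    return aligned, unknown


existing = ["datetime", "192.168.1.1", "192.168.2.1"]
batch = {
    "datetime": ["2022-06-08 12:03:00", "2022-06-08 12:04:00"],
    "192.168.1.1": [300, 100],
    "192.168.3.1": [600, 400],  # new IP; 192.168.2.1 sent nothing
}
aligned, unknown = align_batch(existing, batch)
```

If `unknown` comes back empty, `aligned` has the same keys as the source and could be passed to `source.stream(aligned)`. If it is non-empty, as here, the plot would need to be rebuilt to gain the new stacker.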

gmerritt123 · June 10, 2022, 1:02pm

Thanks @Bryan, I was struggling to explain my understanding of how the varea_stack works so hesitated to reply.

I’ll add that I have gone down this rabbit hole for a number of applications and each time have resorted to a Patches glyph driven by a CDS with “actual_x”, “actual_y” and “stack_geom_x” and “stack_geom_y” columns. Then when new data gets added, or existing data gets manipulated, CustomJS code calculates the new stack geometries with some array manipulation and cumulative summing. See Fill between multi_line - #2 by gmerritt123 for some related reference.

The kinda nice thing about this approach is that because you are directly writing out the geometry based on the y values of the thing below the current one, you don’t need consistent X values for each item → the bottom geom is based on the x-y of the previous item’s top, and the top is based on the cumulative sum of the y, and the current x.
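
The geometry idea described above can be sketched roughly as follows. This is not the CustomJS code from the linked thread. It is a simplified Python version that assumes all series share the same x values, and the function name `stack_polygons` is made up for illustration. Each polygon runs left to right along its top edge (the cumulative sum), then right to left along the previous item's top.

```python
def stack_polygons(xs, series):
    """Return a list of (name, poly_x, poly_y) for stacked areas.

    xs: shared x values (a simplifying assumption)
    series: dict of name -> y values; insertion order is the stacking order
    """
    bottom = [0] * len(xs)
    polygons = []
    for name, ys in series.items():
        # Top edge is the running cumulative sum of the y values.
        top = [b + y for b, y in zip(bottom, ys)]
        # Trace the top edge forwards, then the bottom edge backwards.
        poly_x = list(xs) + list(reversed(xs))
        poly_y = top + list(reversed(bottom))
        polygons.append((name, poly_x, poly_y))
        # This item's top becomes the next item's bottom.
        bottom = top
    return polygons


polys = stack_polygons(
    [0, 1, 2],
    {"192.168.1.1": [100, 0, 100], "192.168.2.1": [200, 200, 200]},
)
```

The resulting `poly_x` / `poly_y` lists are the kind of per-item geometry columns a Patches glyph expects. Adding or dropping an item then only means recomputing these lists, not rewiring stack relationships.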

crlot · June 10, 2022, 1:18pm

Thanks @Bryan for taking the time to review this, explain why it isn’t possible, and expand on the streaming requirements. Thanks also @gmerritt123 for pointing me in the direction of your similar solution. I fear it’s going to take me a while to get my head round it, but I’ll give it a shot.

@Bryan, on the usefulness of the stacked plot: you’re definitely correct that my case isn’t a simple continuation of the same areas. However, with each new stacking time period I’d still like to see how data was stacked in the previous periods, i.e. the historical rate of traffic from given IPs over previous minutes. So whilst recreating from scratch is an option, I had hoped I could simply append the new stack to the existing plot by way of streaming.

Perhaps this opens up the question of whether it’s possible to use a callback to plot a new stack to the right of the existing stack. That wouldn’t be streaming, since the stacked areas could differ, but appending, so that the user perceives one visualisation that only has to load and append a single new data set per minute (for instance), as opposed to reloading, say, an hour’s worth of data every minute?

Thanks!

Bryan · June 10, 2022, 2:06pm

@crlot I don’t really have a clear picture in my head of what you are describing. A few images of similar plots or even sketches would go a long way towards helping my understanding of what you are asking for.
