Unable to plot with bokeh.plotting using categorical axis labeling

I am trying to create a simple scatter plot using omics data read in from a .CSV file. I have samples in rows with peak areas for metabolites in corresponding columns. I would like to be able to plot sample (x axis) vs metabolite. I included a truncated .CSV file as an example.

I started with the following code:

    import pandas
    from bokeh.plotting import figure, output_file, show

    df = pandas.read_csv("omicsdata.csv")

    p = figure(title="Chart Title", x_axis_label='Sample', y_axis_label='Peak Area')

    p.scatter(df['Sample'], df['CMP'], line_width=1)

    show(p)

This creates the chart, but with no data points plotted. It seems to be the non-numerical names in the sample column. If I plot another column, say two metabolites against each other, it will plot fine. I would prefer not to re-label samples as these sample names correspond to sample names in a larger database.

Following the guide from http://bokeh.pydata.org/en/latest/docs/user_guide/plotting.html#userguide-plotting , I tried to create a categorical axis label.

    import pandas
    from bokeh.plotting import figure, output_file, show

    df = pandas.read_csv("omicsdata.csv")

    label = df['Sample']

    p = figure(title="Chart Title", x_axis_label='Sample', y_axis_label='Peak Area')
    p = figure(x_range=label)

    p.scatter(df['Sample'], df['CMP'], line_width=1)

    show(p)

Running the above, I get an error about invalid range input:

    raise ValueError("Unrecognized range input: '%s'" % str(range_input))

    ValueError: Unrecognized range input

It does not seem to like the non-numerical values in the sample column. So I tried to create a new list from the sample column, explicitly as string data:

    import pandas
    from bokeh.plotting import figure, output_file, show

    df = pandas.read_csv("omicsdata.csv")

    label = df['Sample']
    map(str, label)

    p = figure(title="Chart Title", x_axis_label='Sample', y_axis_label='Peak Area')
    p = figure(x_range=label)

    p.scatter(df['Sample'], df['2-aminoadipic acid'], line_width=1)

    show(p)

This gives me the same range error output.

Any thoughts? I am new to python and have not encountered the same issues previously when trying to plot with ggplot in R.

Thanks

omicsdata.csv (6.44 KB)

The x_range passed to figure() needs to contain the unique list of categories. You are passing the entire data column, which happens to have categories as values, but which I am guessing has the category values repeated (because it is data, which is just whatever it is).

Hopefully a minimal explicit example will help:

    from bokeh.plotting import figure, output_file, show

    # x_range contains the *unique* categories
    p = figure(x_range=['a', 'b', 'c'])

    # x has the data, in which categories may repeat
    p.scatter(x=['a', 'b', 'c', 'a', 'a', 'c'], y=[1, 2, 3, 4, 5, 6])

    output_file("foo.html")

    show(p)

Or maybe another way of putting it: the "list of categories" is a different thing from an "arbitrary data column with categories as values". x_range needs to be the "list of categories".

Thanks,

Bryan

···

On Jul 1, 2016, at 7:44 AM, [email protected] wrote:

--
You received this message because you are subscribed to the Google Groups "Bokeh Discussion - Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To post to this group, send email to [email protected].
To view this discussion on the web visit https://groups.google.com/a/continuum.io/d/msgid/bokeh/31c5ba97-809f-4117-a375-58aa1afeddd8%40continuum.io.
For more options, visit https://groups.google.com/a/continuum.io/d/optout.
<omicsdata.csv>

Hi Bryan,

Thanks for the quick reply.

I suppose my followup question is: can I parse the sample column in my .CSV file into a list of categories suitable to plot against? Essentially, I need the sample names from the sample column to be pulled from the .CSV and assigned to a new variable as a list. How do I do this? I assumed that’s what I was doing with the code:

    label = df['Sample']

The goal is to be able to automate this process with code rather than manually convert or relabel. My data sets generally contain 50-500 samples each, making automating this a necessity.

Thanks again for the help

Hi,

    df['Sample']

is all the data, which presumably has category names repeated. One way to unique the set of categories is to pass it to the Python set type:

    set(df['Sample'])

will only have unique values. Being a set, it is unordered. If you'd like your categories to appear in a certain order on the axis, you have to put them in that order when you set x_range; there's no way to guess what users might want for order. If you just want an alphabetic ordering, then:

    x_range = sorted(set(df['Sample']))

should do.

Thanks,

Bryan
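The two orderings discussed above can be sketched as follows. This is only an illustration, not code from the thread: `unique_categories` and `scatter_by_sample` are hypothetical helper names, the sample names are made up, and the plotting helper assumes Bokeh is installed. Note that `sorted()` returns a list, while a bare `set` is not a list.

```python
def unique_categories(values, sort=True):
    """Return the distinct values as a list of strings.

    sort=True gives alphabetical order; sort=False keeps first-seen order.
    """
    labels = [str(v) for v in values]
    if sort:
        return sorted(set(labels))
    # A dict keeps insertion order (Python 3.7+), so duplicates collapse
    # while the order of first appearance is preserved.
    return list(dict.fromkeys(labels))


def scatter_by_sample(samples, peak_areas, title="Chart Title"):
    """Build a scatter plot with one categorical x position per sample name."""
    # Imported here so unique_categories() can be used without Bokeh.
    from bokeh.plotting import figure

    samples = [str(s) for s in samples]
    p = figure(title=title, x_range=unique_categories(samples),
               x_axis_label="Sample", y_axis_label="Peak Area")
    p.scatter(x=samples, y=list(peak_areas))
    return p


# Hypothetical sample names standing in for df['Sample']
samples = ["S3", "S1", "S2", "S1", "S3"]
print(unique_categories(samples))              # ['S1', 'S2', 'S3']
print(unique_categories(samples, sort=False))  # ['S3', 'S1', 'S2']
```

With the DataFrame from the original post, something like `show(scatter_by_sample(df['Sample'], df['CMP']))` should then draw one x position per sample name, assuming `show` is imported from `bokeh.plotting`.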

···

On Jul 1, 2016, at 9:12 AM, [email protected] wrote:

Thank you.

set() seemed to be the function I needed. I was able to generate my plot as I wanted. Though it wouldn’t accept it without using sort() as well.

I’m now working on a loop function to generate plots for all metabolites.

Thanks again!!

I am trying to simply read a csv file and plot any of its columns using bokeh, but it gives a strange error. Please help.

I started with the following code, and it gives an error:

    import pandas as pd
    from bokeh.plotting import figure, output_file, show

    AAPL = pd.read_csv(
            "C:\Users\NITESH\Desktop\table",
            parse_dates=['Date']
        )

    output_file("datetime.html")

    # create a new plot with a datetime axis type
    p = figure(width=800, height=250, x_axis_type="datetime")

    p.line(AAPL['Date'], AAPL['Close'], color='navy', alpha=0.5)

    show(p)

This error appears:

    runfile('C:/Users/NITESH/Desktop/untitled0.py', wdir='C:/Users/NITESH/Desktop')
      File "C:/Users/NITESH/Desktop/untitled0.py", line 12
        "C:\Users\NITESH\Desktop\table",
        ^
    SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \UXXXXXXXX escape

But if I use a link to the file instead, no error occurs. The link is http://ichart.yahoo.com/table.csv?s=AAPL&a=0&b=1&c=2000&d=0&e=1&f=2010

And this program is given at http://bokeh.pydata.org/en/latest/docs/user_guide/plotting.html

Please reply soon

The error refers to the backslashes in your file path string. Python interprets a backslash as a special ‘escape’ character for encoding characters in various ways (unicode, hex, special characters and such). In this case it thinks you are trying to encode a unicode character, because it interprets “\U” as the start of a unicode escape. You simply need to replace your backslashes with forward slashes, or use double backslashes (Python interprets the first backslash as an escape and the second one as a literal backslash). Also, you need to specify the full file name in your file path, in this case “table.csv”.

Here is some information on python string literals that I think you’ll find helpful.
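As a rough illustration of the fixes described above, the sketch below spells the same path three ways and checks that they agree. The raw-string form is an extra option not mentioned in the reply, and `PureWindowsPath` is used only so the comparison runs on any operating system.

```python
from pathlib import PureWindowsPath

# Three ways to spell the same Windows path. "table.csv" is the full file
# name suggested in the reply above.
doubled = "C:\\Users\\NITESH\\Desktop\\table.csv"   # \\ is one literal backslash
forward = "C:/Users/NITESH/Desktop/table.csv"       # forward slashes
raw = r"C:\Users\NITESH\Desktop\table.csv"          # raw string: no escape processing

# The doubled and raw spellings are the same string.
assert doubled == raw

# PureWindowsPath treats "/" and "\" as the same separator, so the
# forward-slash spelling names the same file.
assert PureWindowsPath(forward) == PureWindowsPath(raw)

# A second escape hides in the original path: "\t" is a tab character,
# so "Desktop\table" would not contain a backslash at all.
tabbed = "Desktop\table"
assert "\t" in tabbed and "\\" not in tabbed

print(PureWindowsPath(raw).name)  # table.csv
```

Any of the three spellings can be passed to `pd.read_csv` in place of the original string.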

···

On Sun, Feb 12, 2017 at 10:47 AM, [email protected] wrote:

Tyler,

I just wanted to say thank you and express my appreciation for your recent answers and help on the mailing list.

Thanks,

Bryan

···

On Feb 12, 2017, at 16:05, Tyler Nickerson <[email protected]> wrote:

Thanks Bryan,

I’m glad to help when I can. Besides that, it’s fun to see how others are using Bokeh.

···

On Mon, Feb 13, 2017 at 7:23 AM, Bryan Van de ven [email protected] wrote:
