
What is the best method for using Datashader to plot data from a NumPy array?

In the Datashader example notebook demonstrating lines, the input is a pandas DataFrame (though it seems a Dask DataFrame would work as well). My data is in a NumPy array. Can I use Datashader to plot lines from NumPy arrays without first putting them into a DataFrame?

The documentation for the line glyph seems to indicate this is possible, but I did not find an example. The example notebook I linked to uses Canvas.line, which I did not find in the documentation.

I did not find a way to plot data in a NumPy array without first putting it into a DataFrame. How to do this was not especially intuitive: it seems Datashader requires the column labels to be non-numeric strings, so that they can be accessed with the df.col_label syntax rather than the df[col_label] syntax (though perhaps there is a good reason for this).
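A small sketch of the relabeling step on its own, using only pandas and NumPy. The array contents and the `c` prefix are illustrative assumptions. `DataFrame.add_prefix` turns the integer labels that `pd.DataFrame(data.T)` produces by default into strings such as `'c0'`, which also work as attribute names:

```python
import numpy as np
import pandas as pd

# Illustrative data: 3 curves (rows) sampled at the same 5 x positions
data = np.arange(15, dtype=float).reshape(3, 5)

# One column per curve; pandas labels the columns 0, 1, 2 by default
df = pd.DataFrame(data.T)
print(list(df.columns))  # [0, 1, 2]

# DataFrame.add_prefix returns a copy whose column labels are strings
df = df.add_prefix('c')
print(list(df.columns))  # ['c0', 'c1', 'c2']

# String labels support both attribute access and item access
print(df.c1.equals(df['c1']))  # True
```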

With the current system, I had to do the following to get the NumPy array into a DataFrame with column labels that Datashader would accept.

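import collections

import datashader
import pandas as pd
import xarray

# data: 2-D NumPy array with one curve per row
# x_values: 1-D NumPy array of the x coordinates shared by every curve
df = pd.DataFrame(data=data.T)
data_cols = ['c{}'.format(c) for c in df.columns]
df.columns = data_cols
df['x'] = x_values

y_range = data.min(), data.max()
x_range = x_values[0], x_values[-1]

canvas = datashader.Canvas(x_range=x_range, y_range=y_range,
                           plot_height=300, plot_width=900)
# aggregate each curve separately against the shared 'x' column
aggs = collections.OrderedDict((c, canvas.line(df, 'x', c)) for c in data_cols)

# stack the per-curve aggregates along a new 'cols' dimension, then sum them
merged = xarray.concat(list(aggs.values()), dim=pd.Index(data_cols, name='cols'))
img = datashader.transfer_functions.shade(merged.sum(dim='cols'),
                                          how='eq_hist')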

Note that it was important to iterate over data_cols rather than simply df.columns, because the x column had to be excluded (which was not initially intuitive).
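If the list of curve columns was not kept when the labels were created, it can be recovered from the frame itself. A sketch using pandas `Index.drop`, on a small illustrative frame (the column names are assumptions matching the naming above):

```python
import pandas as pd

# Illustrative wide frame: two curves plus the shared x column
df = pd.DataFrame({'c0': [0.0, 1.0], 'c1': [2.0, 3.0], 'x': [0.0, 0.5]})

# Every column except 'x' holds one curve
data_cols = df.columns.drop('x').tolist()
print(data_cols)  # ['c0', 'c1']
```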

Here is an example of the result, with axes added using Bokeh.

The OrderedDict and xarray.concat method was incredibly slow when applied to many data curves. The following example demonstrates a much faster method: all of the curves are concatenated into a single DataFrame, separated by NaN rows, and aggregated with one call to canvas.line. See this GitHub issue for timings and further discussion.

import pandas as pd
import numpy as np
import datashader
import bokeh.plotting

bokeh.plotting.output_notebook()

# create some data worth plotting
nx = 50
x = np.linspace(0, np.pi * 2, nx)
y = np.sin(x)
n = 10000
data = np.empty([n+1, len(y)])
data[0] = x
prng = np.random.RandomState(123)

# shift each curve by a random offset drawn from a normal distribution
offset = prng.normal(0, 0.1, n).reshape(n, -1)
data[1:] = y
data[1:] += offset

# make some data noisy
n_noisy = prng.randint(0, n, 5)
for i in n_noisy:
    data[i+1] += prng.normal(0, 0.5, nx)

# stack every curve into one long DataFrame, with a NaN row after each curve
# so that the line glyph does not connect one curve to the next
dfs = []
split = pd.DataFrame({'x': [np.nan]})
for i in range(len(data)-1):
    x = data[0]
    y = data[i+1]
    df = pd.DataFrame({'x': x, 'y': y})
    dfs.append(df)
    dfs.append(split)

df = pd.concat(dfs, ignore_index=True)

# compute the plot ranges from the x values and the curves only
x_range = data[0, 0], data[0, -1]
y_range = data[1:].min(), data[1:].max()

canvas = datashader.Canvas(x_range=x_range, y_range=y_range,
                           plot_height=300, plot_width=300)
agg = canvas.line(df, 'x', 'y', datashader.count())
img = datashader.transfer_functions.shade(agg, how='eq_hist')
img
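The Python loop that builds `dfs` can also be replaced with array operations. This is a sketch under the assumption that every curve shares the same x values (as `data[0]` does above); `nan_separated_frame` is a hypothetical helper name, not part of Datashader. It pads each curve with one NaN column and flattens row by row, which gives the same layout as the loop: each curve followed by one all-NaN break row.

```python
import numpy as np
import pandas as pd


def nan_separated_frame(x, ys):
    """Stack curves into long-form x/y columns with a NaN row after each curve.

    x  : 1-D array of shared x values, shape (n_points,)
    ys : 2-D array with one curve per row, shape (n_curves, n_points)
    """
    ys = np.asarray(ys, dtype=float)
    n_curves, n_points = ys.shape
    # One extra column per curve holds the NaN break
    xs_pad = np.full((n_curves, n_points + 1), np.nan)
    ys_pad = np.full((n_curves, n_points + 1), np.nan)
    xs_pad[:, :-1] = x  # broadcast the shared x values to every curve
    ys_pad[:, :-1] = ys
    # Row-major ravel keeps each curve contiguous, followed by its NaN row
    return pd.DataFrame({'x': xs_pad.ravel(), 'y': ys_pad.ravel()})


# Tiny illustrative input: 2 curves of 3 points each
x = np.array([0.0, 1.0, 2.0])
ys = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
df = nan_separated_frame(x, ys)
print(len(df))  # 8
```

With the arrays from the example above, the call would be `nan_separated_frame(data[0], data[1:])`, and the result can be passed to `canvas.line(df, 'x', 'y', datashader.count())` as before. Building two arrays once should avoid the cost of creating and concatenating 2 × n small DataFrames.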

