Parallel coordinates plot for continuous data in pandas
The parallel_coordinates function from pandas is very useful:
import pandas as pd
import matplotlib.pyplot as plt
from pandas.plotting import parallel_coordinates  # was pandas.tools.plotting in old pandas

sampdata = pd.read_csv('/usr/local/lib/python3.3/dist-packages/pandas/tests/data/iris.csv')
parallel_coordinates(sampdata, 'Name')
plt.show()
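But when you have continuous data, its behavior is not what you would expect. Every distinct value of the class column is treated as a separate class, so each line gets its own arbitrary color and its own legend entry: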
import numpy as np
from pandas import DataFrame

mypos = np.random.randint(10, size=(100, 2))
mydata = DataFrame(mypos, columns=['x', 'y'])
myres = np.random.rand(100)  # 1-D, one continuous value per row
mydata['res'] = myres
parallel_coordinates(mydata, 'res')
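To see why, one can count the distinct values in the class column: with continuous random data there is essentially one class per row, and pandas colors and labels each of those classes individually. A minimal check, rebuilding data like the question's (the fixed seed is an addition, only there to make the run reproducible):

```python
import numpy as np
import pandas as pd

np.random.seed(0)  # assumption: fixed seed for reproducibility only

# Same kind of data as in the question: two integer columns plus a continuous 'res'
mypos = np.random.randint(10, size=(100, 2))
mydata = pd.DataFrame(mypos, columns=['x', 'y'])
mydata['res'] = np.random.rand(100)

# parallel_coordinates groups rows by the distinct values of the class column,
# so this is the number of separately colored "classes" it would draw
n_classes = mydata['res'].nunique()
print(n_classes, "classes for", len(mydata), "rows")
```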
I would like the color of the lines to reflect the magnitude of the continuous variable, e.g. as a gradient from white to black, preferably also with some transparency (an alpha value), and with a color bar next to the plot.
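The mapping asked for here, from a continuous value to a color on a white-to-black gradient with optional transparency, can be sketched with matplotlib's `Normalize` and the built-in `binary` colormap. The example values and the 0.2 alpha floor below are assumptions chosen for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import Normalize

# Hypothetical example values standing in for the 'res' column
values = np.array([0.0, 0.25, 0.5, 1.0])

# Linear map from [min, max] of the data onto [0, 1]
norm = Normalize(vmin=values.min(), vmax=values.max())
scaled = np.asarray(norm(values))

# 'binary' runs from white (0.0) to black (1.0); the result is an (n, 4) RGBA array
cmap = plt.get_cmap("binary")
rgba = cmap(scaled)

# Optional transparency: overwrite the alpha channel, keeping an (arbitrary)
# floor of 0.2 so that the lightest lines do not vanish completely
rgba[:, 3] = 0.2 + 0.8 * scaled
```

Each row of `rgba` can then be passed as the `color` of one plotted line, and the same `norm` and `cmap` are what a color bar needs.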
I had the exact same problem today. My solution was to copy parallel_coordinates from pandas and adapt it to my needs. Since I think it can be useful for others, here is my implementation:
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt


def parallel_coordinates(frame, class_column, cols=None, ax=None, color=None,
                         use_columns=False, xticks=None, colormap=None,
                         **kwds):
    n = len(frame)
    class_col = frame[class_column]
    class_min = np.amin(class_col)
    class_max = np.amax(class_col)

    if cols is None:
        df = frame.drop(class_column, axis=1)
    else:
        df = frame[cols]

    ncols = len(df.columns)

    # determine values to use for xticks
    if use_columns is True:
        if not np.all(np.isreal(list(df.columns))):
            raise ValueError('Columns must be numeric to be used as xticks')
        x = df.columns
    elif xticks is not None:
        if not np.all(np.isreal(xticks)):
            raise ValueError('xticks specified must be numeric')
        elif len(xticks) != ncols:
            raise ValueError('Length of xticks must match number of columns')
        x = xticks
    else:
        x = range(ncols)

    # draw on the axes passed in, otherwise on a new figure
    if ax is None:
        fig, ax = plt.subplots()
    else:
        fig = ax.figure

    Colorm = plt.get_cmap(colormap)

    # one line per row, colored by where its class value lies in [class_min, class_max]
    for i in range(n):
        y = df.iloc[i].values
        kls = class_col.iat[i]
        ax.plot(x, y, color=Colorm((kls - class_min)/(class_max-class_min)), **kwds)

    # a vertical axis line for every column
    for i in x:
        ax.axvline(i, linewidth=1, color='black')

    ax.set_xticks(x)
    ax.set_xticklabels(df.columns)
    ax.set_xlim(x[0], x[-1])
    ax.grid()

    # color bar for the continuous class column (no legend: the lines have no labels)
    norm = mpl.colors.Normalize(vmin=class_min, vmax=class_max)
    sm = mpl.cm.ScalarMappable(norm=norm, cmap=Colorm)
    sm.set_array([])  # required by older matplotlib versions
    fig.colorbar(sm, ax=ax, ticks=np.linspace(class_min, class_max, 10), format='%.2f')
    return fig
I haven't checked whether it works with every option that the original pandas function provides, but for your example it gives something like this:
parallel_coordinates(mydata, 'res', colormap="binary")
You can add an alpha value by changing this line in the function above:
ax.plot(x, y, color=Colorm((kls - class_min)/(class_max-class_min)), alpha=(kls - class_min)/(class_max-class_min), **kwds)
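One thing to watch with that alpha expression: the row with the smallest value gets `alpha=0`, so it is completely invisible, and if all values are equal the division fails. A small sketch that keeps a minimum opacity instead; `line_alpha` and its `floor` parameter are hypothetical helpers, not part of pandas or of the answer above:

```python
def line_alpha(value, vmin, vmax, floor=0.1):
    """Map value linearly from [vmin, vmax] onto [floor, 1.0].

    line_alpha and the default floor of 0.1 are hypothetical choices,
    not part of the original answer.
    """
    if vmax == vmin:
        # every line has the same value: draw them all fully opaque
        return 1.0
    frac = (value - vmin) / (vmax - vmin)
    return floor + (1.0 - floor) * frac
```

Inside the function, the alpha argument would then read `alpha=line_alpha(kls, class_min, class_max)`.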
And for the original pandas example, with the names removed and the last column used as the values:
import pandas as pd

sampdata = pd.read_csv('iris_modified.csv')
parallel_coordinates(sampdata, 'Value')
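A different technique from the answer's line-by-line `ax.plot` loop: matplotlib's `LineCollection` can draw all rows as one artist, and because a collection is itself a mappable, `set_array` applies the colormap and `fig.colorbar` can use it directly. This is only a sketch under simplifying assumptions: `parallel_coordinates_lc` is a hypothetical name, it uses a single alpha for all lines, and it does not support the `cols`, `use_columns` or `xticks` options of the function above:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from matplotlib.colors import Normalize


def parallel_coordinates_lc(frame, value_column, cmap="binary", alpha=0.5, ax=None):
    """Hypothetical helper: one polyline per row, colored by value_column."""
    df = frame.drop(columns=value_column)
    values = frame[value_column].to_numpy(dtype=float)
    x = np.arange(df.shape[1])

    # One (n_columns, 2) array of (x, y) points per row
    segments = [np.column_stack([x, row]) for row in df.to_numpy(dtype=float)]

    if ax is None:
        fig, ax = plt.subplots()
    else:
        fig = ax.figure

    norm = Normalize(vmin=values.min(), vmax=values.max())
    lc = LineCollection(segments, cmap=cmap, norm=norm, alpha=alpha)
    lc.set_array(values)  # the colormap is applied to these per-line values
    ax.add_collection(lc)
    ax.autoscale_view()  # collections do not trigger autoscaling by themselves

    # a vertical axis line for every column
    for xi in x:
        ax.axvline(xi, linewidth=1, color="black")
    ax.set_xticks(x)
    ax.set_xticklabels(df.columns)
    ax.set_xlim(x[0], x[-1])

    fig.colorbar(lc, ax=ax, format="%.2f")
    return fig, lc
```

For the question's data this would be called as `fig, lc = parallel_coordinates_lc(mydata, 'res')`. With thousands of rows, a single collection is usually noticeably faster to draw than one `Line2D` per row.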
I hope this helps!
Christophe