From scatter plot to 2D array

My mind has gone completely blank on this one.

I want to do something that I think is very simple.

Suppose I have some test data:

import pandas as pd
import numpy as np
k=10
df = pd.DataFrame(np.array([range(k), 
                           [x + 1 for x in range(k)],
                           [x + 4 for x in range(k)], 
                           [x + 9 for x in range(k)]]).T,columns=list('abcd'))

where rows correspond to time and columns to angles, and it looks like this:

   a   b   c   d
0  0   1   4   9
1  1   2   5  10
2  2   3   6  11
3  3   4   7  12
4  4   5   8  13
5  5   6   9  14
6  6   7  10  15
7  7   8  11  16
8  8   9  12  17
9  9  10  13  18

Then, for reasons, I convert it to an ordered dictionary:

def highDimDF2Array(df):
    from collections import OrderedDict # Need to preserve order

    vels = [1.42,1.11,0.81,0.50]

    # Get dataframe shapes
    cols = df.columns

    trajectories = OrderedDict()
    for i,j in enumerate(cols):
        x = df[j].values
        # Remove construction NaNs
        x = x[~np.isnan(x)]

        maxTimeSteps = len(x)
        tmpTraj = np.empty((maxTimeSteps,3))
        # This should be fast
        tmpTraj[:,0] = range(maxTimeSteps)
        tmpTraj[:,1] = x
        tmpTraj[:,2].fill(vels[i])

        trajectories[j] = tmpTraj

    return trajectories

Then I plot it all:

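import matplotlib.pyplot as plt
m = highDimDF2Array(df)
M = np.vstack(list(m.values()))  # list() is needed on Python 3, where values() is a view
plt.scatter(M[:,0],M[:,1],15,M[:,2])
plt.title(r'Angle $[^\circ]$ vs. Time $[s]$')  # raw string so \c is not read as an escape
plt.colorbar()
plt.show()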


Now all I want to do is to put all of that into a 2D numpy array with the following properties:

  • Time is mapped to the x-axis (or y, it doesn't matter)
  • Angle is mapped to the y-axis
  • The entries in the matrix correspond to the values of the coloured dots in the scatter plot
  • All other entries are treated as NaNs (i.e. those that are not defined by a point in the scatter plot)

In 3D the colour would correspond to the height.

I was thinking of using something like this: 3d Numpy array to 2d, but I am not quite sure how.

You can convert the values in M[:,0] and M[:,1] to integers and use them as indices into a 2D numpy array, then assign the values in M[:,2] at those positions. Here's an example using the value for M you defined.

out = np.full((20,10), np.nan)  # every entry starts as NaN
N = M[:,[0,1]].astype(int)      # integer (time, angle) indices
out[N[:,1], N[:,0]] = M[:,2]    # rows: angle, columns: time
plt.scatter(M[:,0],M[:,1],15,M[:,2])
plt.title(r'Angle $[^\circ]$ vs. Time $[s]$')
plt.colorbar()
plt.imshow(out, interpolation='none', origin = 'lower')


Here you can convert M to integers directly, but in general you might have to come up with a function that maps the columns of M to integers, depending on the resolution of the array you are creating.
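As a rough illustration of such a mapping function, here is a minimal sketch that linearly rescales float coordinates onto a grid of a chosen resolution. The names to_grid, nx and ny and the sample points are hypothetical, and the sketch assumes each coordinate column contains at least two distinct values (otherwise the rescaling divides by zero).

```python
import numpy as np

def to_grid(points, nx, ny):
    """Scatter (x, y, value) rows into an (ny, nx) array; unset cells stay NaN.

    Hypothetical helper for illustration only.
    """
    x, y, v = points[:, 0], points[:, 1], points[:, 2]
    # Linearly rescale each coordinate to [0, n-1] and round to the nearest cell.
    ix = np.rint((x - x.min()) / (x.max() - x.min()) * (nx - 1)).astype(int)
    iy = np.rint((y - y.min()) / (y.max() - y.min()) * (ny - 1)).astype(int)
    out = np.full((ny, nx), np.nan)
    # If several points land in one cell, NumPy does not guarantee which value remains.
    out[iy, ix] = v
    return out

# Made-up sample points: (x, y, value)
pts = np.array([[0.0, 0.0, 1.42],
                [0.5, 2.5, 1.11],
                [1.0, 5.0, 0.81]])
grid = to_grid(pts, nx=3, ny=3)
```

Points that share a cell overwrite each other here; if you would rather average them, binning with np.histogram2d is probably the better fit.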

I don't use pandas, so I cannot really follow what your function does. But from the description of your array M and what you want, I think the function np.histogram2d is what you need. It divides the range of your independent values into equidistant bins and sums all the occurrences. You can apply weighting with your 3rd column to get the proper height. You have to choose the number of bins:

z, x, y   = np.histogram2d(M[:,0], M[:,1], weights=M[:,2], bins=50)
num, x, y = np.histogram2d(M[:,0], M[:,1], bins=50)

z /= num # proper averaging, it also gives you NaN where num==0

plt.pcolor(x, y, z.T) # visualization; histogram2d puts x along the first axis, hence the transpose

plt.hist2d could also be of interest.
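To make the averaging step concrete, here is a small sketch on three made-up points, two of which fall into the same bin. The data and the names total, count and mean are assumptions for illustration. Using np.divide with where= is one way to get NaN in empty bins without triggering the 0/0 runtime warning.

```python
import numpy as np

# Three hypothetical points; the first two fall into the same bin.
x = np.array([0.0, 0.0, 1.0])
y = np.array([0.0, 0.0, 1.0])
w = np.array([1.0, 3.0, 5.0])

edges = np.array([-0.5, 0.5, 1.5])  # two unit-width bins per axis
total, _, _ = np.histogram2d(x, y, bins=(edges, edges), weights=w)  # sum of weights per bin
count, _, _ = np.histogram2d(x, y, bins=(edges, edges))             # number of points per bin

# Divide only where a bin is occupied; empty bins keep their initial NaN.
mean = np.full_like(total, np.nan)
np.divide(total, count, out=mean, where=count > 0)
```

The bin holding both of the first two points ends up with the average of their weights rather than the sum.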

Edit: histogram2d yields the 2D array that was asked for in the question. The visualization, however, should be done with imshow, since pcolor doesn't skip NaN values (is there some way to teach it?).

The advantage of this method is that the x and y values can be floats and in arbitrary order. Further, by defining the number of bins, one can choose the resolution of the resulting image. Nevertheless, to get exactly the result that was asked for, one should do:

binx = np.arange(M[:,0].min()-0.5, M[:,0].max()+1.5) # edges of the bins. 0.5 is the half width
biny = np.arange(M[:,1].min()-0.5, M[:,1].max()+1.5)

z,   x, y   = np.histogram2d(M[:,0], M[:,1], weights=M[:,2], bins=(binx,biny))
num, x, y   = np.histogram2d(M[:,0], M[:,1], bins=(binx,biny))

z /= num


plt.imshow(z.T, interpolation='none', origin = 'lower')
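As a sanity check, the following sketch compares the unit-bin histogram against direct integer indexing. It rebuilds M with plain NumPy from the question's test data (the offsets list and the 19x10 shape are derived from that data and are assumptions of this sketch, not part of either answer), so it runs without pandas or matplotlib.

```python
import numpy as np

# Rebuild M as (time, angle, velocity) rows, mirroring the question's test data.
k = 10
vels = [1.42, 1.11, 0.81, 0.50]
offsets = [0, 1, 4, 9]  # columns a, b, c, d are range(k) shifted by these amounts
t = np.arange(k)
M = np.vstack([np.column_stack([t, t + off, np.full(k, v)])
               for off, v in zip(offsets, vels)]).astype(float)

# Direct integer indexing: rows are angle (0..18), columns are time (0..9).
direct = np.full((19, 10), np.nan)
direct[M[:, 1].astype(int), M[:, 0].astype(int)] = M[:, 2]

# Unit-width bins centred on the integers.
binx = np.arange(M[:, 0].min() - 0.5, M[:, 0].max() + 1.5)
biny = np.arange(M[:, 1].min() - 0.5, M[:, 1].max() + 1.5)
z, _, _ = np.histogram2d(M[:, 0], M[:, 1], bins=(binx, biny), weights=M[:, 2])
num, _, _ = np.histogram2d(M[:, 0], M[:, 1], bins=(binx, biny))
with np.errstate(invalid="ignore"):  # 0/0 in empty bins yields NaN
    z /= num

# histogram2d puts x (time) along the first axis, so transpose before comparing.
same = np.allclose(z.T, direct, equal_nan=True)
```

Because no two points in this test data share a (time, angle) cell, the averaged histogram and the directly indexed array should agree entry by entry.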


The output of pcolor does not leave out the NaNs, but in return it takes the x and y values into account:

plt.pcolormesh(x, y, z.T, vmin=0, vmax=2)

