Convert numpy array with indices to a pandas dataframe
I have a numpy array which I want to plot with python ggplot's tile. For that I need a DataFrame with the columns x, y, value. How can I efficiently transform the numpy array into such a DataFrame?
Note that the layout I want resembles a sparse (coordinate) format, but I want a regular DataFrame. I tried using scipy sparse data structures as in "Convert sparse matrix (csc_matrix) to pandas dataframe", but the conversions were too slow and memory hungry: my memory was used up.
To clarify what I want:
I start out with a numpy array like
array([[ 1, 3, 7],
       [ 4, 9, 8]])
and I would like to end up with the DataFrame
   x  y  value
0  0  0      1
1  0  1      3
2  0  2      7
3  1  0      4
4  1  1      9
5  1  2      8
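import numpy as np
import pandas as pd

arr = np.array([[1, 3, 7],
                [4, 9, 8]])
# np.indices yields the row and column index grids; flatten them into two
# columns and append the flattened values as the third column.
df = pd.DataFrame(np.hstack((np.indices(arr.shape).reshape(2, arr.size).T,
                             arr.reshape(-1, 1))), columns=['x', 'y', 'value'])
print(df)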
   x  y  value
0  0  0      1
1  0  1      3
2  0  2      7
3  1  0      4
4  1  1      9
5  1  2      8
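One thing to keep in mind with the np.hstack approach: it produces a single array with one common dtype, so for a float-valued array the x and y columns come out as floats too. If integer index columns matter, a minimal sketch along the following lines may help. The helper name to_long_frame is made up for illustration and is not part of the original answer; it builds the DataFrame column by column so each column keeps its own dtype.

```python
import numpy as np
import pandas as pd

def to_long_frame(arr):
    """Hypothetical helper: long-format DataFrame (x, y, value) for a 2-D array."""
    # np.indices returns one index grid per axis; ravel() flattens each grid in
    # row-major order, which matches the order of arr.ravel().
    x, y = np.indices(arr.shape)
    return pd.DataFrame({'x': x.ravel(), 'y': y.ravel(), 'value': arr.ravel()})

arr = np.array([[1.5, 3.0, 7.0],
                [4.0, 9.0, 8.0]])
df = to_long_frame(arr)
print(df.dtypes)  # x and y stay integer, value stays float
```

This keeps the row order of the examples above; whether it is faster or slower than the hstack version would need to be measured on your data.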
You might also consider using the function employed in this answer, as a speedup to np.indices in the solution above:
def indices_merged_arr(arr):
    m, n = arr.shape
    I, J = np.ogrid[:m, :n]
    out = np.empty((m, n, 3), dtype=arr.dtype)
    out[..., 0] = I
    out[..., 1] = J
    out[..., 2] = arr
    out.shape = (-1, 3)
    return out
array = np.array([[ 1, 3, 7],
                  [ 4, 9, 8]])
df = pd.DataFrame(indices_merged_arr(array), columns=['x', 'y', 'value'])
print(df)
   x  y  value
0  0  0      1
1  0  1      3
2  0  2      7
3  1  0      4
4  1  1      9
5  1  2      8
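A caveat on indices_merged_arr: the output array takes arr.dtype, so the row and column indices are stored in the same dtype as the values. For a narrow integer dtype such as uint8, indices above 255 cannot be represented. The sketch below is a hypothetical variant (the name indices_merged_arr_safe is not from the original answer) that picks a dtype wide enough for both the values and the indices via np.result_type.

```python
import numpy as np
import pandas as pd

def indices_merged_arr_safe(arr):
    """Hypothetical variant of indices_merged_arr with a wider output dtype."""
    m, n = arr.shape
    # Choose a dtype that can hold both the array values and platform-sized indices.
    dtype = np.result_type(arr.dtype, np.intp)
    I, J = np.ogrid[:m, :n]
    out = np.empty((m, n, 3), dtype=dtype)
    out[..., 0] = I    # row index, broadcast across columns
    out[..., 1] = J    # column index, broadcast across rows
    out[..., 2] = arr  # the values themselves
    return out.reshape(-1, 3)

small = np.zeros((300, 2), dtype=np.uint8)
df = pd.DataFrame(indices_merged_arr_safe(small), columns=['x', 'y', 'value'])
print(df['x'].max())
```

The trade-off is memory: widening the dtype makes the intermediate array larger, which matters if memory was the original bottleneck.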
Performance
arr = np.random.randn(1000, 1000)
%timeit df = pd.DataFrame(np.hstack((np.indices(arr.shape).reshape(2, arr.size).T,\
arr.reshape(-1, 1))), columns=['x', 'y', 'value'])
100 loops, best of 3: 15.3 ms per loop
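%timeit pd.DataFrame(indices_merged_arr(array), columns=['x', 'y', 'value'])
1000 loops, best of 3: 229 µs per loop
Note that the second call, as written, times the 2x3 array from the example above rather than the 1000x1000 arr, so the two figures are not directly comparable.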
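To compare the two approaches on the same input outside of IPython, a sketch like the following could be used. The wrapper names via_indices and via_ogrid are made up for illustration; the bodies mirror the two solutions above. No timings are quoted here because they depend on the machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

def via_indices(arr):
    # First solution: np.indices plus np.hstack.
    stacked = np.hstack((np.indices(arr.shape).reshape(2, arr.size).T,
                         arr.reshape(-1, 1)))
    return pd.DataFrame(stacked, columns=['x', 'y', 'value'])

def via_ogrid(arr):
    # Second solution: fill a preallocated (m, n, 3) array using np.ogrid.
    m, n = arr.shape
    I, J = np.ogrid[:m, :n]
    out = np.empty((m, n, 3), dtype=arr.dtype)
    out[..., 0] = I
    out[..., 1] = J
    out[..., 2] = arr
    return pd.DataFrame(out.reshape(-1, 3), columns=['x', 'y', 'value'])

arr = np.random.randn(1000, 1000)

# Sanity check: both functions should build the same DataFrame.
same = via_indices(arr).equals(via_ogrid(arr))
print("same result:", same)

# Time both functions on the same 1000x1000 input.
for fn in (via_indices, via_ogrid):
    best = min(timeit.repeat(lambda: fn(arr), number=10, repeat=3)) / 10
    print(f"{fn.__name__}: {best * 1e3:.2f} ms per call")
```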