Pandas column as index for numpy array
How can I use a pandas column as an index for a numpy array? Say I have
>>> import numpy as np
>>> import pandas as pd
>>> grid = np.arange(10, 20)
>>> df = pd.DataFrame([0, 1, 1, 5], columns=['i'])
I would like to do
>>> df['j'] = grid[df['i']]
IndexError: unsupported iterator index
What is a short and clean way to actually perform this operation?
Update
To be precise, I want an additional column holding the values of grid at the indices that the first column contains: df['j'][0] = grid[df['i'][0]] for row 0, and so on.
Expected output:
index i j
0 0 10
1 1 11
2 1 11
3 5 15
Parallel Case: Numpy-to-Numpy
Just to show where the idea comes from: in standard python / numpy, if you have
>>> keys = [0, 1, 1, 5]
>>> grid = np.arange(10, 20)
>>> grid[keys]
array([10, 11, 11, 15])
Which is exactly what I want to do, except that my keys are not stored in a vector; they are stored in a column.
This is a numpy bug that surfaced with pandas 0.13.0 / numpy 1.8.0.
You can do:
In [5]: grid[df['i'].values]
Out[5]: array([10, 11, 11, 15])
In [6]: pd.Series(grid)[df['i']]
Out[6]:
i
0 10
1 11
1 11
5 15
dtype: int64
This matches your output. You can assign an array to a column, as long as the length of the array/list is the same as the frame (otherwise how would you align it?)
In [14]: grid[keys]
Out[14]: array([10, 11, 11, 15])
In [15]: df['j'] = grid[df['i'].values]
In [17]: df
Out[17]:
i j
0 0 10
1 1 11
2 1 11
3 5 15
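For completeness, here is a minimal self-contained sketch of the same lookup as a script. It assumes a reasonably recent pandas where `.to_numpy()` is available as the counterpart of `.values`; the names `idx` and `k` are only illustrative and do not come from the question.

```python
import numpy as np
import pandas as pd

grid = np.arange(10, 20)
df = pd.DataFrame({"i": [0, 1, 1, 5]})

# Convert the column to a plain integer array before indexing,
# so numpy sees ordinary integer-array ("fancy") indexing.
idx = df["i"].to_numpy()
df["j"] = grid[idx]
print(df)

# Assigning an array of the wrong length is rejected,
# because pandas cannot align it with the frame's rows.
try:
    df["k"] = grid[:3]  # 3 values for a 4-row frame
except ValueError as exc:
    print("ValueError:", exc)
```

Converting explicitly keeps the indexing step independent of how a given numpy/pandas combination handles a Series passed directly as an index.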