Slice a Pandas dataframe by an array of indices and column names
I'm looking to replicate the behavior of a numpy array with a pandas dataframe. I want to pass an array of indices and an array of column names and get back a list of the objects found at each corresponding index and column name.
import pandas as pd
import numpy as np
In numpy:
array=np.array(range(9)).reshape([3,3])
print(array)
print(array[[0,1],[0,1]])
[[0 1 2]
 [3 4 5]
 [6 7 8]]
[0 4]
In pandas:
prng = pd.period_range('1/1/2011', '1/1/2013', freq='A')
df=pd.DataFrame(array,index=prng)
print(df)
      0  1  2
2011  0  1  2
2012  3  4  5
2013  6  7  8
df[[2011,2012],[0,1]]
Expected output:
[0 4]
How should I slice this dataframe to get it to return the same as numpy?
Pandas doesn't support this directly; it could, but the issue is how to specify that you want coordinates rather than selections along different axes, e.g.
df.iloc[[0,1],[0,1]]
means give me rows 0 and 1 and columns 0 and 1.
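To make the distinction concrete, here is a minimal sketch on a frame with a default integer index (an assumption made only to keep the example short). It contrasts the axis-wise selection that `iloc` performs with the coordinate-style selection that numpy performs on the underlying array.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3))

# Axis-wise selection: every combination of the listed rows and columns,
# so the result is a 2x2 block.
block = df.iloc[[0, 1], [0, 1]]
print(block.to_numpy().tolist())   # [[0, 1], [3, 4]]

# Coordinate selection on the underlying array: the lists are paired up,
# so only the cells (0, 0) and (1, 1) are returned.
points = df.to_numpy()[[0, 1], [0, 1]]
print(points.tolist())             # [0, 4]
```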
That said, you can do this:
You updated the question to say you want to start from the index values:
In [19]: row_indexer = df.index.get_indexer([pd.Period('2011'),pd.Period('2012')])
In [20]: col_indexer = df.columns.get_indexer([0,1])
In [21]: z = np.zeros(df.shape,dtype=bool)
In [22]: z[row_indexer,col_indexer] = True
In [23]: df.where(z)
Out[23]:
        0    1    2
2011    0  NaN  NaN
2012  NaN    4  NaN
2013  NaN  NaN  NaN
This seems easier though (these are the positions):
In [63]: df.values[[0,1],[0,1]]
Out[63]: array([0, 4])
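If you want to start from labels rather than positions, the two ideas can be combined: translate the labels to positions with `get_indexer`, then index the underlying array. The following is only a sketch; the helper name `lookup_by_labels` is hypothetical, and it uses `freq='Y'` and `to_numpy()` on the assumption of a reasonably recent pandas.

```python
import numpy as np
import pandas as pd

def lookup_by_labels(df, row_labels, col_labels):
    """Return the values at the paired (row label, column label) coordinates."""
    rows = df.index.get_indexer(row_labels)
    cols = df.columns.get_indexer(col_labels)
    # get_indexer marks labels it cannot find with -1; raise instead of
    # silently indexing from the end of the array.
    if (rows == -1).any() or (cols == -1).any():
        raise KeyError("some labels were not found")
    return df.to_numpy()[rows, cols]

array = np.arange(9).reshape(3, 3)
prng = pd.period_range('2011', '2013', freq='Y')
df = pd.DataFrame(array, index=prng)

row_labels = [pd.Period('2011', freq='Y'), pd.Period('2012', freq='Y')]
result = lookup_by_labels(df, row_labels, [0, 1])
print(result)
```

With the frame from the question this should pick out the cells (2011, 0) and (2012, 1).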
Or this, since the Period index will be sliced correctly from the strings (don't use integers here):
In [26]: df.loc['2011',0]
Out[26]: 0
In [27]: df.loc['2012',1]
Out[27]: 4
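When there are only a few coordinates, the scalar lookups can also be paired up in a loop. This sketch uses `df.at` with explicit `pd.Period` labels instead of strings; that choice, along with `freq='Y'`, is an assumption made for the example and is not part of the answer above.

```python
import numpy as np
import pandas as pd

array = np.arange(9).reshape(3, 3)
prng = pd.period_range('2011', '2013', freq='Y')
df = pd.DataFrame(array, index=prng)

row_labels = [pd.Period('2011', freq='Y'), pd.Period('2012', freq='Y')]
col_labels = [0, 1]

# One scalar lookup per (row label, column label) pair.
values = [df.at[r, c] for r, c in zip(row_labels, col_labels)]
print(values)
```

This is slower than indexing the underlying array for many coordinates, but it keeps everything label-based.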