Matricial (positional) indexing of DataFrames in Pandas

Say I have the following dataframe:

import numpy as np
import pandas as pd

tmp = np.random.randn(10, 4)
df = pd.DataFrame(tmp, index=pd.date_range('1/1/2012', periods=tmp.shape[0]),
                  columns=['A', 'B', 'C', 'D'])

> df
                   A         B         C         D
2012-01-01  0.471846  1.130041 -0.614117  0.882738
2012-01-02 -1.431566  0.680617 -0.615331  0.288740
2012-01-03  0.398567 -0.115388 -0.869855 -1.273666
2012-01-04  0.379501  0.192329 -1.942184  0.694004
2012-01-05  1.306329 -0.803856  0.417033 -0.655907
2012-01-06 -0.599877  0.696549 -0.252789  1.367977
2012-01-07 -1.618916  0.216571 -0.499880  0.386853
2012-01-08  0.415002  0.139775  0.251842  0.021379
2012-01-09  2.536787  0.737672 -0.740485 -0.890189
2012-01-10 -1.553530 -0.100950 -0.237478 -0.295612

How can I do the following?

  1. Positional indexing of specific rows/columns (and get the corresponding sub-dataframe)?
  2. Positional indexing of ranges of rows/columns (and get the corresponding sub-dataframe)?

For single-entry matricial indexing:

For example, say I want to index the sub-dataframe at location [1,2] (in numpy "matricial" notation). The output should be:

                   C
2012-01-02 -0.615331

I tried the following three methods, but none of them worked:

df[1,2]
df[1][2]
df.take([1])[2]

The only methods that work seem to be:

df.ix[1,2]
df.irow(1)[2]

but:

  • Using .ix for positional indexing is dangerous, since it would default to label indexing if my indices were integers (as opposed to dates, as in the case above). See more on this here: Start:stop slicing inconsistencies between numpy and Pandas?

  • Using irow is cumbersome, since it requires switching from () notation to [] notation (irow returns a Series object).
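
The label-versus-position ambiguity described in the first point can be illustrated with an integer index whose labels do not match the row positions. This is only a minimal sketch: it uses the explicit .loc (label) and .iloc (position) accessors instead of .ix, which is no longer available in recent pandas releases, and the frame `demo` and its values are made up for illustration.

```python
import pandas as pd

# Made-up frame: integer labels that deliberately differ from row positions.
demo = pd.DataFrame({'A': [10, 20, 30]}, index=[2, 0, 1])

by_label = demo.loc[1, 'A']     # the row whose LABEL is 1 (third row)
by_position = demo.iloc[1, 0]   # the row at POSITION 1 (second row)

print(by_label, by_position)    # prints: 30 20
```

The same integer `1` selects two different rows depending on whether it is read as a label or as a position, which is the kind of silent mismatch the question is worried about.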

For range matricial indexing:

For example, say I want to index the elements at locations [1:3,2:3] (in numpy matricial notation). The output should be:

                   C
2012-01-02 -0.615331
2012-01-03 -0.869855

Note that I am excluding the stop indices (i.e. I am sticking to the numpy notation).

Any thoughts?

This is a frequently requested feature; see https://github.com/pydata/pandas/pull/2922. If you want to test it out, you can pull it from that branch.

Here is a workaround (until the feature request @Jeff mentioned gets committed):

In [178]: df = pd.DataFrame(tmp, index=pd.date_range('2012-1-1', periods=tmp.shape[0]), columns='A B C D'.split())

In [179]: df.ix[df.index[1], df.columns[2]]
Out[179]: -0.3021434106214243

In [180]: df.ix[df.index[1:3], df.columns[2:3]]
Out[180]: 
                   C
2012-01-02 -0.302143
2012-01-03 -1.430387

This shows that the syntax works the same way even with shuffled integer indices:

In [206]: df2 = df.reset_index(drop=True)

In [207]: index = list(range(10))  # a list, so random.shuffle can reorder it in place

In [208]: import random

In [209]: random.shuffle(index)

In [210]: df2.index = index

In [212]: df2.ix[df2.index[1], df2.columns[2]]
Out[212]: -0.3021434106214243

In [213]: df2.ix[df2.index[1:3], df2.columns[2:3]]
Out[213]: 
          C
7 -0.302143
2 -1.430387
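
Note that .ix was deprecated in pandas 0.20 and removed in pandas 1.0, so the session above does not run on current releases. The same idea of translating positions into labels first can be sketched with .loc instead. This is an adaptation rather than the original answer's code; the seeds and the names `order`, `cell`, and `block` are assumptions added for illustration.

```python
import random

import numpy as np
import pandas as pd

# Assumption: fixed seeds so the example is reproducible.
np.random.seed(0)
random.seed(0)
tmp = np.random.randn(10, 4)

# Shuffled integer labels, as in the session above.
order = list(range(10))
random.shuffle(order)
df2 = pd.DataFrame(tmp, index=order, columns='A B C D'.split())

# Look up the labels sitting at the wanted positions, then index by label.
cell = df2.loc[df2.index[1], df2.columns[2]]
block = df2.loc[df2.index[1:3], df2.columns[2:3]]

print(cell)
print(block)
```

This relies on the row and column labels being unique; with duplicated labels, a label lookup can return more rows than the positional slice would.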

From the pandas documentation:

Pandas provides a suite of methods in order to get purely integer based indexing. The semantics follow closely python and numpy slicing. These are 0-based indexing. When slicing, the start bound is included, while the upper bound is excluded. Trying to use a non-integer, even a valid label, will raise an IndexError.

The .iloc attribute is the primary access method. The following are valid inputs:

  • An integer, e.g. 5
  • A list or array of integers, e.g. [4, 3, 0]
  • A slice object with ints, e.g. 1:7
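
Applied to the question, a minimal sketch with .iloc might look like the following. The seed and the variable names `cell`, `single`, and `block` are assumptions added for illustration, so the printed values will differ from those shown in the question.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed so the frame is reproducible (the question's data was unseeded).
np.random.seed(0)
tmp = np.random.randn(10, 4)
df = pd.DataFrame(tmp,
                  index=pd.date_range('1/1/2012', periods=tmp.shape[0]),
                  columns=['A', 'B', 'C', 'D'])

# Scalar at row position 1, column position 2.
cell = df.iloc[1, 2]

# Same location, but list arguments keep the result a 1x1 DataFrame.
single = df.iloc[[1], [2]]

# Half-open ranges, as in numpy: row positions 1 and 2, column position 2 only.
block = df.iloc[1:3, 2:3]

print(single)
print(block)
```

Passing lists or slices for both axes keeps the row and column labels, which matches the "sub-dataframe" output requested in the question, while plain integers return the bare scalar.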
