Pandas: iterate over unique values of a column that is already in sorted order

I have constructed a pandas data frame in sorted order and would like to iterate over groups having identical values of a particular column. It seems to me that the groupby functionality is useful for this, but as far as I can tell, performing a groupby does not give any guarantee about the order of the keys. How can I extract the unique column values in sorted order?

Here is an example data frame:

Foo,1
Foo,2
Bar,2
Bar,1
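The session in the answer labels the columns 0 and 1, which is how pandas names columns when it reads headerless data. A minimal sketch of building the example frame, assuming the rows arrive as CSV text (the name csv_text is illustrative, not from the question):

```python
import io

import pandas as pd

# The example rows from the question, as headerless CSV text.
csv_text = """Foo,1
Foo,2
Bar,2
Bar,1
"""

# header=None makes pandas label the columns 0 and 1, matching the
# column names used in the answer (df[0], groupby on column 0).
df = pd.read_csv(io.StringIO(csv_text), header=None)
print(df)
```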

I would like a list ["Foo", "Bar"] whose order is guaranteed to follow the order of the original data frame. I can then use this list to extract the appropriate rows. In my case, the sort is actually defined by other columns of the data frame (not included in the example above), so a solution that re-sorts is acceptable if the information cannot be pulled out directly.
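Since a re-sorting solution is acceptable, here is a minimal sketch of that route. The frame, the grouping column "key", and the ordering column "rank" are hypothetical stand-ins for the columns the question leaves out: sort by the ordering column, then take the unique keys in order of first appearance.

```python
import pandas as pd

# Hypothetical frame: "key" is the grouping column and "rank" stands in
# for the (unshown) columns that define the desired order.
df = pd.DataFrame({
    "key": ["Bar", "Foo", "Bar", "Foo"],
    "value": [2, 1, 1, 2],
    "rank": [3, 1, 4, 2],
})

# Re-sort by the ordering column; kind="stable" keeps rows with equal
# rank in their original relative order.
ordered = df.sort_values("rank", kind="stable")

# Series.unique returns keys in order of first appearance, so after the
# sort this is the key order we want.
keys = list(ordered["key"].unique())
print(keys)  # ['Foo', 'Bar']

# Rows for each key, visited in that order.
for k in keys:
    print(k, ordered.loc[ordered["key"] == k, "value"].tolist())
```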

As mentioned in the comments, you can call unique on the column, which preserves the order in which values first appear (unlike numpy's unique, it doesn't sort):

In [11]: df
Out[11]: 
     0  1
0  Foo  1
1  Foo  2
2  Bar  2
3  Bar  1

In [12]: df[0].unique()
Out[12]: array(['Foo', 'Bar'], dtype=object)
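To see the difference the answer mentions, here is a small sketch comparing Series.unique with numpy.unique on the example column. drop_duplicates is included as another order-preserving alternative; it is an addition, not part of the original answer.

```python
import numpy as np
import pandas as pd

s = pd.Series(["Foo", "Foo", "Bar", "Bar"])

# Series.unique keeps the order in which values first appear.
in_order = list(s.unique())

# numpy.unique sorts its result, so the original order is lost.
sorted_keys = list(np.unique(s.to_numpy()))

# drop_duplicates keeps the first occurrence of each value, in order.
via_drop = s.drop_duplicates().tolist()

print(in_order)     # ['Foo', 'Bar']
print(sorted_keys)  # ['Bar', 'Foo']
print(via_drop)     # ['Foo', 'Bar']
```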

Then you can access the relevant rows using groupby's get_group:

In [13]: g = df.groupby(0)

In [14]: g.get_group('Foo')
Out[14]: 
     0  1
0  Foo  1
1  Foo  2
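On the ordering concern in the question: DataFrame.groupby sorts the group keys by default (sort=True), and passing sort=False keeps the groups in the order in which each key first appears. A minimal sketch, rebuilding the example frame inline, that iterates the groups in their original order without a separate list of keys:

```python
import pandas as pd

df = pd.DataFrame([["Foo", 1], ["Foo", 2], ["Bar", 2], ["Bar", 1]])

# Default (sort=True): group keys come out sorted, so "Bar" precedes "Foo".
default_order = [key for key, _ in df.groupby(0)]

# sort=False: groups follow the order of first appearance in df.
appearance_order = [key for key, _ in df.groupby(0, sort=False)]

# Iterate over each group's rows in that order.
for key, rows in df.groupby(0, sort=False):
    print(key, rows[1].tolist())
# Foo [1, 2]
# Bar [2, 1]
```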

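Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this content, please cite this site's URL or the original URL. For any questions, contact yoyou2525@163.com.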

 
© 2020-2024 STACKOOM.COM