
Merging a dataframe and a series on 2 columns using Pandas/Python

I am using Python/Pandas and have dataframe (1) below. I grouped it by ID and then took the maximum revision number within each ID's group of revisions to produce series (2) below.
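
For reference, the grouping step described above might look like the following minimal sketch. It assumes the dataframe is called `df` (a hypothetical name) and that `ID` is an ordinary column, and it uses only a handful of rows from dataframe (1).

```python
import pandas as pd

# A few rows taken from dataframe (1); `df` is an assumed name.
df = pd.DataFrame({
    "ID": [14446, 14466, 14466, 14466, 14473, 14473],
    "Revision": [0, 1, 0, 2, 0, 1],
    "Colour": ["red", "red", "red", "red", "blue", "blue"],
})

# Highest revision per ID: a Series indexed by ID, shaped like series (2).
max_rev = df.groupby("ID")["Revision"].max()
print(max_rev)
```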

I now want to merge (1) into (2) so that the first 2 columns of (1) are matched against the corresponding columns of (2), pulling in the other column of (1) appropriately [in the actual data set of (1), 'id', 'revision' and 'colour' are not necessarily consecutive columns, and there are other columns].

I am essentially treating (2) as a key and pulling in appropriate data from (1).

How do I do this using Pandas?

Thanks in advance.

Max.

(1) Dataframe

ID         Revision Colour
14446   0   red
14446   0   red
14446   0   red
14466   1   red
14466   1   red
14466   0   red
14466   1   red
14466   1   red
14466   0   red
14466   2   red
14466   0   red
14466   1   red
14466   0   red
14471   0   green
14471   0   green
14471   0   green
14471   0   green
14473   0   blue
14473   1   blue
14473   0   blue

(2) Series

ID                   Revision
13125                 1
13213                 0
13266                 0
13276                 0
13277                 1
13278                 0
13280                 2
13285                 0
13287                 1
13288                 0
13291                 1
13292                 1

Sort by revision, then group by ID and take the last element from each group.

In [2]: df.sort_values('Revision').groupby(level=0).last()
Out[2]: 
       Revision Colour
ID                    
14446         0    red
14466         2    red
14471         0  green
14473         1   blue
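
A self-contained sketch of the same idea, rebuilding dataframe (1) from the question with ID as the index. The variable names (`ids`, `revisions`, `colours`, `latest`) are made up for illustration, and `sort_values` is used because `DataFrame.sort` was removed from pandas in version 0.20.

```python
import pandas as pd

# Dataframe (1) from the question, with ID as the index.
ids = [14446] * 3 + [14466] * 10 + [14471] * 4 + [14473] * 3
revisions = [0, 0, 0, 1, 1, 0, 1, 1, 0, 2, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0]
colours = ["red"] * 13 + ["green"] * 4 + ["blue"] * 3
df = pd.DataFrame(
    {"Revision": revisions, "Colour": colours},
    index=pd.Index(ids, name="ID"),
)

# Sort so the highest revision comes last within each ID, then keep that row.
latest = df.sort_values("Revision").groupby(level=0).last()
print(latest)
```

One caveat: `last()` returns the last non-null value in each column, so if the real data contains missing values the result can combine values from different rows.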

I assumed ID is an index. If it's a column, use groupby('ID') instead.
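
If an explicit merge on both columns is preferred, as the question describes, something along these lines may work. This is only a sketch: it assumes `ID` is a regular column, uses a small subset of dataframe (1), and the names `key` and `merged` are hypothetical.

```python
import pandas as pd

# A few rows taken from dataframe (1), with ID as a regular column.
df = pd.DataFrame({
    "ID": [14446, 14466, 14466, 14466, 14473, 14473],
    "Revision": [0, 1, 0, 2, 0, 1],
    "Colour": ["red", "red", "red", "red", "blue", "blue"],
})

# Series (2) turned into a two-column key frame: one (ID, max Revision) row per ID.
key = df.groupby("ID")["Revision"].max().reset_index()

# De-duplicate (1) on the key columns, then left-merge on both of them
# so each key row pulls in the remaining columns of (1).
unique_rows = df.drop_duplicates(["ID", "Revision"])
merged = key.merge(unique_rows, on=["ID", "Revision"], how="left")
print(merged)
```

Merging with `on=["ID", "Revision"]` matches columns by name, so it does not matter whether they are adjacent in the real data or how many other columns exist.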
