Merging a dataframe and a series on 2 columns using Pandas/Python

I am using Python/Pandas and have dataframe (1) below. I grouped it by ID and took the maximum revision number within each group to produce series (2) below.

I now want to merge (1) with (2) so that the first two columns of (1) are matched against the corresponding columns of (2), pulling in the remaining column of (1) for each match. [In the actual data set of (1), 'id', 'revision' and 'colour' are not necessarily consecutive columns, and there are other columns.]

I am essentially treating (2) as a key and pulling in appropriate data from (1).

How do I do this using Pandas?

Thanks in advance.

Max.

(1) Dataframe

ID     Revision  Colour
14446  0         red
14446  0         red
14446  0         red
14466  1         red
14466  1         red
14466  0         red
14466  1         red
14466  1         red
14466  0         red
14466  2         red
14466  0         red
14466  1         red
14466  0         red
14471  0         green
14471  0         green
14471  0         green
14471  0         green
14473  0         blue
14473  1         blue
14473  0         blue

(2) Series

ID                   Revision
13125                 1
13213                 0
13266                 0
13276                 0
13277                 1
13278                 0
13280                 2
13285                 0
13287                 1
13288                 0
13291                 1
13292                 1

Sort by revision, then group by ID and take the last element from each group. After sorting, the last row in each group is the one with the highest revision.

In [2]: df.sort_values('Revision').groupby(level=0).last()
Out[2]: 
       Revision Colour
ID                    
14446         0    red
14466         2    red
14471         0  green
14473         1   blue

I assumed ID is the index. If it is a column, use groupby('ID') instead.
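
For reference, here is a minimal, self-contained sketch of both routes, assuming ID is an ordinary column. The sample rows are a shortened, illustrative version of dataframe (1), and the names `latest`, `max_rev` and `merged` are made up for this example. The first route is the sort-and-take-last approach above; the second is the literal merge on two columns asked for in the question, which needs the series turned back into a frame with reset_index() first.

```python
import pandas as pd

# Shortened, illustrative sample in the shape of dataframe (1).
df = pd.DataFrame(
    {
        "ID": [14446, 14446, 14466, 14466, 14466, 14471, 14473, 14473],
        "Revision": [0, 0, 1, 0, 2, 0, 0, 1],
        "Colour": ["red", "red", "red", "red", "red", "green", "blue", "blue"],
    }
)

# Route 1: sort by revision, then take the last row of each ID group.
latest = df.sort_values("Revision").groupby("ID").last()

# Route 2: build series (2), the maximum revision per ID ...
max_rev = df.groupby("ID")["Revision"].max()

# ... then merge it back onto (1) using both key columns.
# Assumption: one row per (ID, Revision) pair is wanted, so duplicates
# on those two columns are dropped before merging.
merged = max_rev.reset_index().merge(
    df.drop_duplicates(subset=["ID", "Revision"]),
    on=["ID", "Revision"],
    how="left",
)

print(latest)
print(merged)
```

Because the merge names its keys with on=['ID', 'Revision'], the columns do not have to be adjacent in (1), and any other columns in (1) come along with the match. If several different rows can share the same ID and revision, the drop_duplicates step decides which one is kept, so it may need adjusting for the real data.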

