I am using Python/Pandas and have the dataframe (1) below. I have grouped this by ID and then taken the max of the revision number within each ID's group of revisions to produce the series (2) below.
I now want to merge (1) into (2) in such a way as to match the first 2 columns of (1) with the corresponding columns of (2), pulling in the other column from (1) appropriately [in the actual data set of (1), 'ID', 'Revision' and 'Colour' are not necessarily consecutive columns, and there are other columns].
I am essentially treating (2) as a key and pulling in appropriate data from (1).
How do I do this using Pandas?
Thanks in advance.
Max.
(1) Dataframe
ID Revision Colour
14446 0 red
14446 0 red
14446 0 red
14466 1 red
14466 1 red
14466 0 red
14466 1 red
14466 1 red
14466 0 red
14466 2 red
14466 0 red
14466 1 red
14466 0 red
14471 0 green
14471 0 green
14471 0 green
14471 0 green
14473 0 blue
14473 1 blue
14473 0 blue
(2) Series
ID Revision
13125 1
13213 0
13266 0
13276 0
13277 1
13278 0
13280 2
13285 0
13287 1
13288 0
13291 1
13292 1
Sort by revision, then group by ID and take the last element from each group.
In [2]: df.sort_values('Revision').groupby(level=0).last()
Out[2]:
Revision Colour
ID
14446 0 red
14466 2 red
14471 0 green
14473 1 blue
I assumed ID is an index. If it's a column, use groupby('ID') instead.
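For reference, here is a minimal self-contained sketch of the above, assuming ID is an ordinary column rather than the index. The sample rows are a small subset of dataframe (1) from the question, and the names latest, key and merged are only illustrative. The second half shows one possible way to do the merge the question literally asks for (build the ID/max-Revision key, then join back to the original data); it is an alternative, not part of the original answer.

```python
import pandas as pd

# A few rows taken from dataframe (1) in the question; ID is a plain column here.
df = pd.DataFrame({
    "ID": [14446, 14466, 14466, 14466, 14471, 14473, 14473],
    "Revision": [0, 1, 0, 2, 0, 0, 1],
    "Colour": ["red", "red", "red", "red", "green", "blue", "blue"],
})

# Approach from the answer: sort by Revision, group by ID,
# and keep the last (highest-revision) row of each group.
latest = df.sort_values("Revision").groupby("ID").last()

# Alternative (assumption, closer to the question's wording):
# build the (ID, max Revision) key, then merge the remaining columns back in.
key = df.groupby("ID", as_index=False)["Revision"].max()
merged = key.merge(df.drop_duplicates(), on=["ID", "Revision"], how="left")

print(latest)
print(merged)
```

Both results contain one row per ID with its highest revision. The merge variant keeps ID as a column, and drop_duplicates() is there so that repeated identical rows in (1) do not produce repeated rows in the result.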