Merging a dataframe and a series on 2 columns using Pandas/Python
I am using Python/Pandas and have dataframe (1) below. I have grouped this by ID and then taken the maximum revision number within each ID's group to produce series (2) below.
I now want to merge (1) into (2) in such a way as to match the first 2 columns of (1) with the corresponding columns of (2), pulling in the other column from (1) appropriately [in the actual data set of (1), 'id', 'revision' and 'colour' are not necessarily consecutive columns, and there are other columns].
I am essentially treating (2) as a key and pulling in the appropriate data from (1).
How do I do this using Pandas?
Thanks in advance.
Max.
(1) Dataframe
ID Revision Colour
14446 0 red
14446 0 red
14446 0 red
14466 1 red
14466 1 red
14466 0 red
14466 1 red
14466 1 red
14466 0 red
14466 2 red
14466 0 red
14466 1 red
14466 0 red
14471 0 green
14471 0 green
14471 0 green
14471 0 green
14473 0 blue
14473 1 blue
14473 0 blue
(2) Series
ID Revision
13125 1
13213 0
13266 0
13276 0
13277 1
13278 0
13280 2
13285 0
13287 1
13288 0
13291 1
13292 1
Sort by revision, then group by ID and take the last element from each group.
In [2]: df.sort_values('Revision').groupby(level=0).last()
Out[2]:
Revision Colour
ID
14446 0 red
14466 2 red
14471 0 green
14473 1 blue
I assumed ID is an index. If it's a column, use groupby('ID') instead.
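For reference, here is a minimal self-contained sketch of both routes, assuming ID is an ordinary column and a recent pandas version (where `sort_values` replaces the older `sort`). The sample rows are a shortened stand-in for dataframe (1), and the names `latest`, `max_rev` and `merged` are only illustrative. The second route follows the merge described in the question: build the per-ID maximum revision, then join it back on both key columns.

```python
import pandas as pd

# Shortened stand-in for dataframe (1); ID is a plain column here (assumption).
df = pd.DataFrame({
    "ID": [14446, 14466, 14466, 14466, 14471, 14473, 14473],
    "Revision": [0, 1, 0, 2, 0, 0, 1],
    "Colour": ["red", "red", "red", "red", "green", "blue", "blue"],
})

# Route 1: sort by revision, group by ID, keep the last (highest-revision) values.
latest = df.sort_values("Revision").groupby("ID").last()

# Route 2: compute the max revision per ID, i.e. series (2),
# turned back into a two-column frame with reset_index().
max_rev = df.groupby("ID")["Revision"].max().reset_index()

# Merge on both key columns; duplicates in (1) are dropped first so that
# each (ID, Revision) pair contributes a single row.
merged = max_rev.merge(
    df.drop_duplicates(subset=["ID", "Revision"]),
    on=["ID", "Revision"],
    how="left",
)

print(latest)
print(merged)
```

Passing `on=["ID", "Revision"]` matches by column name, so it should not matter where those columns sit in the real data set or how many other columns there are. If rows sharing the same ID and revision can differ in their other columns, `drop_duplicates` keeps only the first of them, which may or may not be what is wanted.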