Add a value from a dataframe to a row based on another value

Question

I'm looking for a performant way to look up a value in a dataframe based on another value, and add the looked-up value to a column in the row that holds the other value.

For example, I have this dataframe:

import pandas as pd

data = {
    'role': ['primary', 'secondary', 'primary', 'secondary'],
    'serial_number': ['abc', '123', 'def', '456'],
    'primary_serial_number': ['abc', 'abc', 'def', 'def'],
    'physical_id': ['w', 'x', 'y', 'z'],
    'set_id': ['j', 'x', 'k', 'z']
}
df = pd.DataFrame(data = data)

    role    serial_number   primary_serial_number   physical_id set_id
0   primary     abc                  abc                  w       j
1   secondary   123                  abc                  x       x
2   primary     def                  def                  y       k
3   secondary   456                  def                  z       z

Secondaries always have the same physical_id and set_id. For each secondary, I'd like to have the set_id of the relevant primary in the same row as the secondary. I can look this up by matching each secondary's "primary_serial_number" to each primary's "serial_number". I should then have a column labeled "primary_set_id" that has the values j, j, k, k.
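The matching rule above can be sketched as a single vectorized lookup (an illustrative sketch, not code from the thread): index the primaries' `set_id` by their `serial_number`, then pass that Series to `Series.map`, which looks up each row's `primary_serial_number` in the index. It assumes every primary `serial_number` is unique, because `map` needs a unique lookup index, and that unmatched serials may simply become NaN.

```python
import pandas as pd

data = {
    'role': ['primary', 'secondary', 'primary', 'secondary'],
    'serial_number': ['abc', '123', 'def', '456'],
    'primary_serial_number': ['abc', 'abc', 'def', 'def'],
    'physical_id': ['w', 'x', 'y', 'z'],
    'set_id': ['j', 'x', 'k', 'z'],
}
df = pd.DataFrame(data=data)

# Lookup table: one entry per primary, set_id indexed by serial_number
# (assumes primary serial numbers are unique).
primaries = df[df['role'] == 'primary']
set_id_by_serial = primaries.set_index('serial_number')['set_id']

# Series.map replaces each primary_serial_number with the value found at that
# label in the lookup Series; serials with no matching primary become NaN.
df['primary_set_id'] = df['primary_serial_number'].map(set_id_by_serial)

print(df['primary_set_id'].tolist())  # ['j', 'j', 'k', 'k']
```

Because the lookup is a hash-based index match rather than a Python-level loop, it should scale to hundreds of thousands of rows far better than a per-row `apply`.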

I tried the following:

df['primary_set_id'] = df['primary_serial_number'].apply(
    lambda x: df['set_id'][df['serial_number'] == x])

When I run this on the above fake data, I get:

ValueError: Wrong number of items passed 2, placement implies 1

In reality, I am dealing with hundreds of thousands of rows, and this method is extremely inefficient (I have not yet let it run to completion).
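The error comes from what the lambda returns: `df['set_id'][df['serial_number'] == x]` is a one-element Series labelled with the matching primary's row index (0 or 2 in the sample data). When the applied function returns Series, `Series.apply` expands the results into a DataFrame with one column per distinct label, so two columns end up being assigned to the single column `primary_set_id`. The exact message and behaviour vary across pandas versions. A minimal fix, sketched below, returns a scalar with `.iloc[0]`; it assumes every `primary_serial_number` matches at least one `serial_number` (otherwise `.iloc[0]` raises `IndexError`). It still scans the whole frame once per row, so its cost grows quadratically and it will not fix the slowness on large data.

```python
import pandas as pd

data = {
    'role': ['primary', 'secondary', 'primary', 'secondary'],
    'serial_number': ['abc', '123', 'def', '456'],
    'primary_serial_number': ['abc', 'abc', 'def', 'def'],
    'physical_id': ['w', 'x', 'y', 'z'],
    'set_id': ['j', 'x', 'k', 'z'],
}
df = pd.DataFrame(data=data)

# .loc with a boolean mask selects the set_id of every row whose serial_number
# equals x; .iloc[0] turns that one-element Series into a plain scalar, so
# apply() builds a single column instead of a multi-column DataFrame.
# Note: this is one full scan of df per row, i.e. O(n^2) overall.
df['primary_set_id'] = df['primary_serial_number'].apply(
    lambda x: df.loc[df['serial_number'] == x, 'set_id'].iloc[0])

print(df['primary_set_id'].tolist())  # ['j', 'j', 'k', 'k']
```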

Answer 1

I think this should do it:

grps = df.groupby('role')
prim_df = grps.get_group('primary')
sec_df = grps.get_group('secondary')
primsec_df = sec_df.merge(prim_df, left_on = 'primary_serial_number', right_on = 'serial_number')
primsec_df
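A variant of the same merge idea, sketched under the assumption that only the primary's `set_id` is needed: trim the primaries down to the key and the value, rename them so no `_x`/`_y` suffixes appear, and left-merge onto the full frame. `how='left'` keeps every original row in order (so primaries receive their own `set_id`, giving j, j, k, k), and `validate='many_to_one'` makes pandas raise `pandas.errors.MergeError` if a primary `serial_number` occurs more than once.

```python
import pandas as pd

data = {
    'role': ['primary', 'secondary', 'primary', 'secondary'],
    'serial_number': ['abc', '123', 'def', '456'],
    'primary_serial_number': ['abc', 'abc', 'def', 'def'],
    'physical_id': ['w', 'x', 'y', 'z'],
    'set_id': ['j', 'x', 'k', 'z'],
}
df = pd.DataFrame(data=data)

# Lookup table: only the primaries, reduced to the join key and the value to
# copy, renamed so the merge adds exactly one clearly named column.
lookup = (df.loc[df['role'] == 'primary', ['serial_number', 'set_id']]
            .rename(columns={'serial_number': 'primary_serial_number',
                             'set_id': 'primary_set_id'}))

# how='left' keeps all rows of df; validate='many_to_one' checks that each
# key appears at most once in the lookup table.
result = df.merge(lookup, on='primary_serial_number', how='left',
                  validate='many_to_one')

print(result[['role', 'serial_number', 'primary_set_id']])
```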

In column 'set_id_y' you get what you want:

|    | role_x    |   serial_number_x | primary_serial_number_x   | physical_id_x   | set_id_x   | role_y   | serial_number_y   | primary_serial_number_y   | physical_id_y   | set_id_y   |
|---:|:----------|------------------:|:--------------------------|:----------------|:-----------|:---------|:------------------|:--------------------------|:----------------|:-----------|
|  0 | secondary |               123 | abc                       | x               | x          | primary  | abc               | abc                       | w               | j          |
|  1 | secondary |               456 | def                       | z               | z          | primary  | def               | def                       | y               | k          |

I am not sure how efficient this will be on a large df.
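One way to find out is to time it on synthetic data of the asker's size. The sketch below is a rough harness, not a rigorous benchmark, and no timings are claimed here. It builds a hypothetical frame in which each primary `p{i}` has exactly one secondary `s{i}`, then times a `Series.map` lookup against a left `merge`. The helpers `make_frame`, `with_map` and `with_merge` are illustrative names, not part of any library.

```python
import time

import pandas as pd


def make_frame(n_primaries):
    """Hypothetical synthetic data: primary p{i} has one secondary s{i}."""
    prim = [f'p{i}' for i in range(n_primaries)]
    sec = [f's{i}' for i in range(n_primaries)]
    return pd.DataFrame({
        'role': ['primary'] * n_primaries + ['secondary'] * n_primaries,
        'serial_number': prim + sec,
        'primary_serial_number': prim + prim,
        'set_id': [f'set{i}' for i in range(n_primaries)] + sec,
    })


def with_map(df):
    # Look up each primary_serial_number in a serial_number -> set_id Series.
    lookup = df.loc[df['role'] == 'primary'].set_index('serial_number')['set_id']
    return df['primary_serial_number'].map(lookup)


def with_merge(df):
    # Left-merge a trimmed, renamed table of primaries onto every row.
    lookup = (df.loc[df['role'] == 'primary', ['serial_number', 'set_id']]
                .rename(columns={'serial_number': 'primary_serial_number',
                                 'set_id': 'primary_set_id'}))
    merged = df.merge(lookup, on='primary_serial_number', how='left')
    return merged['primary_set_id']


df = make_frame(200_000)  # 400,000 rows in total
for fn in (with_map, with_merge):
    start = time.perf_counter()
    fn(df)
    print(f'{fn.__name__}: {time.perf_counter() - start:.3f} s')
```

Both approaches avoid a per-row Python loop, so either should be practical at this size; the harness only makes the comparison measurable on your own machine and pandas version.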

Accepted answer (score 0), 2020-11-21 17:00:14