I'm looking for a performant way to look up a value in a dataframe based on another value, and add the looked-up value to a new column in the row that holds the other value.
For example, I have this dataframe:
import pandas as pd

data = {
    'role': ['primary', 'secondary', 'primary', 'secondary'],
    'serial_number': ['abc', '123', 'def', '456'],
    'primary_serial_number': ['abc', 'abc', 'def', 'def'],
    'physical_id': ['w', 'x', 'y', 'z'],
    'set_id': ['j', 'x', 'k', 'z']
}
df = pd.DataFrame(data=data)
        role serial_number primary_serial_number physical_id set_id
0    primary           abc                   abc           w      j
1  secondary           123                   abc           x      x
2    primary           def                   def           y      k
3  secondary           456                   def           z      z
Secondaries always have the same physical_id and set_id. For each secondary, I'd like to have the set_id of the relevant primary in the same row as the secondary. I can look this up by matching the "primary_serial_number" for each secondary to the "serial_number" for each primary. I should then have a column labeled "primary_set_id" that has the values j, j, k, k.
I tried the following:
df['primary_set_id'] = df['primary_serial_number'].apply(
    lambda x: df['set_id'][df['serial_number'] == x])
When I run this on the above fake data, I get:
ValueError: Wrong number of items passed 2, placement implies 1
In reality, I am dealing with hundreds of thousands of rows, and this method is extremely inefficient (I have not yet let it run to completion).
I think this should do it:
grps = df.groupby('role')
prim_df = grps.get_group('primary')
sec_df = grps.get_group('secondary')
primsec_df = sec_df.merge(prim_df, left_on = 'primary_serial_number', right_on = 'serial_number')
primsec_df
In column 'set_id_y' you get what you want:
| | role_x | serial_number_x | primary_serial_number_x | physical_id_x | set_id_x | role_y | serial_number_y | primary_serial_number_y | physical_id_y | set_id_y |
|---:|:----------|------------------:|:--------------------------|:----------------|:-----------|:---------|:------------------|:--------------------------|:----------------|:-----------|
| 0 | secondary | 123 | abc | x | x | primary | abc | abc | w | j |
| 1 | secondary | 456 | def | z | z | primary | def | def | y | k |
I am not sure how efficient this will be on a large DataFrame.
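If the goal is a "primary_set_id" column on every row (j, j, k, k) rather than a separate merged frame, one possible alternative is to build a lookup Series keyed by "serial_number" and map "primary_serial_number" through it. This is only a sketch and it assumes every "serial_number" is unique; the name `set_id_by_serial` is made up for illustration. I have not benchmarked it on hundreds of thousands of rows, but it avoids the row-by-row `apply`.

```python
import pandas as pd

df = pd.DataFrame({
    'role': ['primary', 'secondary', 'primary', 'secondary'],
    'serial_number': ['abc', '123', 'def', '456'],
    'primary_serial_number': ['abc', 'abc', 'def', 'def'],
    'physical_id': ['w', 'x', 'y', 'z'],
    'set_id': ['j', 'x', 'k', 'z'],
})

# Lookup Series: index is serial_number, values are set_id.
# Assumption: serial_number is unique; duplicates would need to be
# dropped or resolved before this step.
set_id_by_serial = df.set_index('serial_number')['set_id']

# For each row, fetch the set_id of the row whose serial_number
# equals this row's primary_serial_number.
df['primary_set_id'] = df['primary_serial_number'].map(set_id_by_serial)

print(df['primary_set_id'].tolist())  # ['j', 'j', 'k', 'k']
```

Primaries point at themselves through "primary_serial_number", so they simply receive their own set_id, which matches the j, j, k, k described in the question.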