Using .to_numpy() to copy specific columns from one row of a Pandas DataFrame to another
I have a DataFrame like this:
UniqueID CST WEIGHT VOLUME PRODUCTIVITY
0 413-20012 3 123 12 1113
1 413-45365 1 889 75 6748
2 413-21165 8 554 13 4536
3 413-24354 1 387 35 7649
4 413-34658 2 121 88 2468
5 413-36889 4 105 76 3336
6 413-23457 5 355 42 7894
7 413-30089 5 146 10 9112
8 413-41158 5 453 91 4545
9 413-51015 9 654 66 2232
And I have a dictionary of parent:child mappings for the UniqueIDs:
parent_child_dict = {
'413-51015': '413-41158',
'413-21165': '413-23457',
'413-45365': '413-41158',
'413-20012': '413-23457'
}
What I need to do is loop through the DataFrame and replace the WEIGHT, VOLUME, and PRODUCTIVITY values of the 'child' UniqueID row with the values from the 'parent' UniqueID row, so that the resulting DataFrame looks like this:
UniqueID CST WEIGHT VOLUME PRODUCTIVITY
0 413-20012 3 355 42 7894
1 413-45365 1 453 91 4545
2 413-21165 8 355 42 7894
3 413-24354 1 387 35 7649
4 413-34658 2 121 88 2468
5 413-36889 4 105 76 3336
6 413-23457 5 355 42 7894
7 413-30089 5 146 10 9112
8 413-41158 5 453 91 4545
9 413-51015 9 453 91 4545
I've experimented with several possible solutions. The trouble I'm having is limiting the copy so that the UniqueID and CST values of the 'child' row are preserved while the other values are copied over.
The closest I've gotten is looping through the dictionary and feeding each pair into this:
df.loc[df['UniqueID'] == '413-51015'] = df.loc[df['UniqueID'] == '413-41158'].to_numpy()
This replaces all values of one row with those of the other, including UniqueID and CST.
Any help with those exceptions, or a better solution overall, would be extremely helpful. Thank you.
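A minimal sketch of the column restriction being asked about, assuming the frame and dict shown above: select the same column list on both sides of the .loc assignment so UniqueID and CST are never written. As the expected output shows, each dict key is the row that gets overwritten and its value is the row copied from. The sketch also assumes no ID appears as both a key and a value, since otherwise the result would depend on loop order.

```python
import pandas as pd

df = pd.DataFrame({
    'UniqueID': ['413-20012', '413-45365', '413-21165', '413-24354', '413-34658',
                 '413-36889', '413-23457', '413-30089', '413-41158', '413-51015'],
    'CST': [3, 1, 8, 1, 2, 4, 5, 5, 5, 9],
    'WEIGHT': [123, 889, 554, 387, 121, 105, 355, 146, 453, 654],
    'VOLUME': [12, 75, 13, 35, 88, 76, 42, 10, 91, 66],
    'PRODUCTIVITY': [1113, 6748, 4536, 7649, 2468, 3336, 7894, 9112, 4545, 2232],
})
parent_child_dict = {
    '413-51015': '413-41158',
    '413-21165': '413-23457',
    '413-45365': '413-41158',
    '413-20012': '413-23457',
}

# Columns to copy; UniqueID and CST are deliberately left out.
cols = ['WEIGHT', 'VOLUME', 'PRODUCTIVITY']

for target_id, source_id in parent_child_dict.items():
    target_mask = df['UniqueID'] == target_id  # row to overwrite (dict key)
    source_mask = df['UniqueID'] == source_id  # row to copy from (dict value)
    # Restricting the column labels on both sides keeps UniqueID and CST intact;
    # .to_numpy() drops the index so pandas does not try to align the rows.
    df.loc[target_mask, cols] = df.loc[source_mask, cols].to_numpy()

print(df)
```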
EDIT
I've applied the first solution in a loop over the columns I want changed in the dataset, like this:
columns = list(df.columns)
# list.remove() works in place and returns None, so there is nothing to assign
columns.remove('UniqueID')
columns.remove('CST')
print(columns)
OUTPUT
['WEIGHT', 'VOLUME', 'PRODUCTIVITY']
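The same list can probably be built in one step with Index.drop, which returns a new Index without the given labels and keeps the original column order. A small sketch using an empty frame that only carries the question's headers:

```python
import pandas as pd

# Empty frame with the question's column headers, just for illustration.
df = pd.DataFrame(columns=['UniqueID', 'CST', 'WEIGHT', 'VOLUME', 'PRODUCTIVITY'])

# Drop the columns that must not be overwritten; order is preserved.
columns = df.columns.drop(['UniqueID', 'CST']).tolist()
print(columns)  # ['WEIGHT', 'VOLUME', 'PRODUCTIVITY']
```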
Then
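for col in columns:
    # one-column lookup table indexed by UniqueID
    s = df[['UniqueID', col]].set_index('UniqueID')
    # swap each dict key for its mapped ID, then fetch that row's value
    df[col] = s.loc[df['UniqueID'].replace(parent_child_dict)].to_numpy()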
This has resulted in the desired dataset.
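Because the set_index/.loc lookup works on whole rows, the loop over columns can likely be dropped: index once on all three columns and assign them together. A sketch assuming UniqueID values are unique (otherwise .loc would return several rows per ID and the shapes would not match):

```python
import pandas as pd

df = pd.DataFrame({
    'UniqueID': ['413-20012', '413-45365', '413-21165', '413-24354', '413-34658',
                 '413-36889', '413-23457', '413-30089', '413-41158', '413-51015'],
    'CST': [3, 1, 8, 1, 2, 4, 5, 5, 5, 9],
    'WEIGHT': [123, 889, 554, 387, 121, 105, 355, 146, 453, 654],
    'VOLUME': [12, 75, 13, 35, 88, 76, 42, 10, 91, 66],
    'PRODUCTIVITY': [1113, 6748, 4536, 7649, 2468, 3336, 7894, 9112, 4545, 2232],
})
parent_child_dict = {
    '413-51015': '413-41158',
    '413-21165': '413-23457',
    '413-45365': '413-41158',
    '413-20012': '413-23457',
}
columns = ['WEIGHT', 'VOLUME', 'PRODUCTIVITY']

# Lookup table: one row per UniqueID, holding only the columns to copy.
lookup = df.set_index('UniqueID')[columns]

# For every row, the ID it should take values from: dict keys are swapped
# for their mapped ID, all other IDs stay as they are (i.e. map to themselves).
source_ids = df['UniqueID'].replace(parent_child_dict)

# Fetch all source rows at once and write them back positionally.
df[columns] = lookup.loc[source_ids].to_numpy()
print(df)
```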
Answer: use replace and loc access:
s = df[['UniqueID','PRODUCTIVITY']].set_index('UniqueID')
# using to_numpy here :-)
df['PRODUCTIVITY'] = s.loc[df['UniqueID'].replace(parent_child_dict)].to_numpy()
Output:
UniqueID CST WEIGHT VOLUME PRODUCTIVITY
0 413-20012 3 123 12 7894
1 413-45365 1 889 75 4545
2 413-21165 8 554 13 7894
3 413-24354 1 387 35 7649
4 413-34658 2 121 88 2468
5 413-36889 4 105 76 3336
6 413-23457 5 355 42 7894
7 413-30089 5 146 10 9112
8 413-41158 5 453 91 4545
9 413-51015 9 654 66 4545
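To see why this works, it may help to print the intermediate IDs: replace swaps every dict key for its mapped ID and leaves IDs that are not keys unchanged, so s.loc[...] fetches, for each row, the row it should copy from (itself when it is not in the dict). A small sketch using only the UniqueID column:

```python
import pandas as pd

ids = pd.Series(['413-20012', '413-45365', '413-21165', '413-24354', '413-34658',
                 '413-36889', '413-23457', '413-30089', '413-41158', '413-51015'])
parent_child_dict = {
    '413-51015': '413-41158',
    '413-21165': '413-23457',
    '413-45365': '413-41158',
    '413-20012': '413-23457',
}

# Keys become their mapped ID; everything else is returned unchanged.
source_ids = ids.replace(parent_child_dict)

for own_id, src_id in zip(ids, source_ids):
    print(own_id, '->', src_id)
```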
Answer: first create a mapping from your UniqueID to PRODUCTIVITY.
Then use your parent-child dict to map your IDs:
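# UniqueID -> PRODUCTIVITY lookup
mapping = df.set_index('UniqueID')['PRODUCTIVITY'].to_dict()
# dict keys take the mapped row's value; IDs not in the dict become NaN and are refilled with their own value
df['PRODUCTIVITY'] = (
    df['UniqueID'].map(parent_child_dict).map(mapping).fillna(df['PRODUCTIVITY']).astype(int)
)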
print(df)
UniqueID CST WEIGHT VOLUME PRODUCTIVITY
0 413-20012 3 123 12 7894
1 413-45365 1 889 75 4545
2 413-21165 8 554 13 7894
3 413-24354 1 387 35 7649
4 413-34658 2 121 88 2468
5 413-36889 4 105 76 3336
6 413-23457 5 355 42 7894
7 413-30089 5 146 10 9112
8 413-41158 5 453 91 4545
9 413-51015 9 654 66 4545
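The map approach should extend to all three columns with a short loop. A sketch assuming the frame and dict from the question: map turns IDs that are not dict keys into NaN, fillna restores those from the row's own value, and astype(int) undoes the float upcast that the NaN causes.

```python
import pandas as pd

df = pd.DataFrame({
    'UniqueID': ['413-20012', '413-45365', '413-21165', '413-24354', '413-34658',
                 '413-36889', '413-23457', '413-30089', '413-41158', '413-51015'],
    'CST': [3, 1, 8, 1, 2, 4, 5, 5, 5, 9],
    'WEIGHT': [123, 889, 554, 387, 121, 105, 355, 146, 453, 654],
    'VOLUME': [12, 75, 13, 35, 88, 76, 42, 10, 91, 66],
    'PRODUCTIVITY': [1113, 6748, 4536, 7649, 2468, 3336, 7894, 9112, 4545, 2232],
})
parent_child_dict = {
    '413-51015': '413-41158',
    '413-21165': '413-23457',
    '413-45365': '413-41158',
    '413-20012': '413-23457',
}
columns = ['WEIGHT', 'VOLUME', 'PRODUCTIVITY']

# Source ID per row; NaN where the row's ID is not a dict key.
source_ids = df['UniqueID'].map(parent_child_dict)

for col in columns:
    # Lookup for this column, built before the column is overwritten.
    mapping = df.set_index('UniqueID')[col].to_dict()
    df[col] = source_ids.map(mapping).fillna(df[col]).astype(int)

print(df)
```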