How to replace pandas dataframe values based on lookup values in another dataframe?
Pandas: replace values in dataframe with another dataframe's values based on condition
I have the following dataframes:
df1:
+-----+------+------+------+------+------+
| No. | col1 | col2 | col3 | Type | ... |
+-----+------+------+------+------+------+
| 123 | 2 | 5 | 2 | MN | ... |
| 453 | 4 | 3 | 1 | MN | ... |
| 146 | 7 | 9 | 4 | AA | ... |
| 175 | 2 | 4 | 3 | MN | ... |
| 643 | 0 | 0 | 0 | NAN | ... |
+-----+------+------+------+------+------+
df2:
+-----+------+------+------+------+
| No. | col1 | col2 | col3 | Type |
+-----+------+------+------+------+
| 123 | 24 | 57 | 22 | MN |
| 453 | 41 | 39 | 15 | MN |
| 175 | 21 | 43 | 37 | MN |
+-----+------+------+------+------+
If Type equals MN, I want to replace col1, col2 and col3 in df1 with the corresponding values from df2.
Desired output:
df1:
+-----+------+------+------+------+-----+
| No. | col1 | col2 | col3 | Type | ... |
+-----+------+------+------+------+-----+
| 123 | 24 | 57 | 22 | MN | ... |
| 453 | 41 | 39 | 15 | MN | ... |
| 146 | 7 | 9 | 4 | AA | ... |
| 175 | 21 | 43 | 37 | MN | ... |
| 643 | 0 | 0 | 0 | NAN | ... |
+-----+------+------+------+------+-----+
Edit
I tried:
df1[df1.Type == 'MN'] = df2.values
But I get this error:
ValueError: Must have equal len keys and value when setting with an ndarray
I guess the reason is that df2 has a different number of columns. So how can I make sure that only specific columns (col1 to col3) are replaced in df1?
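As a quick check of that guess, here is a minimal sketch that compares the shape selected by the boolean mask with the shape of df2.values. The Oth column is only a stand-in for the elided "..." columns of df1, so treat it as an assumption.

```python
import pandas as pd

df1 = pd.DataFrame({
    "No.": [123, 453, 146, 175, 643],
    "col1": [2, 4, 7, 2, 0],
    "col2": [5, 3, 9, 4, 0],
    "col3": [2, 1, 4, 3, 0],
    "Type": ["MN", "MN", "AA", "MN", "NAN"],
    "Oth": ["..."] * 5,  # assumed stand-in for the elided "..." columns
})
df2 = pd.DataFrame({
    "No.": [123, 453, 175],
    "col1": [24, 41, 21],
    "col2": [57, 39, 43],
    "col3": [22, 15, 37],
    "Type": ["MN", "MN", "MN"],
})

# The mask selects the MN rows but keeps every column of df1.
target_shape = df1[df1.Type == "MN"].shape
# df2.values has one column fewer, so the two blocks cannot line up.
source_shape = df2.values.shape
print(target_shape, source_shape)  # (3, 6) (3, 5)
```

The row counts happen to match here; only the column counts differ, which is consistent with the error above.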
I think you need combine_first, matching on the No. column:
#filter only `MN` rows if necessary
df22 = df2[df2['Type'] == 'MN'].set_index('No.')
df1 = df22.combine_first(df1.set_index('No.')).reset_index().reindex(columns=df1.columns)
print (df1)
No. col1 col2 col3 Type col
0 123 24.0 57.0 22.0 MN ...
1 146 7.0 9.0 4.0 AA ...
2 175 21.0 43.0 37.0 MN ...
3 453 41.0 39.0 15.0 MN ...
4 643 0.0 0.0 0.0 NAN ...
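Note that the combine_first result above comes back sorted by No. and with float columns. If the original row order matters, one possible alternative is DataFrame.update, which also aligns on the index but modifies the left frame in place. This is only a sketch: it assumes No. is unique in both frames and that every No. present in the filtered df2 should be overwritten; the Oth column is a stand-in for the elided columns.

```python
import pandas as pd

df1 = pd.DataFrame({
    "No.": [123, 453, 146, 175, 643],
    "col1": [2, 4, 7, 2, 0],
    "col2": [5, 3, 9, 4, 0],
    "col3": [2, 1, 4, 3, 0],
    "Type": ["MN", "MN", "AA", "MN", "NAN"],
    "Oth": ["..."] * 5,  # assumed stand-in for the elided "..." columns
})
df2 = pd.DataFrame({
    "No.": [123, 453, 175],
    "col1": [24, 41, 21],
    "col2": [57, 39, 43],
    "col3": [22, 15, 37],
    "Type": ["MN", "MN", "MN"],
})

cols = ["col1", "col2", "col3"]
left = df1.set_index("No.")
# Keep only the MN rows of df2 and only the columns to overwrite.
right = df2[df2["Type"] == "MN"].set_index("No.")[cols]

# update() aligns on the index and overwrites matching cells in place;
# rows of `left` without a match in `right` are left unchanged.
left.update(right)
result = left.reset_index()
print(result)
```

Depending on the pandas version, the updated columns may end up as floats, so cast them back with astype(int) if that matters.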
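Your code does not work because df1 and df2 have a different number of columns, so the selected rows of df1 and df2.values have different shapes. Restrict the assignment to the columns the two frames share: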
from io import StringIO
import pandas as pd
x1 = """No.,col1,col2,col3,Type,Oth
123,2,5,2,MN,...
453,4,3,1,MN,...
146,7,9,4,AA,...
175,2,4,3,MN,...
643,0,0,0,NAN,...
"""
x2 = """No.,col1,col2,col3,Type
123,24,57,22,MN
453,41,39,15,MN
175,21,43,37,MN
"""
df1 = pd.read_csv(StringIO(x1), sep=",")
df2 = pd.read_csv(StringIO(x2), sep=",")
df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = df2.values
# Output:
# >>> print(df1)
# No. col1 col2 col3 Type Oth
# 0 123 24 57 22 MN ...
# 1 453 41 39 15 MN ...
# 2 146 7 9 4 AA ...
# 3 175 21 43 37 MN ...
# 4 643 0 0 0 NAN ...
However, this goes wrong if the columns of df1 and df2 are in a different order, because .values is assigned by position:
df1 = pd.read_csv(StringIO(x1), sep=",")
df3 = df2.copy()[["No.","Type","col1","col2","col3"]]
df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = df3.values
# Output:
# >>> print(df1)
# No. col1 col2 col3 Type Oth
# 0 123 MN 24 57 22 ...
# 1 453 MN 41 39 15 ...
# 2 146 7 9 4 AA ...
# 3 175 MN 21 43 37 ...
# 4 643 0 0 0 NAN ...
To avoid this, you can select the columns of the source frame in the same order:
df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = (
df3[["No.","col1","col2","col3","Type"]].values)
# Output:
# >>> print(df1)
# No. col1 col2 col3 Type Oth
# 0 123 24 57 22 MN ...
# 1 453 41 39 15 MN ...
# 2 146 7 9 4 AA ...
# 3 175 21 43 37 MN ...
# 4 643 0 0 0 NAN ...
However, there is still a problem if df1 and df2 contain a different number of 'MN' records:
df1 = pd.read_csv(StringIO(x1), sep=",")
df4 = df2.copy().iloc[:2]
df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = (
df4[["No.","col1","col2","col3","Type"]].values)
# Error:
# ValueError: shape mismatch: value array of shape (2,) could not be broadcast to
# indexing result of shape (3,)
So what you probably need is something like this:
df = pd.merge(df1, df2, how='left', on=['No.', 'Type'])
df['col1'] = df.apply(lambda x: x.col1_y if x.Type == 'MN' else x.col1_x, axis=1)
df['col2'] = df.apply(lambda x: x.col2_y if x.Type == 'MN' else x.col2_x, axis=1)
df['col3'] = df.apply(lambda x: x.col3_y if x.Type == 'MN' else x.col3_x, axis=1)
df = df[["No.","col1","col2","col3","Type"]]
# Output:
#>>> print(df)
# No. col1 col2 col3 Type
#0 123 24.0 57.0 22.0 MN
#1 453 41.0 39.0 15.0 MN
#2 146 7.0 9.0 4.0 AA
#3 175 21.0 43.0 37.0 MN
#4 643 0.0 0.0 0.0 NAN
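The row-wise apply calls above can be slow on large frames. A vectorized variant of the same merge idea might look like the sketch below. The "_new" suffix and the extra notna() guard are my own choices, not part of the answer above; the guard keeps the original value when an MN row in df1 has no match in df2. The Oth column is again a stand-in for the elided columns.

```python
import pandas as pd

df1 = pd.DataFrame({
    "No.": [123, 453, 146, 175, 643],
    "col1": [2, 4, 7, 2, 0],
    "col2": [5, 3, 9, 4, 0],
    "col3": [2, 1, 4, 3, 0],
    "Type": ["MN", "MN", "AA", "MN", "NAN"],
    "Oth": ["..."] * 5,  # assumed stand-in for the elided "..." columns
})
df2 = pd.DataFrame({
    "No.": [123, 453, 175],
    "col1": [24, 41, 21],
    "col2": [57, 39, 43],
    "col3": [22, 15, 37],
    "Type": ["MN", "MN", "MN"],
})

cols = ["col1", "col2", "col3"]
# Left merge keeps every row of df1 in its original order;
# the df2 values arrive as col1_new, col2_new, col3_new.
merged = df1.merge(df2, how="left", on=["No.", "Type"], suffixes=("", "_new"))

for c in cols:
    new = merged[c + "_new"]
    # Replace only MN rows that actually found a match in df2.
    use_new = (merged["Type"] == "MN") & new.notna()
    merged[c] = merged[c].where(~use_new, new)

# Drop the helper columns and restore df1's column layout.
result = merged[df1.columns]
print(result)
```

As with the merge-based version above, the replaced columns may come back as floats because the unmatched rows hold NaN in the helper columns during the merge.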