Pandas: replace values in a dataframe with another dataframe's values based on a condition
I have the following dataframes:
df1:
+-----+------+------+------+------+------+
| No. | col1 | col2 | col3 | Type | ... |
+-----+------+------+------+------+------+
| 123 | 2 | 5 | 2 | MN | ... |
| 453 | 4 | 3 | 1 | MN | ... |
| 146 | 7 | 9 | 4 | AA | ... |
| 175 | 2 | 4 | 3 | MN | ... |
| 643 | 0 | 0 | 0 | NAN | ... |
+-----+------+------+------+------+------+
df2:
+-----+------+------+------+------+
| No. | col1 | col2 | col3 | Type |
+-----+------+------+------+------+
| 123 | 24 | 57 | 22 | MN |
| 453 | 41 | 39 | 15 | MN |
| 175 | 21 | 43 | 37 | MN |
+-----+------+------+------+------+
I want to replace col1, col2 and col3 in df1 with the corresponding values in df2 if Type equals MN.
Desired output:
df1:
+-----+------+------+------+------+-----+
| No. | col1 | col2 | col3 | Type | ... |
+-----+------+------+------+------+-----+
| 123 | 24 | 57 | 22 | MN | ... |
| 453 | 41 | 39 | 15 | MN | ... |
| 146 | 7 | 9 | 4 | AA | ... |
| 175 | 21 | 43 | 37 | MN | ... |
| 643 | 0 | 0 | 0 | NAN | ... |
+-----+------+------+------+------+-----+
EDIT
I tried:
df1[df1.Type == 'MN'] = df2.values
but I get this error:
ValueError: Must have equal len keys and value when setting with an ndarray
I guess the reason is that df2 does not have the same number of columns. So how do I make sure that only the specific columns (col1 to col3) are replaced in df1?
I think you need combine_first to match rows by the No. column:
#filter only `MN` rows if necessary
df22 = df2[df2['Type'] == 'MN'].set_index('No.')
df1 = df22.combine_first(df1.set_index('No.')).reset_index().reindex(columns=df1.columns)
print (df1)
No. col1 col2 col3 Type col
0 123 24.0 57.0 22.0 MN ...
1 146 7.0 9.0 4.0 AA ...
2 175 21.0 43.0 37.0 MN ...
3 453 41.0 39.0 15.0 MN ...
4 643 0.0 0.0 0.0 NAN ...
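A self-contained version of the combine_first approach, for trying it out. It assumes the sample data from the question, with the trailing "..." column modeled as a placeholder column named `Oth` (that name is an assumption). After both frames are indexed by `No.`, `combine_first` takes values from the calling frame (the MN rows of df2) and fills everything it lacks, including the other rows and the `Oth` column, from df1.

```python
import pandas as pd

# Sample data from the question; "Oth" is an assumed stand-in for the "..." columns
df1 = pd.DataFrame({
    "No.": [123, 453, 146, 175, 643],
    "col1": [2, 4, 7, 2, 0],
    "col2": [5, 3, 9, 4, 0],
    "col3": [2, 1, 4, 3, 0],
    "Type": ["MN", "MN", "AA", "MN", "NAN"],
    "Oth": ["..."] * 5,
})
df2 = pd.DataFrame({
    "No.": [123, 453, 175],
    "col1": [24, 41, 21],
    "col2": [57, 39, 43],
    "col3": [22, 15, 37],
    "Type": ["MN", "MN", "MN"],
})

# Keep only the MN rows of df2 and use No. as the alignment key
mn = df2[df2["Type"] == "MN"].set_index("No.")

# Cells present in `mn` win; every other cell is filled from df1
out = (mn.combine_first(df1.set_index("No."))
         .reset_index()
         .reindex(columns=df1.columns))  # restore df1's column order
print(out)
```

As in the output above, the rows come back ordered by `No.` rather than in df1's original order. Depending on the pandas version, the numeric columns may also come back as floats, because aligning the two frames introduces NaN before the fill.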
Your code doesn't work because df1 and df2 have different numbers of columns.
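To see the mismatch concretely, this sketch compares the shape covered by the boolean row selection on df1 with the shape of `df2.values`. It uses the same sample data, with `Oth` as an assumed stand-in for the "..." columns.

```python
import pandas as pd

# Sample data from the question; "Oth" is an assumed stand-in for the "..." columns
df1 = pd.DataFrame({
    "No.": [123, 453, 146, 175, 643],
    "col1": [2, 4, 7, 2, 0],
    "col2": [5, 3, 9, 4, 0],
    "col3": [2, 1, 4, 3, 0],
    "Type": ["MN", "MN", "AA", "MN", "NAN"],
    "Oth": ["..."] * 5,
})
df2 = pd.DataFrame({
    "No.": [123, 453, 175],
    "col1": [24, 41, 21],
    "col2": [57, 39, 43],
    "col3": [22, 15, 37],
    "Type": ["MN", "MN", "MN"],
})

mask = df1["Type"] == "MN"

# Selecting rows by mask keeps every column of df1: 3 MN rows x 6 columns
target_shape = df1.loc[mask].shape
# df2.values has no Oth column: 3 rows x 5 columns
value_shape = df2.values.shape
print(target_shape, value_shape)  # (3, 6) (3, 5)

# Naming the target columns explicitly makes the two shapes agree
cols = list(df2.columns)
print(df1.loc[mask, cols].shape)  # (3, 5)
```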
from io import StringIO
import pandas as pd
x1 = """No.,col1,col2,col3,Type,Oth
123,2,5,2,MN,...
453,4,3,1,MN,...
146,7,9,4,AA,...
175,2,4,3,MN,...
643,0,0,0,NAN,...
"""
x2 = """No.,col1,col2,col3,Type
123,24,57,22,MN
453,41,39,15,MN
175,21,43,37,MN
"""
df1 = pd.read_csv(StringIO(x1), sep=",")
df2 = pd.read_csv(StringIO(x2), sep=",")
df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = df2.values
# Output:
# >>> print(df1)
# No. col1 col2 col3 Type Oth
# 0 123 24 57 22 MN ...
# 1 453 41 39 15 MN ...
# 2 146 7 9 4 AA ...
# 3 175 21 43 37 MN ...
# 4 643 0 0 0 NAN ...
But if the columns of df2 are in a different order, the values are silently written into the wrong columns:
df1 = pd.read_csv(StringIO(x1), sep=",")
df3 = df2.copy()[["No.","Type","col1","col2","col3"]]
df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = df3.values
# Output:
# >>> print(df1)
# No. col1 col2 col3 Type Oth
# 0 123 MN 24 57 22 ...
# 1 453 MN 41 39 15 ...
# 2 146 7 9 4 AA ...
# 3 175 MN 21 43 37 ...
# 4 643 0 0 0 NAN ...
To avoid this, select the right-hand columns explicitly, in the same order as the target columns:
df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = (
df3[["No.","col1","col2","col3","Type"]].values)
# Output:
# >>> print(df1)
# No. col1 col2 col3 Type Oth
# 0 123 24 57 22 MN ...
# 1 453 41 39 15 MN ...
# 2 146 7 9 4 AA ...
# 3 175 21 43 37 MN ...
# 4 643 0 0 0 NAN ...
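Positional assignment with `.values` also silently assumes that the MN rows appear in the same order in both frames. A minimal sketch that checks the `No.` keys line up before assigning, assuming the question's sample data (with `Oth` standing in for "...") and that only col1 to col3 should change:

```python
import pandas as pd

# Sample data from the question; "Oth" is an assumed stand-in for the "..." columns
df1 = pd.DataFrame({
    "No.": [123, 453, 146, 175, 643],
    "col1": [2, 4, 7, 2, 0],
    "col2": [5, 3, 9, 4, 0],
    "col3": [2, 1, 4, 3, 0],
    "Type": ["MN", "MN", "AA", "MN", "NAN"],
    "Oth": ["..."] * 5,
})
df2 = pd.DataFrame({
    "No.": [123, 453, 175],
    "col1": [24, 41, 21],
    "col2": [57, 39, 43],
    "col3": [22, 15, 37],
    "Type": ["MN", "MN", "MN"],
})
# Same data as df2 but with a different column order, like df3 in the example above
df3 = df2[["No.", "Type", "col1", "col2", "col3"]]

value_cols = ["col1", "col2", "col3"]
mask = df1["Type"] == "MN"

# Positional assignment is only safe if the MN rows match one-to-one, in order, by No.
if df1.loc[mask, "No."].tolist() != df3["No."].tolist():
    raise ValueError("MN rows of df1 and df3 do not line up by No.")

# Select the right-hand columns by name, so their order in df3 no longer matters
df1.loc[mask, value_cols] = df3[value_cols].to_numpy()
print(df1)
```

Because the check compares the keys in order, it raises an error when the rows are ordered differently or when the number of MN rows differs, instead of writing values into the wrong rows.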
However, there is still a problem if df1 and df2 contain different numbers of 'MN' records:
df1 = pd.read_csv(StringIO(x1), sep=",")
df4 = df2.copy().iloc[:2]
df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = (
df4[["No.","col1","col2","col3","Type"]].values)
# Error:
# ValueError: shape mismatch: value array of shape (2,) could not be broadcast to
# indexing result of shape (3,)
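One way around this is to align on the `No.` key instead of on row position. The following sketch uses `DataFrame.update`, which overwrites matching cells in place, aligned on index and columns, and leaves unmatched rows untouched. The `Oth` column and the restriction to rows that are MN in df1 are assumptions based on the question.

```python
import pandas as pd

# Sample data from the question; "Oth" is an assumed stand-in for the "..." columns
df1 = pd.DataFrame({
    "No.": [123, 453, 146, 175, 643],
    "col1": [2, 4, 7, 2, 0],
    "col2": [5, 3, 9, 4, 0],
    "col3": [2, 1, 4, 3, 0],
    "Type": ["MN", "MN", "AA", "MN", "NAN"],
    "Oth": ["..."] * 5,
})
df2 = pd.DataFrame({
    "No.": [123, 453, 175],
    "col1": [24, 41, 21],
    "col2": [57, 39, 43],
    "col3": [22, 15, 37],
    "Type": ["MN", "MN", "MN"],
})
# As in the failing example: only the first two rows of df2 (No. 123 and 453)
df4 = df2.iloc[:2]

value_cols = ["col1", "col2", "col3"]
target = df1.set_index("No.")  # a new frame; df1 itself is not modified
source = df4.set_index("No.")[value_cols]

# Only rows whose Type is MN in df1 may be overwritten
mn_keys = target.index[target["Type"] == "MN"]
source = source[source.index.isin(mn_keys)]

# update() aligns on No., so the row counts do not have to match;
# No. 175 is missing from df4 and keeps its original values
target.update(source)
out = target.reset_index()
print(out)
```

Unlike the positional versions, the rows keep df1's original order here, although older pandas versions may turn the updated columns into floats.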
So what you need may be something like this, matching rows by key with a merge:
df = pd.merge(df1, df2, how='left', on=['No.', 'Type'])
df['col1'] = df.apply(lambda x: x.col1_y if x.Type == 'MN' else x.col1_x, axis=1)
df['col2'] = df.apply(lambda x: x.col2_y if x.Type == 'MN' else x.col2_x, axis=1)
df['col3'] = df.apply(lambda x: x.col3_y if x.Type == 'MN' else x.col3_x, axis=1)
# Keep every original column of df1 (including Oth) and drop the _x/_y helper columns
df = df[df1.columns]
# Output:
#>>> print(df)
# No. col1 col2 col3 Type Oth
#0 123 24.0 57.0 22.0 MN ...
#1 453 41.0 39.0 15.0 MN ...
#2 146 7.0 9.0 4.0 AA ...
#3 175 21.0 43.0 37.0 MN ...
#4 643 0.0 0.0 0.0 NAN ...
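A vectorized variant of the merge approach avoids the row-wise `apply`, keeps every column of df1, and leaves an MN row unchanged when df2 has no match for it. The merge-with-apply version would put NaN there instead, because `col*_y` is missing for that row. `replace_mn_values` is a hypothetical helper name, and the sketch assumes `No.` is unique in df2, since duplicate keys would duplicate rows in the left merge. `Series.where(cond, other)` keeps values where `cond` is True and takes `other` elsewhere.

```python
import pandas as pd

def replace_mn_values(left, right, value_cols):
    """Hypothetical helper: copy value_cols from right into left for MN rows, matched on No."""
    # Left merge keeps every row of `left` in order; right's copies get a "_new" suffix
    merged = left.merge(right[["No."] + value_cols], on="No.", how="left",
                        suffixes=("", "_new"))
    for c in value_cols:
        # Use the new value only for MN rows that actually found a match in `right`
        take_new = merged["Type"].eq("MN") & merged[c + "_new"].notna()
        merged[c] = merged[c + "_new"].where(take_new, merged[c])
    # Drop the helper columns and restore the original column order
    return merged[list(left.columns)]

# Sample data from the question; "Oth" is an assumed stand-in for the "..." columns
df1 = pd.DataFrame({
    "No.": [123, 453, 146, 175, 643],
    "col1": [2, 4, 7, 2, 0],
    "col2": [5, 3, 9, 4, 0],
    "col3": [2, 1, 4, 3, 0],
    "Type": ["MN", "MN", "AA", "MN", "NAN"],
    "Oth": ["..."] * 5,
})
df2 = pd.DataFrame({
    "No.": [123, 453, 175],
    "col1": [24, 41, 21],
    "col2": [57, 39, 43],
    "col3": [22, 15, 37],
    "Type": ["MN", "MN", "MN"],
})

cols = ["col1", "col2", "col3"]
full = replace_mn_values(df1, df2, cols)              # all MN rows matched
partial = replace_mn_values(df1, df2.iloc[:2], cols)  # No. 175 has no match
print(full)
print(partial)
```

In `partial`, the row with No. 175 keeps its original values because `take_new` is False wherever the merged `_new` value is missing.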