Pandas: Replace values in a dataframe with values from another dataframe based on a condition

Question

I have the following dataframes:

df1:

+-----+------+------+------+------+------+
| No. | col1 | col2 | col3 | Type | ...  |
+-----+------+------+------+------+------+
| 123 |    2 |    5 |    2 | MN   | ...  |
| 453 |    4 |    3 |    1 | MN   | ...  |
| 146 |    7 |    9 |    4 | AA   | ...  |
| 175 |    2 |    4 |    3 | MN   | ...  |
| 643 |    0 |    0 |    0 | NAN  | ...  |
+-----+------+------+------+------+------+

df2:

+-----+------+------+------+------+
| No. | col1 | col2 | col3 | Type |
+-----+------+------+------+------+
| 123 |   24 |   57 |   22 | MN   |
| 453 |   41 |   39 |   15 | MN   |
| 175 |   21 |   43 |   37 | MN   |
+-----+------+------+------+------+

I want to replace col1, col2 and col3 in df1 with the corresponding values in df2 if Type equals MN.

Desired output:

df1:

+-----+------+------+------+------+-----+
| No. | col1 | col2 | col3 | Type | ... |
+-----+------+------+------+------+-----+
| 123 |   24 |   57 |   22 | MN   | ... |
| 453 |   41 |   39 |   15 | MN   | ... |
| 146 |    7 |    9 |    4 | AA   | ... |
| 175 |   21 |   43 |   37 | MN   | ... |
| 643 |    0 |    0 |    0 | NAN  | ... |
+-----+------+------+------+------+-----+

EDIT

I tried:

df1[df1.Type == 'MN'] = df2.values

but I get this error:

ValueError: Must have equal len keys and value when setting with an ndarray

I guess the reason is that df2 does not have the same number of columns as df1. So how do I make sure that only the specific columns (col1 - col3) are replaced in df1?

Answer 1

I think you need combine_first, matching on the No. column:

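# keep only the `MN` rows of df2 (if necessary) and index them by `No.`
df22 = df2[df2['Type'] == 'MN'].set_index('No.')
# values from df22 take priority, gaps are filled from df1, then the original column order is restored
df1 = df22.combine_first(df1.set_index('No.')).reset_index().reindex(columns=df1.columns)
print(df1)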

   No.  col1  col2  col3 Type  col
0  123  24.0  57.0  22.0   MN  ...
1  146   7.0   9.0   4.0   AA  ...
2  175  21.0  43.0  37.0   MN  ...
3  453  41.0  39.0  15.0   MN  ...
4  643   0.0   0.0   0.0  NAN  ...
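
Note that in the output above the rows come back sorted by No. and the numeric columns are floats. If the original row order matters, one possible alternative is an index-aligned assignment with .loc. This is only a sketch: it assumes No. is unique in both frames, the sample data is rebuilt by hand from the question, and the names left, right, keys and result are made up for the example. Since no missing values are introduced by the alignment, the integer columns would normally keep their dtype.

```python
import pandas as pd

# Sample data rebuilt from the question; "Oth" stands in for the elided extra columns.
df1 = pd.DataFrame({
    "No.": [123, 453, 146, 175, 643],
    "col1": [2, 4, 7, 2, 0],
    "col2": [5, 3, 9, 4, 0],
    "col3": [2, 1, 4, 3, 0],
    "Type": ["MN", "MN", "AA", "MN", "NAN"],
    "Oth": ["..."] * 5,
})
df2 = pd.DataFrame({
    "No.": [123, 453, 175],
    "col1": [24, 41, 21],
    "col2": [57, 39, 43],
    "col3": [22, 15, 37],
    "Type": ["MN", "MN", "MN"],
})

cols = ["col1", "col2", "col3"]

# Index both frames by the key so the assignment aligns on No., not on position.
left = df1.set_index("No.")
right = df2[df2["Type"] == "MN"].set_index("No.")

# Keys that are 'MN' in df1 and also present in df2 (assumes unique No. values).
keys = left[left["Type"] == "MN"].index.intersection(right.index)

# Only the three target columns of the matching rows are overwritten.
left.loc[keys, cols] = right.loc[keys, cols]

result = left.reset_index()
print(result)
```

Rows of df1 that have no partner in df2 are simply left as they are, so this variant does not depend on both frames holding the same number of MN rows.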

Answer 2

Your code doesn't work because df1 and df2 have a different number of columns.

from io import StringIO
import pandas as pd

x1 = """No.,col1,col2,col3,Type,Oth
123,2,5,2,MN,...
453,4,3,1,MN,...
146,7,9,4,AA,...
175,2,4,3,MN,...
643,0,0,0,NAN,...
"""
x2 = """No.,col1,col2,col3,Type
123,24,57,22,MN
453,41,39,15,MN
175,21,43,37,MN
"""

df1 = pd.read_csv(StringIO(x1), sep=",")
df2 = pd.read_csv(StringIO(x2), sep=",")

df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = df2.values
# Output:
# >>> print(df1)
#    No.  col1  col2  col3 Type  Oth
# 0  123    24    57    22   MN  ...
# 1  453    41    39    15   MN  ...
# 2  146     7     9     4   AA  ...
# 3  175    21    43    37   MN  ...
# 4  643     0     0     0  NAN  ...

But there is a problem if the column order of df1 and df2 differs:

df1 = pd.read_csv(StringIO(x1), sep=",")
df3 = df2.copy()[["No.","Type","col1","col2","col3"]]
df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = df3.values
# Output: 
# >>> print(df1)
#    No. col1  col2  col3 Type  Oth
# 0  123   MN    24    57   22  ...
# 1  453   MN    41    39   15  ...
# 2  146    7     9     4   AA  ...
# 3  175   MN    21    43   37  ...
# 4  643    0     0     0  NAN  ...

To avoid this, you can try:

df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = (
    df3[["No.","col1","col2","col3","Type"]].values)
# Output:
# >>> print(df1)
#    No.  col1  col2  col3 Type  Oth
# 0  123    24    57    22   MN  ...
# 1  453    41    39    15   MN  ...
# 2  146     7     9     4   AA  ...
# 3  175    21    43    37   MN  ...
# 4  643     0     0     0  NAN  ...

However, there is still a problem if the number of 'MN' records differs between df1 and df2:

df1 = pd.read_csv(StringIO(x1), sep=",")
df4 = df2.copy().iloc[:2]
df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = (
    df4[["No.","col1","col2","col3","Type"]].values)
# Error: 
# ValueError: shape mismatch: value array of shape (2,) could not be broadcast to 
# indexing result of shape (3,)

So what you need may be something like this:

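# left-join df2 onto df1; the overlapping columns get the suffixes _x (df1) and _y (df2)
df = pd.merge(df1, df2, how='left', on=['No.', 'Type'])
# for 'MN' rows take the df2 value, otherwise keep the df1 value
df['col1'] = df.apply(lambda x: x.col1_y if x.Type == 'MN' else x.col1_x, axis=1)
df['col2'] = df.apply(lambda x: x.col2_y if x.Type == 'MN' else x.col2_x, axis=1)
df['col3'] = df.apply(lambda x: x.col3_y if x.Type == 'MN' else x.col3_x, axis=1)
# keep only the columns of interest
df = df[["No.","col1","col2","col3","Type"]]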
# Output:
#>>> print(df)
#   No.  col1  col2  col3 Type
#0  123  24.0  57.0  22.0   MN
#1  453  41.0  39.0  15.0   MN
#2  146   7.0   9.0   4.0   AA
#3  175  21.0  43.0  37.0   MN
#4  643   0.0   0.0   0.0  NAN
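
The row-wise apply in the merge approach above can probably be replaced by vectorized operations. The following is a rough sketch rather than a drop-in replacement: it assumes No. is unique in df2, it rebuilds the sample data by hand with a df2 that deliberately lacks No. 175 (like df4 above), and the `_new` suffix and the names merged, use_new and result are chosen only for this example. Unlike the code above, it leaves 'MN' rows without a match in df2 untouched and keeps the extra columns of df1.

```python
import pandas as pd

# Sample data rebuilt from the question; "Oth" stands in for the elided extra columns.
df1 = pd.DataFrame({
    "No.": [123, 453, 146, 175, 643],
    "col1": [2, 4, 7, 2, 0],
    "col2": [5, 3, 9, 4, 0],
    "col3": [2, 1, 4, 3, 0],
    "Type": ["MN", "MN", "AA", "MN", "NAN"],
    "Oth": ["..."] * 5,
})
# Only two replacement rows, so the number of 'MN' records differs from df1.
df2 = pd.DataFrame({
    "No.": [123, 453],
    "col1": [24, 41],
    "col2": [57, 39],
    "col3": [22, 15],
    "Type": ["MN", "MN"],
})

cols = ["col1", "col2", "col3"]

# Left join on the key; df1 columns keep their names, df2 columns get "_new".
merged = df1.merge(
    df2[["No."] + cols],
    on="No.",
    how="left",
    suffixes=("", "_new"),
    indicator=True,  # adds a "_merge" column telling whether a match was found
)

# Replace only where df1 says 'MN' and df2 actually has a row for that No.
use_new = (merged["Type"] == "MN") & (merged["_merge"] == "both")

for c in cols:
    # Take the df2 value where use_new is True, otherwise keep the df1 value,
    # then cast back because the join turns the "_new" columns into floats.
    merged[c] = merged[c + "_new"].where(use_new, merged[c]).astype(df1[c].dtype)

result = merged.drop(columns=[c + "_new" for c in cols] + ["_merge"])
print(result)
```

Because the match is made on No. and not on row position, neither the column order of df2 nor the number of its rows affects the assignment.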

2 answers

Answer 1 (2, accepted) 2018-05-14 07:14:59
Answer 2 (0) 2018-05-14 07:49:33
