Pandas: replace values in a dataframe with another dataframe's values based on a condition

I have the following dataframes:

df1:

+-----+------+------+------+------+------+
| No. | col1 | col2 | col3 | Type | ...  |
+-----+------+------+------+------+------+
| 123 |    2 |    5 |    2 | MN   | ...  |
| 453 |    4 |    3 |    1 | MN   | ...  |
| 146 |    7 |    9 |    4 | AA   | ...  |
| 175 |    2 |    4 |    3 | MN   | ...  |
| 643 |    0 |    0 |    0 | NAN  | ...  |
+-----+------+------+------+------+------+

df2:

+-----+------+------+------+------+
| No. | col1 | col2 | col3 | Type |
+-----+------+------+------+------+
| 123 |   24 |   57 |   22 | MN   |
| 453 |   41 |   39 |   15 | MN   |
| 175 |   21 |   43 |   37 | MN   |
+-----+------+------+------+------+

I want to replace col1, col2 and col3 in df1 with the corresponding values from df2 wherever Type equals MN.

Desired output:

df1:

+-----+------+------+------+------+-----+
| No. | col1 | col2 | col3 | Type | ... |
+-----+------+------+------+------+-----+
| 123 |   24 |   57 |   22 | MN   | ... |
| 453 |   41 |   39 |   15 | MN   | ... |
| 146 |    7 |    9 |    4 | AA   | ... |
| 175 |   21 |   43 |   37 | MN   | ... |
| 643 |    0 |    0 |    0 | NAN  | ... |
+-----+------+------+------+------+-----+

EDIT

I tried:

df1[df1.Type == 'MN'] = df2.values

but I get this error:

ValueError: Must have equal len keys and value when setting with an ndarray

I guess the reason is that df2 does not have the same number of columns as df1. So how do I make sure that only the specific columns (col1 to col3) are replaced in df1?

I think you need combine_first, matching rows by the No. column:

#filter only `MN` rows if necessary
df22 = df2[df2['Type'] == 'MN'].set_index('No.')
df1 = df22.combine_first(df1.set_index('No.')).reset_index().reindex(columns=df1.columns)
print (df1)

   No.  col1  col2  col3 Type  col
0  123  24.0  57.0  22.0   MN  ...
1  146   7.0   9.0   4.0   AA  ...
2  175  21.0  43.0  37.0   MN  ...
3  453  41.0  39.0  15.0   MN  ...
4  643   0.0   0.0   0.0  NAN  ...
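Note that the output above is sorted by No. and shows the numeric columns as floats. If the original row order and integer dtype matter, something along these lines might help. This is only a sketch: the DataFrame literals recreate the sample data from the question, and the `Oth` column, the `cols` list and the `order` helper are illustrative names, not part of the original answer.

```python
import pandas as pd

# Sample frames mirroring the question; "Oth" stands in for the extra columns.
df1 = pd.DataFrame({
    "No.": [123, 453, 146, 175, 643],
    "col1": [2, 4, 7, 2, 0],
    "col2": [5, 3, 9, 4, 0],
    "col3": [2, 1, 4, 3, 0],
    "Type": ["MN", "MN", "AA", "MN", "NAN"],
    "Oth": ["..."] * 5,
})
df2 = pd.DataFrame({
    "No.": [123, 453, 175],
    "col1": [24, 41, 21],
    "col2": [57, 39, 43],
    "col3": [22, 15, 37],
    "Type": ["MN", "MN", "MN"],
})

cols = ["col1", "col2", "col3"]   # columns to overwrite (assumed)
order = df1["No."].tolist()       # remember the original row order

df22 = df2[df2["Type"] == "MN"].set_index("No.")
out = (
    df22.combine_first(df1.set_index("No."))
    .loc[order]                    # restore df1's row order
    .reset_index()
    .reindex(columns=df1.columns)  # restore df1's column order
)
# Safe only because no NaN remains in these columns for this data.
out = out.astype({c: "int64" for c in cols})
print(out)
```

The final cast would fail if any of the replaced columns still contained NaN, so it is worth checking for missing values first on real data.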

Your code doesn't work because df1 and df2 have different numbers of columns. Selecting the shared columns explicitly with .loc works:

from io import StringIO
import pandas as pd

x1 = """No.,col1,col2,col3,Type,Oth
123,2,5,2,MN,...
453,4,3,1,MN,...
146,7,9,4,AA,...
175,2,4,3,MN,...
643,0,0,0,NAN,...
"""
x2 = """No.,col1,col2,col3,Type
123,24,57,22,MN
453,41,39,15,MN
175,21,43,37,MN
"""

df1 = pd.read_csv(StringIO(x1), sep=",")
df2 = pd.read_csv(StringIO(x2), sep=",")

df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = df2.values
# Output:
# >>> print(df1)
#    No.  col1  col2  col3 Type  Oth
# 0  123    24    57    22   MN  ...
# 1  453    41    39    15   MN  ...
# 2  146     7     9     4   AA  ...
# 3  175    21    43    37   MN  ...
# 4  643     0     0     0  NAN  ...

But there is a problem if the column order of df1 and df2 differs: .values discards the column labels, so the values are assigned by position and end up in the wrong columns.

df1 = pd.read_csv(StringIO(x1), sep=",")
df3 = df2.copy()[["No.","Type","col1","col2","col3"]]
df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = df3.values
# Output: 
# >>> print(df1)
#    No. col1  col2  col3 Type  Oth
# 0  123   MN    24    57   22  ...
# 1  453   MN    41    39   15  ...
# 2  146    7     9     4   AA  ...
# 3  175   MN    21    43   37  ...
# 4  643    0     0     0  NAN  ...

To avoid this, you can select the columns of the right-hand side in the same order:

df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = (
    df3[["No.","col1","col2","col3","Type"]].values)
# Output:
# >>> print(df1)
#    No.  col1  col2  col3 Type  Oth
# 0  123    24    57    22   MN  ...
# 1  453    41    39    15   MN  ...
# 2  146     7     9     4   AA  ...
# 3  175    21    43    37   MN  ...
# 4  643     0     0     0  NAN  ...

However, there is still a problem if the number of 'MN' records differs between df1 and df2:

df1 = pd.read_csv(StringIO(x1), sep=",")
df4 = df2.copy().iloc[:2]
df1.loc[df1.Type == 'MN', ["No.","col1","col2","col3","Type"]] = (
    df4[["No.","col1","col2","col3","Type"]].values)
# Error: 
# ValueError: shape mismatch: value array of shape (2,) could not be broadcast to 
# indexing result of shape (3,)
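One possible way around the length mismatch is to align on the No. key instead of on position. The sketch below assumes that No. is unique in df2; `lookup`, `mask` and `cols` are illustrative names, and the frames are rebuilt inline so the snippet runs on its own.

```python
import pandas as pd

df1 = pd.DataFrame({
    "No.": [123, 453, 146, 175, 643],
    "col1": [2, 4, 7, 2, 0],
    "col2": [5, 3, 9, 4, 0],
    "col3": [2, 1, 4, 3, 0],
    "Type": ["MN", "MN", "AA", "MN", "NAN"],
})
# Only two of the three MN rows are present, as in the df4 case above.
df4 = pd.DataFrame({
    "No.": [123, 453],
    "col1": [24, 41],
    "col2": [57, 39],
    "col3": [22, 15],
    "Type": ["MN", "MN"],
})

cols = ["col1", "col2", "col3"]
lookup = df4.set_index("No.")[cols]  # assumes No. is unique in df4

# Touch only MN rows that actually have a replacement available.
mask = df1["Type"].eq("MN") & df1["No."].isin(lookup.index)

# Fetch replacement rows in the same order as the selected df1 rows,
# so the positional assignment lines up row by row.
df1.loc[mask, cols] = lookup.loc[df1.loc[mask, "No."]].to_numpy()
print(df1)
```

Here row 175 keeps its original values because df4 has no entry for it, and no shape error is raised.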

A merge-based approach handles this case; what you need may be something like this:

df = pd.merge(df1, df2, how='left', on=['No.', 'Type'])
df['col1'] = df.apply(lambda x: x.col1_y if x.Type == 'MN' else x.col1_x, axis=1)
df['col2'] = df.apply(lambda x: x.col2_y if x.Type == 'MN' else x.col2_x, axis=1)
df['col3'] = df.apply(lambda x: x.col3_y if x.Type == 'MN' else x.col3_x, axis=1)
df = df[["No.","col1","col2","col3","Type"]]
# Output:
#>>> print(df)
#   No.  col1  col2  col3 Type
#0  123  24.0  57.0  22.0   MN
#1  453  41.0  39.0  15.0   MN
#2  146   7.0   9.0   4.0   AA
#3  175  21.0  43.0  37.0   MN
#4  643   0.0   0.0   0.0  NAN
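The row-wise apply calls above can be slow on large frames, and an MN row with no match in df2 would end up as NaN. A vectorized variant might look like the following sketch. It is an alternative, not the answer's original method: the `_new` suffix, `cols` and `use_new` are assumed names, and rows without a replacement keep their original values.

```python
import pandas as pd

df1 = pd.DataFrame({
    "No.": [123, 453, 146, 175, 643],
    "col1": [2, 4, 7, 2, 0],
    "col2": [5, 3, 9, 4, 0],
    "col3": [2, 1, 4, 3, 0],
    "Type": ["MN", "MN", "AA", "MN", "NAN"],
})
df2 = pd.DataFrame({
    "No.": [123, 453, 175],
    "col1": [24, 41, 21],
    "col2": [57, 39, 43],
    "col3": [22, 15, 37],
    "Type": ["MN", "MN", "MN"],
})

cols = ["col1", "col2", "col3"]

# Left merge keeps every df1 row; df2's columns get the "_new" suffix.
merged = df1.merge(df2, how="left", on=["No.", "Type"], suffixes=("", "_new"))
use_new = merged["Type"].eq("MN")

for c in cols:
    new = merged[c + "_new"]
    # Keep the old value unless the row is MN and df2 supplied a value.
    replaced = merged[c].where(~(use_new & new.notna()), new)
    merged[c] = replaced.astype(df1[c].dtype)  # undo the float upcast

out = merged[df1.columns]
print(out)
```

Casting back to the original dtype is possible here because every NaN introduced by the merge is filtered out before assignment.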
