Compare values of 2 columns from the same pandas dataframe and return values of a 3rd column based on the comparison

Question

I'm trying to compare values between two columns in the same pandas DataFrame, and wherever a match is found I want to return the value from that row, but taken from a third column.

For example, suppose the following is DataFrame `df`:

| date      | date_new   | category | value  |
| --------- | ---------- | -------- | ------ |
|2016-05-11 | 2018-05-15 | day      | 1000.0 |
|2020-03-28 | 2018-05-11 | night    | 2220.1 |
|2018-05-15 | 2020-03-28 | day      | 142.8  |
|2018-05-11 | 2019-01-29 | night    | 1832.9 |
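To reproduce the question, the input table above can be built as follows. This is a sketch: parsing the strings with `pd.to_datetime` is an assumption, since the question does not say whether the columns hold strings or datetimes. Either works for exact-match comparisons as long as `date` and `date_new` use the same type.

```python
import pandas as pd

# Input table from the question; both date columns are parsed the same way
# so that equality comparisons between them are meaningful
df = pd.DataFrame({
    'date': pd.to_datetime(['2016-05-11', '2020-03-28', '2018-05-15', '2018-05-11']),
    'date_new': pd.to_datetime(['2018-05-15', '2018-05-11', '2020-03-28', '2019-01-29']),
    'category': ['day', 'night', 'day', 'night'],
    'value': [1000.0, 2220.1, 142.8, 1832.9],
})
print(df.dtypes)
```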

I want to add a new column, say `value_new`. For each row, it is obtained by comparing that row's `date_new` against every value in `date`, then checking whether the matching row also has the same `category`; if it does, `value_new` takes that matching row's `value`.

Steps of the transformation:
1. For each value in `date_new`, look for a match in `date`.
2. If a match is found, check whether the values in the `category` column also match.
3. If both matches hold, take the corresponding value from the `value` column of the row where both matches were found; otherwise leave it blank.
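The three steps can be written out almost literally as a row-wise lookup. This is a minimal sketch for small frames, assuming dates are compared as plain strings and that taking the first match is acceptable when several rows qualify. `find_value_new` is a hypothetical helper name. On the question's input table it gives 142.8 and 1832.9 for the first two rows and leaves the last two blank (missing).

```python
import pandas as pd

# Input table from the question (dates kept as strings for simplicity)
df = pd.DataFrame({
    'date': ['2016-05-11', '2020-03-28', '2018-05-15', '2018-05-11'],
    'date_new': ['2018-05-15', '2018-05-11', '2020-03-28', '2019-01-29'],
    'category': ['day', 'night', 'day', 'night'],
    'value': [1000.0, 2220.1, 142.8, 1832.9],
})

def find_value_new(row, frame):
    # Step 1: rows whose `date` equals this row's `date_new`
    candidates = frame[frame['date'] == row['date_new']]
    # Step 2: of those, keep only rows with the same `category`
    candidates = candidates[candidates['category'] == row['category']]
    # Step 3: take the first matching `value`, otherwise leave blank (None)
    return candidates['value'].iloc[0] if not candidates.empty else None

# Extra keyword arguments to DataFrame.apply are forwarded to the function
df['value_new'] = df.apply(find_value_new, axis=1, frame=df)
print(df)
```

This scans the whole frame once per row, so it is quadratic; for larger frames a join on (`date`, `category`) is the usual approach.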

So, in the end I want the final DataFrame to look something like this:

| date      | date_new   | category | value  | value_new |
| --------- | ---------- | -------- | ------ | --------- |
|2016-05-11 | 2018-05-15 | day      | 1000.0 | 142.8     |
|2020-03-28 | 2018-05-11 | night    | 2220.1 | 1832.9    |
|2018-05-15 | 2020-03-28 | day      | 142.8  | None      |
|2018-05-11 | 2016-05-11 | day      | 1832.9 | 1000.0    |

Answer 1

Use `DataFrame.merge` with a left join and assign the result to a new column:

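```python
import pandas as pd

# Left self-join: each row's (date_new, category) is looked up in (date, category);
# overlapping columns get _x/_y suffixes, so value_y is the matched row's value
df['value_new'] = df.merge(df, left_on=['date_new', 'category'],
                           right_on=['date', 'category'], how='left')['value_y']
print(df)
```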

```
         date    date_new category   value  value_new
0  2016-05-11  2018-05-15      day  1000.0      142.8
1  2020-03-28  2018-05-11    night  2220.1        NaN
2  2018-05-15  2020-03-28      day   142.8        NaN
3  2018-05-11  2016-05-11      day  1832.9     1000.0
```
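The one-liner relies on the self-merge returning exactly one row per row of `df`, and on `df` having a default RangeIndex, because the new column is assigned by index alignment. If several rows share the same (`date`, `category`), the left join produces extra rows and the values no longer line up. A hedged variant, assuming that keeping the first match is acceptable, de-duplicates the lookup side first and asks `merge` to check the relationship through its `validate` parameter. The names `lookup` and `out` are illustrative, and the data below is the table shown in the answer's output.

```python
import pandas as pd

# Data matching the answer's printed output
df = pd.DataFrame({
    'date': ['2016-05-11', '2020-03-28', '2018-05-15', '2018-05-11'],
    'date_new': ['2018-05-15', '2018-05-11', '2020-03-28', '2016-05-11'],
    'category': ['day', 'night', 'day', 'day'],
    'value': [1000.0, 2220.1, 142.8, 1832.9],
})

# Lookup table keyed on (date, category): keep one row per key so the left
# join cannot add rows, and rename so the key names line up with df
lookup = (df[['date', 'category', 'value']]
          .drop_duplicates(['date', 'category'])
          .rename(columns={'date': 'date_new', 'value': 'value_new'}))

# validate='many_to_one' raises MergeError if the right side has duplicate keys
out = df.merge(lookup, on=['date_new', 'category'], how='left',
               validate='many_to_one')
print(out)
```

Because `lookup` only carries the key columns and `value_new`, no `_x`/`_y` suffixes appear, and a left join keeps the rows of `df` in their original order.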

Answer 1 was accepted on 2020-06-04 09:29:17.