Comparing values of two columns in the same pandas DataFrame and returning the value of a third column based on the comparison
I'm trying to compare the values of two columns in the same pandas DataFrame and, wherever a match is found, return the value of a third column from the matching row.
Suppose the following is the DataFrame `df`:
| date | date_new | category | value |
| --------- | ---------- | -------- | ------ |
|2016-05-11 | 2018-05-15 | day | 1000.0 |
|2020-03-28 | 2018-05-11 | night | 2220.1 |
|2018-05-15 | 2020-03-28 | day | 142.8 |
|2018-05-11 | 2019-01-29 | night | 1832.9 |
I want to add a new column, say `value_new`, filled with values taken from `value`. Each date in `date_new` should be looked up among the dates in `date`, and a match only counts if the two rows also have the same `category` value.
Steps of the transformation:
1. For each value in `date_new`, look for a match in `date`.
2. If a match is found, check whether the values in the `category` column also match.
3. If both conditions hold, take the corresponding value from the `value` column of the matching row; otherwise leave the cell blank.
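The three steps above can be sketched literally with a dictionary lookup. This is only an illustration, not the accepted approach: the sample rows are copied from the output printed in the answer, the dates are kept as plain strings, and `lookup` is a hypothetical helper name. It also assumes each (`date`, `category`) pair occurs at most once; with duplicates, the last occurrence would win.

```python
import pandas as pd

# Sample rows (assumed; taken from the output printed in the answer).
df = pd.DataFrame({
    "date": ["2016-05-11", "2020-03-28", "2018-05-15", "2018-05-11"],
    "date_new": ["2018-05-15", "2018-05-11", "2020-03-28", "2016-05-11"],
    "category": ["day", "night", "day", "day"],
    "value": [1000.0, 2220.1, 142.8, 1832.9],
})

# Steps 1 and 2: index `value` by the (date, category) pair of its own row.
lookup = {
    (d, c): v
    for d, c, v in zip(df["date"], df["category"], df["value"])
}

# Step 3: for each row, look up (date_new, category); NaN when nothing matches.
df["value_new"] = [
    lookup.get((d, c), float("nan"))
    for d, c in zip(df["date_new"], df["category"])
]

print(df)
```

For larger frames a vectorised join is usually preferable; the loop is shown only because it mirrors the wording of the steps.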
In the end, I want the DataFrame to look something like this:
| date | date_new | category | value | value_new |
| --------- | ---------- | -------- | ------ | --------- |
|2016-05-11 | 2018-05-15 | day | 1000.0 | 142.8 |
|2020-03-28 | 2018-05-11 | night | 2220.1 | 1832.9 |
|2018-05-15 | 2020-03-28 | day | 142.8 | None |
|2018-05-11 | 2016-05-11 | day | 1832.9 | 1000.0 |
Use `DataFrame.merge` with a left join and assign the result to a new column:
    df['value_new'] = df.merge(df,
                               left_on=['date_new', 'category'],
                               right_on=['date', 'category'],
                               how='left')['value_y']
    print(df)

             date    date_new  category   value  value_new
    0  2016-05-11  2018-05-15       day  1000.0      142.8
    1  2020-03-28  2018-05-11     night  2220.1        NaN
    2  2018-05-15  2020-03-28       day   142.8        NaN
    3  2018-05-11  2016-05-11       day  1832.9     1000.0
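One caveat with the self-merge: if any (`date`, `category`) pair appears more than once, the merge returns more rows than `df`, and assigning the `value_y` column back by index may no longer line up. The variant below is a hedged sketch that guards against this, assuming it is acceptable to keep only the first occurrence of a duplicated pair. The names `right` and `result` are illustrative, and the sample rows are the ones from the output above.

```python
import pandas as pd

# Sample rows (assumed; same data as the printed output above).
df = pd.DataFrame({
    "date": ["2016-05-11", "2020-03-28", "2018-05-15", "2018-05-11"],
    "date_new": ["2018-05-15", "2018-05-11", "2020-03-28", "2016-05-11"],
    "category": ["day", "night", "day", "day"],
    "value": [1000.0, 2220.1, 142.8, 1832.9],
})

# Lookup table: one row per (date, category), renamed so that it joins
# directly on `date_new` and delivers its value as `value_new`.
right = (
    df[["date", "category", "value"]]
    .drop_duplicates(subset=["date", "category"])  # assumption: keep first
    .rename(columns={"date": "date_new", "value": "value_new"})
)

# validate="many_to_one" makes pandas raise if `right` still has duplicate keys,
# so the result is guaranteed to have exactly one row per row of `df`.
result = df.merge(right, on=["date_new", "category"], how="left",
                  validate="many_to_one")

print(result)
```

Because the lookup table carries only the key columns and the renamed value, no `_x`/`_y` suffixes appear and the result can be used directly instead of being assigned back column by column.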