Look up a value in a column based on a condition
I have two dataframes:
df1:
foo
0 2
1 11
2 18
3 6
4 14
5 12
6 8
7 13
8 7
9 5
df2:
bar date
0 2 06-01-2020
1 5 06-01-2020
2 7 06-01-2020
3 8 06-01-2020
4 3 06-01-2020
df1['result'] = df1.foo.isin(df2.bar)
I want to look up the date value in df2 if the 'foo' of df1 is present in 'bar' of df2, so I tried the following:
df1['date'] = df2['date'].loc[df1.foo.isin(df2.bar)]
But it fills in the date for only a single value.
Output:
foo result date
0 2 True 06-01-2020
1 11 False NaN
2 18 False NaN
3 6 False NaN
4 14 False NaN
5 12 False NaN
6 8 True NaN
7 13 False NaN
8 7 True NaN
9 5 True NaN
If the value of foo is not in bar, then it should have today's date, like the following:
Expected output:
foo result date
0 2 True 06-01-2020
1 11 False 24-08-2020
2 18 False 24-08-2020
3 6 False 24-08-2020
4 14 False 24-08-2020
5 12 False 24-08-2020
6 8 True 06-01-2020
7 13 False 24-08-2020
8 7 True 06-01-2020
9 5 True 06-01-2020
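A likely reason the attempt above fills only row 0: a boolean Series used as an indexer is aligned to the labels of the object being indexed, so the 10-row mask built from df1 is cut down to df2's labels 0..4, where only label 0 is True. The sketch below makes that alignment explicit with reindex; it is an illustration of the behaviour under that assumption, not a fix.

```python
import pandas as pd

df1 = pd.DataFrame({'foo': [2, 11, 18, 6, 14, 12, 8, 13, 7, 5]})
df2 = pd.DataFrame({'bar': [2, 5, 7, 8, 3], 'date': ['06-01-2020'] * 5})

# The mask has one entry per row of df1 (labels 0..9).
mask = df1['foo'].isin(df2['bar'])

# Aligning the mask to df2's labels (0..4) keeps only its first five
# entries, which describe df1's rows, not df2's rows.
aligned = mask.reindex(df2.index)
selected = df2.loc[aligned, 'date']
print(selected)

# Assigning back aligns on labels again, so only label 0 receives a date.
df1['date'] = selected
print(df1)
```

The mask answers "is this df1 row present in df2?", so it cannot be used to pick rows out of df2; a lookup keyed on the values of 'bar' is needed instead.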
Use Series.map to add the dates from a Series created from the values of df2, then replace the missing values with the current date.
Solution for dates stored as strings in DD-MM-YYYY format:
df1['result'] = df1.foo.isin(df2.bar)
now = pd.Timestamp('now').strftime('%d-%m-%Y')
df1['date'] = df1['foo'].map(df2.set_index('bar')['date']).fillna(now)
print (df1)
foo result date
0 2 True 06-01-2020
1 11 False 24-08-2020
2 18 False 24-08-2020
3 6 False 24-08-2020
4 14 False 24-08-2020
5 12 False 24-08-2020
6 8 True 06-01-2020
7 13 False 24-08-2020
8 7 True 06-01-2020
9 5 True 06-01-2020
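For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same map-based lookup. The sample frames are rebuilt from the question, and the helper name add_lookup_date is hypothetical (it is not part of pandas). It assumes the values in 'bar' are unique, since mapping through a Series with a duplicated index raises an error.

```python
import pandas as pd

df1 = pd.DataFrame({'foo': [2, 11, 18, 6, 14, 12, 8, 13, 7, 5]})
df2 = pd.DataFrame({'bar': [2, 5, 7, 8, 3], 'date': ['06-01-2020'] * 5})


def add_lookup_date(left, right, default):
    """Hypothetical helper: return a copy of `left` with 'result' and 'date'.

    Assumes right['bar'] holds unique values.
    """
    out = left.copy()
    # Series indexed by 'bar', used as a lookup table for the values of 'foo'.
    lookup = right.set_index('bar')['date']
    out['result'] = out['foo'].isin(right['bar'])
    # Unmatched values of 'foo' map to NaN and are then filled with `default`.
    out['date'] = out['foo'].map(lookup).fillna(default)
    return out


today = pd.Timestamp('now').strftime('%d-%m-%Y')
res = add_lookup_date(df1, df2, today)
print(res)
```

If 'bar' can contain duplicates, dropping them first (for example with drop_duplicates on 'bar') is one way to keep the lookup well defined.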
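If working with datetimes, convert df2['date'] first and fill with a Timestamp instead of a string:
df2['date'] = pd.to_datetime(df2['date'], format='%d-%m-%Y')
now = pd.Timestamp('now').normalize()
df1['date'] = df1['foo'].map(df2.set_index('bar')['date']).fillna(now)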
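A self-contained sketch of the datetime variant might look like the following. It assumes the strings in df2['date'] are in DD-MM-YYYY format and are parsed with pd.to_datetime, and that "today" means midnight of the current day (Timestamp.normalize). The formatted column at the end is only for display.

```python
import pandas as pd

df1 = pd.DataFrame({'foo': [2, 11, 18, 6, 14, 12, 8, 13, 7, 5]})
df2 = pd.DataFrame({'bar': [2, 5, 7, 8, 3], 'date': ['06-01-2020'] * 5})

# Parse the DD-MM-YYYY strings into real datetimes.
df2['date'] = pd.to_datetime(df2['date'], format='%d-%m-%Y')

# Today's date at midnight, as a Timestamp rather than a string.
today = pd.Timestamp('now').normalize()

df1['result'] = df1['foo'].isin(df2['bar'])
df1['date'] = df1['foo'].map(df2.set_index('bar')['date']).fillna(today)

# Optional: format back to DD-MM-YYYY strings for display only.
formatted = df1['date'].dt.strftime('%d-%m-%Y')
print(df1.assign(date=formatted))
```

Keeping the column as datetimes makes later comparisons and sorting behave chronologically, which strings in DD-MM-YYYY order do not.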
Using merge
import datetime
import pandas as pd

# Sample data
df1 = pd.DataFrame({'foo': [2, 11, 18, 6, 14, 12, 8, 13, 7, 5]})
df2 = pd.DataFrame({'bar': [2, 5, 7, 8, 3],
                    'date': [datetime.date(2020, 1, 6)] * 5})
# Left-merge on foo == bar and keep only the required columns
df = df1.merge(df2, how='left', left_on='foo', right_on='bar')[['foo', 'date']]
# result is True where the merge found a matching date
df['result'] = df['date'].notna()
# Finally, replace all missing dates with the default one you want
df['date'] = df['date'].fillna(datetime.date(2020, 8, 24))
print(df)
Output:
foo date result
0 2 2020-01-06 True
1 11 2020-08-24 False
2 18 2020-08-24 False
3 6 2020-08-24 False
4 14 2020-08-24 False
5 12 2020-08-24 False
6 8 2020-01-06 True
7 13 2020-08-24 False
8 7 2020-01-06 True
9 5 2020-01-06 True
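As a variation on the merge approach, the indicator parameter of merge can record whether each row found a match, so 'result' does not have to be inferred from missing dates (which would be misleading if df2 itself contained a missing date). This is a sketch under the same sample data; using today's date as the default is an assumption taken from the question.

```python
import datetime

import pandas as pd

df1 = pd.DataFrame({'foo': [2, 11, 18, 6, 14, 12, 8, 13, 7, 5]})
df2 = pd.DataFrame({'bar': [2, 5, 7, 8, 3],
                    'date': [datetime.date(2020, 1, 6)] * 5})

today = datetime.date.today()

# indicator=True adds a '_merge' column: 'both' for matches, 'left_only' otherwise.
merged = df1.merge(df2, how='left', left_on='foo', right_on='bar', indicator=True)
merged['result'] = merged['_merge'] == 'both'
merged['date'] = merged['date'].fillna(today)

# Keep only the columns of interest.
merged = merged[['foo', 'date', 'result']]
print(merged)
```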
You can use the merge method of a pandas DataFrame like this:
import pandas as pd
import numpy as np
from datetime import datetime
df1 = pd.DataFrame({'foo':[2,11,18,6,14,12,8,13,7,5]})
df2 = pd.DataFrame({'bar':[2,5,7,8,3], 'date': ['06-01-2020']*5})
df3 = df1.merge(df2,how='left', left_on='foo', right_on='bar')
df3['result'] = True
df3.loc[df3['bar'].isna(), ['result', 'date']] = [False, datetime.now().strftime('%d-%m-%Y')]
df3.drop('bar', inplace=True, axis=1)
print(df3)