Use pandas to select the lagged row along with current row based on criteria
Pandas select row based on value of current row
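Suppose I have two data frames, Restaurant and Category. I am trying to create a new column that holds the number of restaurants in the same zone that share at least one category.
How can I solve this? Here is what I have so far: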
Restaurant contains: id, zone
id Zone ...
0 11 H5X ...
1 12 H2A
2 13 H5X
3 14 H53
4 15 H21 ...
Category contains: id, category
id category ...
0 11 Sushi ...
1 12 Fast Food
2 13 Sandwich
3 13 Sushi
4 14 Noodle
5 14 Fast Food
6 15 Bakeries ...
Now, how can I add a new column "intersection" to originalDF so that the result looks like this:
id Zone intersection
0 11 H5X 1 (since there is one restaurant, id=13, that is in the same zone (H5X)
and has at least one category in common, Sushi)
1 12 H2A 0
3 13 H5X 1 (since there is one restaurant, id=11, that is in the same zone (H5X) and has at
least one category in common, Sushi)
5 14 H53 0
6 15 H21 0
Can anyone help me? I'm lost. Thanks.
import pandas as pd
# create both datasets
df1 = pd.DataFrame({
'id': [11, 12, 13, 14, 15],
'zone': ['H5X', 'H2A', 'H5X', 'H53', 'H21']
})
df1.head()
df2 = pd.DataFrame({
'id': [11, 12, 13, 13, 14, 14, 15],
'category': ['Sushi', 'Fast food', 'Sandwich', 'Sushi', 'Noodle', 'Fast food', 'Bakeries']
})
df2.head()
# merge datasets based on restaurant id
df3 = pd.merge(df1, df2, how='left', on=['id'])
df3.reset_index(drop=True, inplace=True)
df3.head()
Output:
# count repeating zone / category
cnt = df3.groupby(['zone', 'category']).size().to_frame('count')
cnt.head(10)
Output:
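As an alternative sketch for the counting step, `groupby(...).transform('nunique')` can attach the number of distinct restaurants per (zone, category) pair directly to each row, which avoids merging a separate count table back in. The frame `rows` and the columns `n_in_pair` and `shared` are hypothetical names used only for illustration; `rows` stands in for the merged restaurant/category data built from the sample values.

```python
import pandas as pd

# Hypothetical stand-in for the merged restaurant/category rows (same sample data).
rows = pd.DataFrame({
    'id': [11, 12, 13, 13, 14, 14, 15],
    'zone': ['H5X', 'H2A', 'H5X', 'H5X', 'H53', 'H53', 'H21'],
    'category': ['Sushi', 'Fast food', 'Sandwich', 'Sushi',
                 'Noodle', 'Fast food', 'Bakeries'],
})

# Number of distinct restaurants in each (zone, category) pair, aligned to every row.
rows['n_in_pair'] = rows.groupby(['zone', 'category'])['id'].transform('nunique')

# A row is "shared" when at least one other restaurant has the same zone and category.
rows['shared'] = rows['n_in_pair'] > 1
```

Counting distinct ids rather than rows keeps the count from being inflated if a restaurant happens to list the same category twice.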
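# merge counts back onto the per-category rows, matching on both zone and category
# (merging on zone alone would attach counts for categories a restaurant does not have)
df4 = pd.merge(df3, cnt.reset_index(), how='left', on=['zone', 'category'])
# flag a row with 1 if its zone / category pair occurs more than once, else 0
df4['intersection'] = (df4['count'] > 1).astype(int)
# collapse to one row per restaurant: 1 if any of its categories is shared in its zone
df4 = df4.groupby(['id', 'zone'], as_index=False, sort=False)['intersection'].max()
df4.head()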
Output:
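The approach above yields a 0/1 flag per restaurant. Since the question asks for the number of restaurants, one possible way to get an actual count is a self-join on zone and category, sketched below. This is only an illustration under the sample data from the question; the names `restaurants`, `categories`, `tagged`, `pairs`, `counts`, and `result` are hypothetical and not part of the original answer.

```python
import pandas as pd

restaurants = pd.DataFrame({
    'id': [11, 12, 13, 14, 15],
    'zone': ['H5X', 'H2A', 'H5X', 'H53', 'H21'],
})
categories = pd.DataFrame({
    'id': [11, 12, 13, 13, 14, 14, 15],
    'category': ['Sushi', 'Fast food', 'Sandwich', 'Sushi',
                 'Noodle', 'Fast food', 'Bakeries'],
})

# One row per (restaurant, category), carrying the restaurant's zone.
# An inner merge drops restaurants that have no category at all.
tagged = restaurants.merge(categories, on='id')

# Self-join: pair up rows that share both zone and category.
pairs = tagged.merge(tagged, on=['zone', 'category'], suffixes=('', '_other'))

# Drop each restaurant's match with itself.
pairs = pairs[pairs['id'] != pairs['id_other']]

# Count the distinct other restaurants matched to each restaurant.
counts = pairs.groupby('id')['id_other'].nunique()

# Attach the counts to the restaurant table; restaurants with no match get 0.
result = restaurants.assign(
    intersection=restaurants['id'].map(counts).fillna(0).astype(int)
)
print(result)
```

Using `nunique` means a neighbour that shares two categories is still counted once. The self-join grows with the square of the number of restaurants per (zone, category) pair, so it may need rethinking for very large zones.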