Pandas: select rows based on the current row's values

Question

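Suppose I have 2 dataframes, one with restaurants and one with categories. I am trying to create a new column that contains the number of restaurants that are in the same zone and have at least one category in common.

How can I approach this? Here is what I have so far: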

Restaurant contains: id, zone
      id    Zone   ... 
0     11    H5X    ...
1     12    H2A  
2     13    H5X
3     14    H53
4     15    H21    ... 



Category contains: id, category
      id    category    ...
0     11    Sushi       ...
1     12    Fast Food
2     13    Sandwich
3     13    Sushi
4     14    Noodle
5     14    Fast Food
6     15    Bakeries    ...

Now how can I add a new column "intersection" to originalDF so that it looks like this:

     id    Zone   intersection
0     11    H5X    1 (since there is one restaurant, id=13, that is in the same zone (H5X)
                      and has at least one category in common, Sushi)
1     12    H2A    0
3     13    H5X    1 (since there is one restaurant, id=11, that is in the same zone (H5X)
                      and has at least one category in common, Sushi)
5     14    H53    0
6     15    H21    0

Can anyone help me? I'm lost. Thanks.

Answer 1

import pandas as pd 

# create both datasets
df1 = pd.DataFrame({
    'id': [11, 12, 13, 14, 15],
    'zone': ['H5X', 'H2A', 'H5X', 'H53', 'H21']
})
df1.head()

df2 = pd.DataFrame({
    'id': [11, 12, 13, 13, 14, 14, 15],
    'category': ['Sushi', 'Fast food', 'Sandwich', 'Sushi', 'Noodle', 'Fast food', 'Bakeries']
})
df2.head()

# merge datasets based on restaurant id
df3 = pd.merge(df1, df2, how='left', on=['id'])
df3.reset_index(drop=True, inplace=True)
df3.head()

Output: the merged frame, with one row per restaurant/category pair.

# count repeating zone / category
cnt = df3.groupby(['zone', 'category']).size().to_frame('count')
cnt.head(10)

Output: the number of rows for each zone/category pair.
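To make the grouping step concrete, here is a minimal, self-contained sketch. It rebuilds the sample data from the question and counts distinct restaurant ids per zone/category pair with nunique rather than size; the two only differ if a restaurant lists the same category twice. The names restaurants, categories, merged, pair_counts and shared are illustrative and do not come from the answer.

```python
import pandas as pd

# Sample data from the question (variable names are illustrative).
restaurants = pd.DataFrame({
    "id": [11, 12, 13, 14, 15],
    "zone": ["H5X", "H2A", "H5X", "H53", "H21"],
})
categories = pd.DataFrame({
    "id": [11, 12, 13, 13, 14, 14, 15],
    "category": ["Sushi", "Fast Food", "Sandwich", "Sushi",
                 "Noodle", "Fast Food", "Bakeries"],
})

# One row per restaurant/category pair, with the zone attached.
merged = restaurants.merge(categories, how="left", on="id")

# Number of distinct restaurants for every zone/category pair.
pair_counts = merged.groupby(["zone", "category"])["id"].nunique()

# Keep only the pairs shared by more than one restaurant.
shared = pair_counts[pair_counts > 1]
print(shared)
```

With the sample data, the only pair that survives the filter is Sushi in zone H5X, shared by restaurants 11 and 13.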

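# merge counts to first dataframe to achieve desired result
# keep one row per zone (its largest zone/category count),
# otherwise the merge repeats a restaurant once per category in its zone
zone_max = cnt.groupby('zone')['count'].max().reset_index()
df4 = pd.merge(df1, zone_max, how='left', on='zone')
# 1 if some category occurs more than once in the zone, else 0
df4['count'] = (df4['count'] > 1).astype(int)
df4.rename(columns={'count': 'intersection'}, inplace=True)
df4.head()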

Output: the restaurants with the new intersection column.
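The question asks for a count per restaurant, while the steps above produce a 0/1 flag derived from zone-level counts. One possible alternative, sketched here under the assumption that a self-join on zone and category is acceptable for the data size, pairs every restaurant with the others that share its zone and a category, then counts the distinct partners. The function name count_similar and the column name id_other are hypothetical, chosen only for this illustration.

```python
import pandas as pd

# Sample data from the question (variable names are illustrative).
restaurants = pd.DataFrame({
    "id": [11, 12, 13, 14, 15],
    "zone": ["H5X", "H2A", "H5X", "H53", "H21"],
})
categories = pd.DataFrame({
    "id": [11, 12, 13, 13, 14, 14, 15],
    "category": ["Sushi", "Fast Food", "Sandwich", "Sushi",
                 "Noodle", "Fast Food", "Bakeries"],
})


def count_similar(restaurants, categories):
    """Count, per restaurant, the other restaurants sharing its zone and a category."""
    # One row per restaurant/category pair, with the zone attached.
    tagged = restaurants.merge(categories, on="id", how="inner")
    # Self-join: pair up rows that agree on both zone and category.
    pairs = tagged.merge(tagged, on=["zone", "category"], suffixes=("", "_other"))
    # Drop the pairs that match a restaurant with itself.
    pairs = pairs[pairs["id"] != pairs["id_other"]]
    # Distinct partners, so two shared categories still count as one restaurant.
    counts = pairs.groupby("id")["id_other"].nunique()
    out = restaurants.copy()
    # Restaurants without any partner get 0.
    out["intersection"] = out["id"].map(counts).fillna(0).astype(int)
    return out


result = count_similar(restaurants, categories)
print(result)
```

On the sample data this gives 1 for restaurants 11 and 13 and 0 for the rest, matching the table in the question. The self-join can grow large when many restaurants share a zone and category, so it is best treated as a starting point rather than a tuned solution.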

Solution 1
2 2020-09-27 20:51:54
