How do I assign labels based on a list in a Python pandas DataFrame?

Question

I have two DataFrames: 'recipe', which holds combinations of ingredients, and 'like', which contains the popular combinations.

recipe = pd.DataFrame({'A': ['chicken','beef','pork','egg', 'chicken', 'egg', 'beef'],
                       'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']})
recipe
         A      B
0  chicken  sweet
1     beef    hot
2     pork  salty
3      egg    hot
4  chicken  sweet
5      egg  salty
6     beef    hot 

like = pd.DataFrame({'A':['beef', 'egg'], 'B':['hot', 'salty']})
like
      A      B
0  beef    hot
1   egg  salty

How can I add a column 'C' to recipe so that a row gets the value 'yes' if its combination is listed in 'like', and 'no' otherwise?

The result I want is:

recipe
         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot   no
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes

The problem is that both of my DataFrames are large, so I cannot manually pick out the items in 'like' and assign the 'yes' label in 'recipe'. Is there an easy way to do this?

Answer 1

You can use merge and numpy.where:

df = pd.merge(recipe, like, on=['A','B'], indicator=True, how='left')
print(df)
         A      B     _merge
0  chicken  sweet  left_only
1     beef    hot       both
2     pork  salty  left_only
3      egg    hot  left_only
4  chicken  sweet  left_only
5      egg  salty       both
6     beef    hot       both

df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no')

print(df[['A','B','C']])
         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot   no
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes
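
For reference, the merge approach above can be wrapped into a self-contained sketch. The helper name label_matches is hypothetical, and the drop_duplicates call is an added assumption rather than part of the answer: a left merge repeats a recipe row once per matching row in like, so duplicate combinations in like would otherwise add rows to the result.

```python
import numpy as np
import pandas as pd


def label_matches(recipe, like, keys=("A", "B")):
    """Return a copy of `recipe` with a 'C' column marking rows found in `like`.

    Hypothetical helper for illustration; not part of pandas.
    """
    keys = list(keys)
    # Assumption: `like` may contain repeated combinations, so de-duplicate
    # first to keep one output row per recipe row.
    unique_like = like[keys].drop_duplicates()
    merged = recipe.merge(unique_like, on=keys, how="left", indicator=True)
    # '_merge' is 'both' when the combination exists in `like`.
    merged["C"] = np.where(merged["_merge"] == "both", "yes", "no")
    return merged.drop(columns="_merge")


recipe = pd.DataFrame({"A": ["chicken", "beef", "pork", "egg", "chicken", "egg", "beef"],
                       "B": ["sweet", "hot", "salty", "hot", "sweet", "salty", "hot"]})
# A duplicated (beef, hot) row is included on purpose to exercise drop_duplicates.
like = pd.DataFrame({"A": ["beef", "egg", "beef"], "B": ["hot", "salty", "hot"]})

labeled = label_matches(recipe, like)
print(labeled)
```

Note that merge returns a new frame with a fresh integer index, so a custom index on recipe would need to be restored separately.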

Comparing with df['_merge'] == 'both' is faster than using np.in1d:

In [460]: %timeit np.where(np.in1d(df['_merge'],'both'), 'yes', 'no')
100 loops, best of 3: 2.22 ms per loop

In [461]: %timeit np.where(df['_merge'] == 'both', 'yes', 'no')
1000 loops, best of 3: 652 µs per loop

Answer 2

You could add a C column of 'yes' values to like and then merge recipe with like. The rows that match will have 'yes' in the C column, and the rows without a match will have NaN. You could then use fillna to replace the NaNs with 'no':

import pandas as pd
recipe = pd.DataFrame({'A': ['chicken','beef','pork','egg', 'chicken', 'egg', 'beef'],
                       'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']})

like = pd.DataFrame({'A':['beef', 'egg'], 'B':['hot', 'salty']})
like['C'] = 'yes'
result = pd.merge(recipe, like, how='left').fillna('no')
print(result)

yields

         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot   no
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes
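
One caveat, offered as a hedged sketch rather than a correction to the answer: calling fillna('no') on the whole frame also replaces missing values in every other column. Assuming recipe might carry its own NaNs (the price column below is invented purely for illustration), filling only column C leaves them alone.

```python
import numpy as np
import pandas as pd

# 'price' is a made-up column that contains a genuine missing value.
recipe = pd.DataFrame({"A": ["chicken", "beef", "egg"],
                       "B": ["sweet", "hot", "salty"],
                       "price": [3.5, np.nan, 1.0]})
like = pd.DataFrame({"A": ["beef", "egg"], "B": ["hot", "salty"]})

# Add the flag without mutating `like`, then merge on the key columns only.
flagged = like.assign(C="yes")
result = recipe.merge(flagged, on=["A", "B"], how="left")

# Fill only the label column; the NaN in 'price' is left untouched.
result["C"] = result["C"].fillna("no")
print(result)
```

Passing on=['A', 'B'] explicitly also guards against merge picking up extra shared column names as keys.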

Answer 3

You can use set_value (deprecated in pandas 0.21 and removed in 1.0, so this only runs on older versions) by matching on A and B as such:

recipe.set_value(recipe[recipe.A.isin(like.A) & recipe.B.isin(like.B)].index,'C','yes')
recipe.fillna('no')

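Which will give you the following. Note that row 3 (egg, hot) is labeled yes because A and B are matched independently rather than as a pair, which differs from the output requested in the question: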

         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot  yes
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes
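
The output requested in the question labels (egg, hot) as no, which requires testing A and B together as a pair. A minimal sketch of one way to do that without a merge, assuming a pandas version that provides MultiIndex.from_frame (0.24 or later); the variable names are illustrative:

```python
import numpy as np
import pandas as pd

recipe = pd.DataFrame({"A": ["chicken", "beef", "pork", "egg", "chicken", "egg", "beef"],
                       "B": ["sweet", "hot", "salty", "hot", "sweet", "salty", "hot"]})
like = pd.DataFrame({"A": ["beef", "egg"], "B": ["hot", "salty"]})

# Treat each (A, B) row as a single tuple-like key.
recipe_pairs = pd.MultiIndex.from_frame(recipe[["A", "B"]])
like_pairs = pd.MultiIndex.from_frame(like[["A", "B"]])

# True only where the whole pair appears in `like`.
mask = recipe_pairs.isin(like_pairs)

recipe["C"] = np.where(mask, "yes", "no")
print(recipe)
```

Because the mask is computed per pair, (egg, hot) stays no even though egg and hot each appear somewhere in like.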

Note: The timings below do not mean my answer is better than the other ones, or vice versa.

Using set_value:

%timeit recipe.set_value(recipe[recipe.A.isin(like.A) & recipe.B.isin(like.B)].index,'C','yes'); recipe.fillna('no')
100 loops, best of 3: 2.69 ms per loop

Using merge and creating a new df:

%timeit df = pd.merge(recipe, like, on=['A','B'], indicator=True, how='left'); df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no')
100 loops, best of 3: 8.42 ms per loop

Using only the np.where step on the already merged df:

%timeit df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no')
1000 loops, best of 3: 187 µs per loop

Again, it really depends on what you're timing. Just be cautious about duplicating your data.
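
To make the point about what is being timed concrete, here is a small sketch that measures the merge step and the labeling step separately with the standard-library timeit module instead of IPython's %timeit. The function names and the repeat count are arbitrary choices for illustration, and the numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

recipe = pd.DataFrame({"A": ["chicken", "beef", "pork", "egg", "chicken", "egg", "beef"],
                       "B": ["sweet", "hot", "salty", "hot", "sweet", "salty", "hot"]})
like = pd.DataFrame({"A": ["beef", "egg"], "B": ["hot", "salty"]})


def merge_step():
    # The expensive part: builds a new merged DataFrame on every call.
    return pd.merge(recipe, like, on=["A", "B"], how="left", indicator=True)


merged = merge_step()


def label_step():
    # The cheap part: a comparison on a frame that has already been merged.
    return np.where(merged["_merge"] == "both", "yes", "no")


n = 200  # arbitrary repeat count
merge_time = timeit.timeit(merge_step, number=n) / n
label_time = timeit.timeit(label_step, number=n) / n
print(f"merge:    {merge_time * 1e3:.3f} ms per call")
print(f"np.where: {label_time * 1e6:.1f} µs per call")
```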

3 answers

Answer 1
2 accepted 2016-03-24 12:05:49

Answer 2
1 2016-03-24 12:23:30

Answer 3
1 2016-03-24 12:26:06
