How to assign labels according to a list in python pandas.DataFrame?
I have two DataFrames. One is 'recipe', which holds combinations of ingredients, and the other is 'like', which contains the popular combinations.
recipe = pd.DataFrame({'A': ['chicken','beef','pork','egg', 'chicken', 'egg', 'beef'],
'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']})
recipe
         A      B
0  chicken  sweet
1     beef    hot
2     pork  salty
3      egg    hot
4  chicken  sweet
5      egg  salty
6     beef    hot
like = pd.DataFrame({'A':['beef', 'egg'], 'B':['hot', 'salty']})
like
      A      B
0  beef    hot
1   egg  salty
How can I add a column 'C' to recipe that is 'yes' when the row's combination is listed in 'like' and 'no' otherwise?
The result I want is:
recipe
         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot   no
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes
The problem is that both of my DataFrames are large, so I cannot manually pick the items in 'like' and assign the 'yes' label in 'recipe'. Is there an easy way to do this?
You can use merge (with indicator=True) and numpy.where:
import numpy as np
import pandas as pd
df = pd.merge(recipe, like, on=['A','B'], indicator=True, how='left')
print(df)
         A      B     _merge
0  chicken  sweet  left_only
1     beef    hot       both
2     pork  salty  left_only
3      egg    hot  left_only
4  chicken  sweet  left_only
5      egg  salty       both
6     beef    hot       both
df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no')
print(df[['A','B','C']])
         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot   no
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes
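A self-contained sketch of the merge + indicator approach, wrapped in a hypothetical helper named label_popular (the name and the keys parameter are illustrative, not from the answer). It assumes pandas 0.21 or later (indicator= and drop(columns=) both exist there). One addition over the answer: like is de-duplicated on the key columns first, because a left merge emits one output row per matching row in like, so repeated pairs in like would otherwise repeat rows of recipe.

```python
import numpy as np
import pandas as pd

recipe = pd.DataFrame({'A': ['chicken', 'beef', 'pork', 'egg', 'chicken', 'egg', 'beef'],
                       'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']})
like = pd.DataFrame({'A': ['beef', 'egg'], 'B': ['hot', 'salty']})


def label_popular(recipe, like, keys=('A', 'B')):
    """Hypothetical helper: copy of recipe with C = 'yes' where the key pair occurs in like."""
    keys = list(keys)
    # Keep each key pair once so the left merge cannot duplicate recipe rows.
    lookup = like[keys].drop_duplicates()
    # indicator=True adds '_merge' ('both' for matched rows, 'left_only' otherwise).
    merged = recipe.merge(lookup, on=keys, how='left', indicator=True)
    merged['C'] = np.where(merged['_merge'] == 'both', 'yes', 'no')
    return merged.drop(columns='_merge')


print(label_popular(recipe, like))
```

A left merge keeps the row order of recipe, so the C column lines up with the original rows.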
Using df['_merge'] == 'both' is faster than np.in1d(df['_merge'], 'both'):
In [460]: %timeit np.where(np.in1d(df['_merge'],'both'), 'yes', 'no')
100 loops, best of 3: 2.22 ms per loop
In [461]: %timeit np.where(df['_merge'] == 'both', 'yes', 'no')
1000 loops, best of 3: 652 µs per loop
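The %timeit numbers above come from IPython. A rough sketch for repeating the comparison with the standard timeit module, assuming a hypothetical larger stand-in for the _merge column (the seven indicator values from the example repeated 10,000 times). It uses np.isin, the recommended replacement for np.in1d (deprecated in NumPy 2.0), and first checks that both expressions build the same mask. Absolute timings will depend on your machine and library versions.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical stand-in for df['_merge']: the example's indicator values, repeated.
merge_col = pd.Series(['left_only', 'both', 'left_only', 'left_only',
                       'left_only', 'both', 'both'] * 10_000)

# Both expressions should produce the same boolean mask.
isin_mask = np.isin(merge_col, ['both'])
eq_mask = (merge_col == 'both').to_numpy()
assert (isin_mask == eq_mask).all()

candidates = [
    ('np.isin', lambda: np.where(np.isin(merge_col, ['both']), 'yes', 'no')),
    ('==', lambda: np.where(merge_col == 'both', 'yes', 'no')),
]
for label, stmt in candidates:
    # Best of 3 repeats, each running the statement 10 times; report per-call time.
    seconds = min(timeit.repeat(stmt, number=10, repeat=3)) / 10
    print(f'{label}: {seconds * 1e3:.3f} ms per loop')
```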
You could add a C column of 'yes' values to like and then merge recipe with like. The rows that match will have yes in the C column; the rows without a match will have NaN. You could then use fillna to replace the NaNs with 'no':
import pandas as pd
recipe = pd.DataFrame({'A': ['chicken','beef','pork','egg', 'chicken', 'egg', 'beef'],
'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']})
like = pd.DataFrame({'A':['beef', 'egg'], 'B':['hot', 'salty']})
like['C'] = 'yes'
result = pd.merge(recipe, like, how='left').fillna('no')
print(result)
yields
         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot   no
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes
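A variant of the same idea, sketched under the assumption that recipe may carry other columns with legitimate NaNs: the join keys are named explicitly, fillna is given a dict so it only fills column C, and like is left unmodified because assign returns a copy. The drop_duplicates call is an addition that keeps repeated pairs in like from duplicating recipe rows.

```python
import pandas as pd

recipe = pd.DataFrame({'A': ['chicken', 'beef', 'pork', 'egg', 'chicken', 'egg', 'beef'],
                       'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']})
like = pd.DataFrame({'A': ['beef', 'egg'], 'B': ['hot', 'salty']})

# assign() returns a new frame, so like itself gains no C column.
flagged = like[['A', 'B']].drop_duplicates().assign(C='yes')

# Join on A and B only, then fill NaN in C alone; other columns of recipe are untouched.
result = recipe.merge(flagged, on=['A', 'B'], how='left').fillna({'C': 'no'})
print(result)
```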
You can use set_value by matching both A and B as follows (DataFrame.set_value was deprecated in pandas 0.21 and removed in 1.0, so this only runs on older versions):
recipe.set_value(recipe[recipe.A.isin(like.A) & recipe.B.isin(like.B)].index,'C','yes')
recipe.fillna('no')
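Which will give you the output below. Because A and B are checked with isin separately, row 3 (egg, hot) is also labelled yes, which differs from the desired result: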
         A      B    C
0  chicken  sweet   no
1     beef    hot  yes
2     pork  salty   no
3      egg    hot  yes
4  chicken  sweet   no
5      egg  salty  yes
6     beef    hot  yes
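A sketch that keeps the in-place labelling style of this answer but tests each (A, B) pair jointly, so egg/hot is not matched just because egg and hot each appear somewhere in like. It swaps set_value for .loc and uses MultiIndex.isin with a list of tuples; this is an editorial alternative, not the answer's original code.

```python
import pandas as pd

recipe = pd.DataFrame({'A': ['chicken', 'beef', 'pork', 'egg', 'chicken', 'egg', 'beef'],
                       'B': ['sweet', 'hot', 'salty', 'hot', 'sweet', 'salty', 'hot']})
like = pd.DataFrame({'A': ['beef', 'egg'], 'B': ['hot', 'salty']})

# (A, B) pairs from like as plain tuples, e.g. ('beef', 'hot').
pairs = list(like[['A', 'B']].itertuples(index=False, name=None))

# Index recipe by (A, B) and test whole pairs for membership; returns a bool array.
mask = recipe.set_index(['A', 'B']).index.isin(pairs)

# Default to 'no', then overwrite the matching rows with .loc.
recipe['C'] = 'no'
recipe.loc[mask, 'C'] = 'yes'
print(recipe)
```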
Note: these timing results do not mean that my answer is better than the other ones, or vice versa.
Using set_value:
%timeit recipe.set_value(recipe[recipe.A.isin(like.A) & recipe.B.isin(like.B)].index,'C','yes'); recipe.fillna('no')
100 loops, best of 3: 2.69 ms per loop
Using merge and creating a new df:
%timeit df = pd.merge(recipe, like, on=['A','B'], indicator=True, how='left'); df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no')
100 loops, best of 3: 8.42 ms per loop
Using only the np.where step (on the df that has already been merged):
%timeit df['C'] = np.where(df['_merge'] == 'both', 'yes', 'no')
1000 loops, best of 3: 187 µs per loop
Again, it really depends on what you're timing. Just be cautious about duplicating your data.