How to extract words from a sentence and check if a word is not there in it
In pandas, how to extract specific words from a sentence in a column
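I have df1 and want to extract the 'flavor' from the sentences in 'desc' to get df2. I have a list of flavors, and that list determines which flavor to pick. How can I get this result in Python?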
df1:
desc                             flavor
Coke 600mL and Chips
Coke Zero 600mL and Chips
390ml Coke + Small Fries
600ml Coke + Regular Fries with
Vanilla Coke 600mL and Chips
Garlic Bread and pepsi 1.25ltr
df2:
desc                             flavor
Coke 600mL and Chips             Coke
Coke Zero 600mL and Chips        Coke Zero
390ml Coke + Small Fries         Coke
600ml Coke + Regular Fries with  Coke
Vanilla Coke 600mL and Chips     Vanilla Coke
Garlic Bread and pepsi 1.25ltr   Pepsi
Flavor list:
Coke
Coke Zero
Vanilla Coke
Pepsi
If you only want to extract one value per row based on the list, use str.extract:
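import re

# Put longer names first so 'Coke Zero' is tried before the bare 'Coke'
L = ['Coke Zero', 'Vanilla Coke', 'Pepsi', 'Coke']
# Wrap each name in word boundaries; re.escape keeps any regex metacharacters literal
pat = '|'.join(r"\b{}\b".format(re.escape(x)) for x in L)
# One capture group gives one result column; re.I makes the match case-insensitive
df['flavor'] = df['desc'].str.extract('(' + pat + ')', expand=False, flags=re.I)
print(df)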
   desc                             flavor
0  Coke 600mL and Chips             Coke
1  Coke Zero 600mL and Chips        Coke Zero
2  390ml Coke + Small Fries         Coke
3  600ml Coke + Regular Fries with  Coke
4  Vanilla Coke 600mL and Chips     Vanilla Coke
5  Garlic Bread and pepsi 1.25ltr   pepsi
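Note that the last row comes back as 'pepsi', exactly as it is written in desc, while the desired df2 shows 'Pepsi'. One possible way to get the list's spelling is to map the lowercased match back to the flavor list. The sketch below is self-contained and rebuilds df1 from the question; the names flavors, ordered and canonical are illustrative, and sorting by length is an assumption used here so that longer names are tried before shorter ones regardless of how the list is ordered.

```python
import re
import pandas as pd

# df1 from the question, rebuilt so the example runs on its own
df = pd.DataFrame({"desc": [
    "Coke 600mL and Chips",
    "Coke Zero 600mL and Chips",
    "390ml Coke + Small Fries",
    "600ml Coke + Regular Fries with",
    "Vanilla Coke 600mL and Chips",
    "Garlic Bread and pepsi 1.25ltr",
]})

flavors = ["Coke", "Coke Zero", "Vanilla Coke", "Pepsi"]

# Longest names first, so "Coke Zero" is tried before the bare "Coke"
ordered = sorted(flavors, key=len, reverse=True)
pat = "|".join(r"\b{}\b".format(re.escape(x)) for x in ordered)

# Case-insensitive extract returns the text as written in desc (e.g. "pepsi")
matched = df["desc"].str.extract("(" + pat + ")", expand=False, flags=re.I)

# Map the lowercased match back to the spelling used in the flavor list
canonical = {f.lower(): f for f in flavors}
df["flavor"] = matched.str.lower().map(canonical)
print(df)
```

With this, the flavor column reads Coke, Coke Zero, Coke, Coke, Vanilla Coke, Pepsi. Rows where no flavor is found stay NaN, which can be tested with df['flavor'].isna().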
If a row may contain multiple flavours, use str.findall to get a list of matches, then str.join:
# findall returns every match in a row as a list; str.join turns that list into one string
df['flavor'] = df['desc'].str.findall(pat, flags=re.I).str.join(' ')
print(df)
   desc                             flavor
0  Coke 600mL and Chips             Coke
1  Coke Zero 600mL and Chips        Coke Zero
2  390ml Coke + Small Fries         Coke
3  600ml Coke + Regular Fries with  Coke
4  Vanilla Coke 600mL and Chips     Vanilla Coke
5  Garlic Bread and pepsi 1.25ltr   pepsi
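The sample data has exactly one flavour per row, so the two approaches give the same output. A small sketch with made-up rows (df_multi and its contents are hypothetical, not from the question) may help show where findall differs: one row has two flavours and one has none. The has_flavor column is an illustrative way to check whether any listed word is present.

```python
import re
import pandas as pd

flavors = ["Coke Zero", "Vanilla Coke", "Pepsi", "Coke"]
pat = "|".join(r"\b{}\b".format(re.escape(x)) for x in flavors)

# Hypothetical rows: two flavours, no flavour, one flavour
df_multi = pd.DataFrame({"desc": [
    "Coke Zero 600mL and Pepsi 1.25ltr",
    "Garlic Bread and Small Fries",
    "Vanilla Coke 600mL and Chips",
]})

# Each row becomes a list of all non-overlapping matches
found = df_multi["desc"].str.findall(pat, flags=re.I)

# Join the lists into one string; an empty list becomes an empty string
df_multi["flavor"] = found.str.join(", ")

# True when at least one flavour from the list appears in the row
df_multi["has_flavor"] = found.str.len() > 0
print(df_multi)
```

Here the first row yields 'Coke Zero, Pepsi', the second an empty string with has_flavor False, and the third 'Vanilla Coke'. Because matches do not overlap, the 'Coke' inside 'Vanilla Coke' is not reported a second time.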