Extract space separated words from a sentence in Python
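I have a list of strings, say x1 = ['esk','wild man','eskimo', 'sta','(+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa']
I need to extract the items of x1 from a few sentences.
My sentence is "eskimo lives as a wild man in wild jungle and he stands as a guard".
From this sentence I need to extract the first word, eskimo, and the fifth and sixth words, wild man, because they appear as separate words exactly as listed in x1. Even though sta occurs inside "stands", it should not be extracted.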
def get_name(input_str):
    prod_name = []
    for row in x1:
        if (row.strip().lower() in input_str.lower().strip()) or (len([x for x in input_str.split() if "\b"+x in row])>0):
            prod_name.append(row)
    return list(set(prod_name))
Calling get_name("eskimo lives as a wild man in wild jungle and he stands as a guard")
returns
[esk, eskimo,wild man,sta]
but the expected result is
[eskimo,wild man]
What do I need to change in the code?
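As a small illustration of why the function above over-matches: `in` between two strings is a substring test, and "\b" written without the r prefix is the backspace character rather than a regex word boundary, so the second condition most likely never fires. The variable names below are illustrative only and are not part of the original code.

```python
sentence = "eskimo lives as a wild man in wild jungle and he stands as a guard"

# `in` on two strings is a substring test, so partial words match.
esk_substring = "esk" in sentence   # True, because of "eskimo"
sta_substring = "sta" in sentence   # True, because of "stands"

# Without the r prefix, "\b" is one character (backspace), not a word boundary.
is_backspace = "\b" == "\x08"       # True
raw_length = len(r"\b")             # 2: a backslash followed by "b"

# Comparing against whole tokens avoids the partial matches.
words = sentence.split()
esk_token = "esk" in words          # False
eskimo_token = "eskimo" in words    # True

print(esk_substring, sta_substring, is_backspace, raw_length, esk_token, eskimo_token)
```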
You can simply use str.split(" ") to get a list of all the words in the sentence, and then do the following:
s = "eskimo lives as a wild man in wild jungle and he stands as a guard"
l = s.split(" ")
x1 = ['esk','wild man','eskimo', 'sta','(+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa']
# Multi-word items become lists of words; single-word items stay strings
new_x1 = [word.split(" ") for word in x1 if " " in word] + [word for word in x1 if " " not in word]
ans = []
for x in new_x1:
    if type(x) == str:
        # Single word: it must be one of the sentence's words
        if x in l:
            ans.append(x)
    else:
        # Multi-word item: rebuild the phrase from its words
        temp = ""
        for i in x:
            temp += i + " "
        temp = temp[:-1]
        # Every word must be in the sentence, and the phrase must occur in it
        if all(sub_x in l for sub_x in x) and temp in s:
            ans.append(temp)
print(ans)
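I have a slightly different approach. First, split the input sentence into words, and split each phrase you want to check into its component words. Then check whether all the words of a phrase are present in the sentence.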
x1 = ['esk','wild man','eskimo', 'sta','(+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa']
input_sentence = "eskimo lives as a wild man in wild jungle and he stands as a guard"
# Remove common punctuation marks (! . ? ,) from the sentence
input_sentence = input_sentence.replace('!', '').replace('.', '').replace('?', '').replace(',', '')
# Split the input sentence into its component words to check individually
input_words = input_sentence.split()
for ele in x1:
    # Split each element in x1 into words
    ele_words = ele.split()
    # Check that every word is one of the input words and that the phrase occurs in the sentence
    if all(word in input_words for word in ele_words) and ele in input_sentence:
        print(ele)
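Checking that every word is present and that the phrase is a substring can still accept a phrase whose last word is only the start of a longer word (for example "wild man" inside "wild mannered" when "man" also appears elsewhere). A minimal sketch of a stricter variant, comparing consecutive tokens instead, might look like this; `find_phrases` is a hypothetical helper name, not something from the answers here.

```python
def find_phrases(phrases, sentence):
    """Return the phrases that occur in sentence as whole, consecutive words."""
    words = sentence.split()
    found = []
    for phrase in phrases:
        parts = phrase.split()
        n = len(parts)
        if n == 0:
            continue
        # Slide a window of n words over the sentence and compare token lists.
        if any(words[i:i + n] == parts for i in range(len(words) - n + 1)):
            found.append(phrase)
    return found


x1 = ['esk', 'wild man', 'eskimo', 'sta']
s = "eskimo lives as a wild man in wild jungle and he stands as a guard"
result = find_phrases(x1, s)
print(result)  # ['wild man', 'eskimo']
```

Results come back in the order of the phrase list, not in the order they appear in the sentence.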
You can use a regular expression:
import re
x1 = ['esk','wild man','eskimo', 'sta']
my_str = "eskimo lives as a wild man in wild jungle and he stands as a guard"
my_list = []
for words in x1:
    if re.search(r'\b' + words + r'\b', my_str):
        my_list.append(words)
print(my_list)
With the updated list, the string (+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa raises an error when used as a regular expression, so you can wrap the search in a try/except block:
for words in x1:
    try:
        if re.search(r'\b' + words + r'\b', my_str):
            my_list.append(words)
    except re.error:
        # Skip items that are not valid regular expressions
        pass
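Note that try/except only skips the problematic item, so it can never be found even when it is present. The sketch below, using a shortened hypothetical item and sentence, suggests that escaping the item with re.escape avoids the compile error, but that \b still cannot match between a space and an opening parenthesis, so word boundaries alone are probably not enough for items that start with punctuation.

```python
import re

item = "(+)-6-x"                 # hypothetical shortened item starting with punctuation
text = "dose of (+)-6-x daily"   # hypothetical sentence containing the item

# Unescaped, "(+)" is an invalid group and the pattern does not compile.
try:
    re.compile(r"\b" + item + r"\b")
    compiled = True
except re.error:
    compiled = False

# Escaped, the pattern compiles, but \b needs a word character on one side,
# and neither the space nor "(" is one, so the search finds nothing.
escaped_hit = re.search(r"\b" + re.escape(item) + r"\b", text)

print(compiled)      # False
print(escaped_hit)   # None
```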
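You can use a regular expression with whitespace boundaries, (?<!\S) on the left and (?!\S) on the right, to avoid partial matches, and join all the items of the x1 list into one alternation, escaping each item with re.escape.
Then use re.findall to get all the matches: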
import re
x1 = ['esk','wild man','eskimo', 'sta','(+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa']
s = "eskimo lives as a wild man in wild jungle and he stands as a guard"
pattern = fr"(?<!\S)(?:{'|'.join(re.escape(x) for x in x1)})(?!\S)"
print(re.findall(pattern, s))
Output
['eskimo', 'wild man']
See the Python demo.
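For completeness, here is one way the whitespace-boundary pattern could be folded back into a get_name function. This is a sketch under assumptions, not the asker's final code: the `names` parameter, the lowercasing, and the longest-first ordering are additions for illustration. Sorting longest first matters only when one item is a whole-word prefix of another, such as "wild" and "wild man".

```python
import re


def get_name(input_str, names):
    """Return the items of names that appear in input_str delimited by whitespace."""
    if not names:
        return []
    # Longest first, so "wild man" is tried before a shorter item such as "wild".
    ordered = sorted((n.strip().lower() for n in names), key=len, reverse=True)
    pattern = r"(?<!\S)(?:" + "|".join(re.escape(n) for n in ordered) + r")(?!\S)"
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(re.findall(pattern, input_str.lower())))


x1 = ['esk', 'wild man', 'eskimo', 'sta',
      '(+)-6-[amina(4-chlora)(1-metha-1h-imidol-5-yl)mhyl]-4-(3-chlora)-1-methyl-2(1h)-quinoa']
s = "eskimo lives as a wild man in wild jungle and he stands as a guard"
result = get_name(s, x1)
print(result)  # ['eskimo', 'wild man']
```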