
Identify strings while removing substrings in python

I have a dictionary of words and their frequencies, as follows.

mydictionary = {'yummy tim tam':3, 'milk':2, 'chocolates':5, 'biscuit pudding':3, 'sugar':2}

I have a string (with punctuation removed), as follows.

recipes_book = "For todays lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"

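From the string above, I need to output only 'biscuit pudding', 'yummy tim tam' and 'milk' by referring to the dictionary. 'sugar' should not be output, because in the string it appears only as part of 'rawsugar'.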

However, the code I am currently using also outputs 'sugar'.

import re

mydictionary = {'yummy tim tam':3, 'milk':2, 'chocolates':5, 'biscuit pudding':3, 'sugar':2}
recipes_book = "For today's lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"
searcher = re.compile(r'{}'.format("|".join(mydictionary.keys())), flags=re.I | re.S)

for match in searcher.findall(recipes_book):
    print(match)

How can I avoid matching substrings like this and consider only whole tokens, such as 'milk'? Please help me.

Use the word boundary '\b'. Put simply:

>>> import re
>>> recipes_book = "For todays lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"
>>> re.findall(r'(?is)(\bchocolates\b|\bbiscuit pudding\b|\bsugar\b|\byummy tim tam\b|\bmilk\b)', recipes_book)
['biscuit pudding', 'yummy tim tam', 'milk']
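
To make the effect of '\b' concrete, here is a minimal sketch comparing the same search with and without word boundaries. The variable names (without_boundary, with_boundary, whole_word) are only illustrative and are not part of the original answer.

```python
import re

recipes_book = "For todays lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"

# Without word boundaries, 'sugar' is found inside 'rawsugar'.
without_boundary = re.findall(r'sugar', recipes_book)

# \b matches only between a word character and a non-word character
# (or the start/end of the string). There is no such position between
# 'w' and 's' in 'rawsugar', so the pattern no longer matches.
with_boundary = re.findall(r'\bsugar\b', recipes_book)

# A standalone token such as 'milk' still matches.
whole_word = re.findall(r'\bmilk\b', recipes_book)

print(without_boundary)  # ['sugar']
print(with_boundary)     # []
print(whole_word)        # ['milk']
```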

You can update your code to use regex word boundaries:

import re

mydictionary = {'yummy tim tam':3, 'milk':2, 'chocolates':5, 'biscuit pudding':3, 'sugar':2}
recipes_book = "For today's lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"
# Wrap every key in \b...\b so that it only matches as a whole word or phrase.
searcher = re.compile("|".join(r'\b{}\b'.format(key) for key in mydictionary), flags=re.I | re.S)

for match in searcher.findall(recipes_book):
    print(match)

Output:

biscuit pudding
yummy tim tam
milk

Another approach is to use re.escape. See the documentation for re.escape for more information.

import re

mydictionary = {'yummy tim tam':3, 'milk':2, 'chocolates':5, 'biscuit pudding':3, 'sugar':2}
recipes_book = "For today's lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"

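val_list = []

for key in mydictionary:
    # Escape the key so any regex metacharacters in it are matched literally,
    # then wrap it in word boundaries to reject matches inside longer words.
    pattern = r'\b' + re.escape(key) + r'\b'
    val_list.extend(re.findall(pattern, recipes_book))

print(val_list)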

Output:

"C:\Program Files (x86)\Python27\python.exe" C:/Users/punddin/PycharmProjects/demo/demo.py
['yummy tim tam', 'biscuit pudding', 'milk']
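
The two answers above can be combined: escape each key with re.escape, then compile one alternation wrapped in a single pair of word boundaries. The following is a sketch only. The helper name build_searcher is hypothetical, and sorting the keys longest first is an added assumption, meant to prefer the longer phrase if one key ever starts with another.

```python
import re

mydictionary = {'yummy tim tam': 3, 'milk': 2, 'chocolates': 5, 'biscuit pudding': 3, 'sugar': 2}
recipes_book = "For today's lesson we will show you how to make biscuit pudding using yummy tim tam milk and rawsugar"


def build_searcher(phrases):
    """Hypothetical helper: compile one whole-word pattern for all phrases."""
    # Longest phrases first (an assumption, see above), each escaped so that
    # regex metacharacters in a key are treated as literal text.
    escaped = [re.escape(p) for p in sorted(phrases, key=len, reverse=True)]
    # The non-capturing group lets one \b pair apply to every alternative.
    return re.compile(r'\b(?:{})\b'.format('|'.join(escaped)), flags=re.I)


searcher = build_searcher(mydictionary)
matches = searcher.findall(recipes_book)
print(matches)  # ['biscuit pudding', 'yummy tim tam', 'milk']
```

Because the string is scanned once from left to right, the matches come back in the order they appear in the text rather than in dictionary order.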

