
Python search exact word from list in string?

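I need to find exact words from a list in a string.

I tried the code below. Here I get an exact match for the single-word entries in the list, but how do I match the two-word entries?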

categories_to_retain = ['SOLID',
 'GEOMETRIC',
 'FLORAL',
 'BOTANICAL',
 'STRIPES',
 'ABSTRACT',
 'ANIMAL',
 'GRAPHIC PRINT',
 'ORIENTAL',
 'DAMASK',
 'TEXT',
 'CHEVRON',
 'PLAID',
 'PAISLEY',
 'SPORTS']

x = " Beautiful Art By  Design Studio graphic print Creates A TEXT Design For This Art Driven Duvet. Printed In Remarkable Detail On A Woven Duvet, This Is An Instant Focal Point Of Any Bedroom. The Fabric Is Woven Of Easy Care Polyester And Backed With A Soft Poly/Cotton Blend Fabric. The Texture In The Fabric Gives Dimension And A Unique Look And Feel To The Duvet."

x = x.upper()

print(x)

#x = "GRAPHIC"
#x = "GRAPHIC PRINTS"


# Keeps only the categories that equal one whitespace-separated token of x
matches = [cat for cat in categories_to_retain if cat in x.split()]

print(matches)

Output:
['TEXT']

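Here you can see that my list contains an entry named 'GRAPHIC PRINT'. I want to find this term in my string.

I also need to find the word even when it appears in plural form or past tense, for example STRIPED, STRIPE, GRAPHIC PRINTS, etc.

Thanks, Niranjan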

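Use a regex with word boundaries to get exact matches. Even if you only had single words, your logic would fail if you are trying to ignore any punctuation: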

import re

patts = re.compile("|".join(r"\b{}\b".format(s) for s in categories_to_retain), re.I)

x = " Beautiful Art By  Design Studio **graphic print** Creates A **TEXT** Design For This Art Driven Duvet. Printed In Remarkable Detail On A Woven Duvet, This Is An Instant Focal Point Of Any Bedroom. The Fabric Is Woven Of Easy Care Polyester And Backed With A Soft Poly/Cotton Blend Fabric. The Texture In The Fabric Gives Dimension And A Unique Look And Feel To The Duvet."

print(patts.findall(x))

Which would give you:

['graphic print', 'TEXT']
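
The question also asks for plural and past-tense forms (STRIPED, STRIPE, GRAPHIC PRINTS). The word-boundary idea can be stretched to cover those with an optional suffix group. This is only a rough sketch under stated assumptions: `build_pattern`, `find_categories`, and the sample data are hypothetical names made up for illustration, and the "drop a trailing S, allow S/ES/D/ED" rule is a crude heuristic rather than real stemming.

```python
import re

# Optional endings tolerated after the stem (assumption: a crude heuristic).
SUFFIXES = r"(?:S|ES|D|ED)?"

def build_pattern(category):
    # Drop one trailing "S" so 'STRIPES' becomes the stem 'STRIPE'.
    stem = category[:-1] if category.endswith("S") else category
    # re.escape protects any regex metacharacters in the category name.
    return re.compile(r"\b" + re.escape(stem) + SUFFIXES + r"\b", re.IGNORECASE)

def find_categories(text, categories):
    # Return the canonical category names that occur in the text.
    return [cat for cat in categories if build_pattern(cat).search(text)]

# Hypothetical sample data, shortened from the question.
sample_categories = ["STRIPES", "GRAPHIC PRINT", "TEXT", "FLORAL"]
sample_text = "A striped duvet with graphic prints and a soft texture."

print(find_categories(sample_text, sample_categories))
# ['STRIPES', 'GRAPHIC PRINT']
```

'TEXT' is not reported here because "texture" continues with word characters that are not one of the allowed suffixes, so the trailing `\b` fails.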

You can use a regex, which also helps avoid matching partial character sequences and reports the exact input words.

import re
matches = []
categories_to_retain = ['SOLID',
     'GEOMETRIC',
     'FLORAL',
     'BOTANICAL',
     'STRIPES',
     'ABSTRACT',
     'ANIMAL',
     'GRAPHIC PRINT',
     'ORIENTAL',
     'DAMASK',
     'TEXT',
     'CHEVRON',
     'PLAID',
     'PAISLEY',
     'SPORTS']

x = " Beautiful Art By  Design Studio **graphic print** Creates A **TEXT** Design For This Art Driven Duvet. Printed In Remarkable Detail On A Woven Duvet, This Is An Instant Focal Point Of Any Bedroom. The Fabric Is Woven Of Easy Care Polyester And Backed With A Soft Poly/Cotton Blend Fabric. The Texture In The Fabric Gives Dimension And A Unique Look And Feel To The Duvet."

x = x.upper()

print(x)

def searchWholeWord(w):
    return re.compile(r'\b({0})\b'.format(w), flags=re.IGNORECASE).search

for cat in categories_to_retain:
    return_value = searchWholeWord(cat)(x)
    if return_value:
        matches.append(cat)

print(matches)

Output:

['GRAPHIC PRINT', 'TEXT']
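
If the list is long, one possible variation is to compile a single alternation once and map each hit back to its list entry. The sketch below assumes the categories are stored in upper case and are separated by single spaces in the text; `compile_categories` and `matched_categories` are hypothetical helper names, not part of the answer above.

```python
import re

def compile_categories(categories):
    # Longest names first, so a multi-word entry is tried before a shorter one.
    ordered = sorted(categories, key=len, reverse=True)
    body = "|".join(re.escape(c) for c in ordered)
    return re.compile(r"\b(?:" + body + r")\b", re.IGNORECASE)

def matched_categories(text, categories):
    pattern = compile_categories(categories)
    # Normalise each hit to upper case so it can be compared with the list entries.
    found = {m.group(0).upper() for m in pattern.finditer(text)}
    # Preserve the original list order in the result.
    return [c for c in categories if c in found]

print(matched_categories("A **graphic print** with TEXT, not texture.",
                         ["SOLID", "GRAPHIC PRINT", "TEXT"]))
# ['GRAPHIC PRINT', 'TEXT']
```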

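Here you are splitting the string with the default split(), which means it splits at every whitespace: x.split() will contain the strings "GRAPHIC" and "PRINT", but not "GRAPHIC PRINT". You probably want to use "if cat in x", which I believe will return what you need in this case.

This should work: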

matches = [cat for cat in categories_to_retain if cat in x]
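
One caveat worth checking before relying on a plain substring test: `in` also matches inside longer words. A small comparison of the three approaches discussed in this thread, using a made-up sample string (the variable names are illustrative only):

```python
import re

categories = ["GRAPHIC PRINT", "TEXT"]
text = "A GRAPHIC PRINT WITH A SOFT TEXTURE"

# Token membership: a two-word category can never equal a single token.
by_split = [c for c in categories if c in text.split()]

# Substring test: finds 'GRAPHIC PRINT', but also 'TEXT' inside 'TEXTURE'.
by_substring = [c for c in categories if c in text]

# Word-boundary regex: whole words or phrases only.
by_boundary = [c for c in categories
               if re.search(r"\b" + re.escape(c) + r"\b", text)]

print(by_split)      # []
print(by_substring)  # ['GRAPHIC PRINT', 'TEXT']
print(by_boundary)   # ['GRAPHIC PRINT']
```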
