Clearest, most Pythonic, reliable, and fastest way to check if a string contains words from a list of lists

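I'm looking for the clearest, most Pythonic, and fastest way to check whether a string contains words from a list of lists.

Here is what I have come up with so far: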

introStrings = ['introduction:' , 'case:' , 'introduction' , 'case' ]
backgroundStrins = ['literature:' , 'background:',  'Related:' , 'literature' , 'background',  'related' ]
methodStrings = [ 'methods:' , 'method:', 'techniques:', 'methodology:' , 'methods' , 'method', 'techniques', 'methodology' ]
resultStrings = [ 'results:', 'result:', 'experimental:', 'experiments:', 'experiment:', 'results', 'result', 'experimental', 'experiments', 'experiment']
discussioStrings = [ 'discussion:' , 'Limitations:'  , 'discussion' , 'limitations']
conclusionStrings = ['conclusion:' , 'conclusions:', 'concluding:' , 'conclusion' , 'conclusions', 'concluding' ]

allStrings = [ introStrings, backgroundStrins, methodStrings, resultStrings, discussioStrings, conclusionStrings ]

testtt = 'this may thod be in techniques ever material and methods'

for item in allStrings:
    for word in testtt.split():
        if word in item:
            print('yes')
            break

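This code works fine for finding all the combinations, but it is a nested for loop and not very clear at first glance.

I'd like to know whether there is a better way.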

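Using any() with a chained (nested) generator expression would be more Pythonic:

print(any(word in sublist for word in testtt.split() for sublist in allStrings))

However, this only returns True/False. It does not tell you which word was found in which sublist. You can use this list comprehension to print the specific matches:

print([(word, sublist) for word in testtt.split() for sublist in allStrings if word in sublist])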
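If you also want to know which category each word belongs to, one alternative (not part of the answer above) is to build a reverse index once, mapping each keyword to a label, so every word needs a single dictionary lookup instead of a scan over all sublists. A minimal sketch, assuming you are free to name the categories; the labels `"intro"`, `"method"`, `"conclusion"` and the helper `find_matches` are hypothetical, and the keyword lists are shortened from the question:

```python
# Hypothetical category labels; keyword lists abbreviated from the question.
categories = {
    "intro": ["introduction:", "case:", "introduction", "case"],
    "method": ["methods:", "method:", "techniques:", "methodology:",
               "methods", "method", "techniques", "methodology"],
    "conclusion": ["conclusion:", "conclusions:", "concluding:",
                   "conclusion", "conclusions", "concluding"],
}

# Build the reverse index once: keyword -> category label.
# If a keyword appeared in two lists, the later list would win.
keyword_to_category = {
    keyword: label
    for label, keywords in categories.items()
    for keyword in keywords
}


def find_matches(text):
    """Return (word, category) pairs for every word of text that is a keyword."""
    return [
        (word, keyword_to_category[word])
        for word in text.split()
        if word in keyword_to_category
    ]


testtt = "this may thod be in techniques ever material and methods"
print(find_matches(testtt))
# → [('techniques', 'method'), ('methods', 'method')]
```

The dictionary costs a little memory up front, but the per-word lookup no longer depends on how many sublists you have.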

Your original code is also a bit wasteful: it recomputes testtt.split() once for every sublist in allStrings.
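A minimal sketch of hoisting the split out of the loop, assuming you only need the set of matching words per sublist (turning the words into a set drops their order and any duplicates). It uses set intersection instead of the inner loop; the lists are shortened from the question for brevity:

```python
introStrings = ['introduction:', 'case:', 'introduction', 'case']
methodStrings = ['methods:', 'method:', 'techniques:', 'methodology:',
                 'methods', 'method', 'techniques', 'methodology']
allStrings = [introStrings, methodStrings]

testtt = 'this may thod be in techniques ever material and methods'

# Split the text once, outside the loop, and turn it into a set.
words = set(testtt.split())

for index, sublist in enumerate(allStrings):
    # Set intersection yields every word of the text that is in this sublist.
    hits = words & set(sublist)
    if hits:
        print(index, sorted(hits))
# prints: 1 ['methods', 'techniques']
```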

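"I'm looking for the clearest, most Pythonic, and fastest way to check whether a string contains words from a list"

First, I would flatten the lists: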

all_strings = [*intro, *back, *methods, ...] # You get the idea

(Alternatively, use a nested list comprehension:)

all_strings = [word for sublist in [intro, back, ...] for word in sublist] # if you're into that

Next, split the string:

string_words = a_string.split()

Finally, just look up the words:

found = [w for w in string_words if w in all_strings]

That's quite Pythonic; I'm not so sure about speed or reliability.

The best I could come up with uses chain and any:
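On reliability: the exact-match lookups above miss words whose case or punctuation differs from the lists (for example "Background:" versus "background:", or "Discussion." at the end of a sentence), and the question's lists already mix cases ('Related:', 'Limitations:'). One possible approach, which is an assumption and not part of the answer, is to normalize both the keywords and the text with `str.lower()` and `str.strip(string.punctuation)`, storing the keywords in a set for fast membership tests. The helpers `normalize` and `found_words` are hypothetical names:

```python
import string

allStrings = [
    ['introduction:', 'case:', 'introduction', 'case'],
    ['literature:', 'background:', 'Related:', 'literature', 'background', 'related'],
    ['discussion:', 'Limitations:', 'discussion', 'limitations'],
]


def normalize(token):
    """Lower-case a token and strip surrounding punctuation such as ':' or '.'."""
    return token.lower().strip(string.punctuation)


# Normalizing collapses 'Related:' and 'related' (etc.) into a single set entry.
all_keywords = {normalize(w) for sublist in allStrings for w in sublist}


def found_words(text):
    """Return the words of text, in order, whose normalized form is a keyword."""
    return [w for w in text.split() if normalize(w) in all_keywords]


print(found_words("Background: see the Discussion."))
# → ['Background:', 'Discussion.']
```

Whether this is desirable depends on your data: it would also match a stray "case" in ordinary prose, which the original exact matching already does.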

resultStrings = [
    "results:",
    "result:",
    "experimental:",
    "experiments:",
    "experiment:",
    "results",
    "result",
    "experimental",
    "experiments",
    "experiment",
]
conclusionStrings = [
    "conclusion:",
    "conclusions:",
    "concluding:",
    "conclusion",
    "conclusions",
    "concluding",
]

allStrings = [resultStrings, conclusionStrings]
testtt = "this may thod be in techniques ever material and methods"

from itertools import chain

# Flatten the sublists into one set for fast membership tests
string_set = set(chain(*allStrings))
print(any(i in string_set for i in testtt.split()))

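Although the set takes some extra memory, it makes the lookups more efficient (average constant time per word instead of scanning a list). Thanks to Peter Wood.

Using itertools: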
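Once the keywords are in a set, a variant of the same check (not from the answer above) is `set.isdisjoint()`, which accepts any iterable; in CPython it returns as soon as it finds a shared element. The sketch below also uses `chain.from_iterable(allStrings)`, which is equivalent to `chain(*allStrings)` without unpacking the outer list. The helper name `contains_keyword` is hypothetical:

```python
from itertools import chain

resultStrings = ['results:', 'result:', 'experimental:', 'experiments:', 'experiment:',
                 'results', 'result', 'experimental', 'experiments', 'experiment']
conclusionStrings = ['conclusion:', 'conclusions:', 'concluding:',
                     'conclusion', 'conclusions', 'concluding']
allStrings = [resultStrings, conclusionStrings]

# Flatten once into a set of keywords.
string_set = set(chain.from_iterable(allStrings))


def contains_keyword(text):
    # True if at least one word of text is also in string_set.
    return not string_set.isdisjoint(text.split())


print(contains_keyword("this may thod be in techniques ever material and methods"))  # False
print(contains_keyword("see the results below"))  # True
```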

import itertools

# Flatten all sublists into one set, then print every word of the text found in it
merged = set(itertools.chain.from_iterable(allStrings))
for x in testtt.split():
    if x in merged:
        print(x)
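If you only need the first matching word rather than all of them, the built-in `next()` with a default can pull it from a generator expression and stops scanning at the first hit. A small sketch under that assumption, with the keyword lists shortened from the question:

```python
from itertools import chain

allStrings = [
    ['methods:', 'method:', 'techniques:', 'methodology:',
     'methods', 'method', 'techniques', 'methodology'],
    ['conclusion:', 'conclusions:', 'concluding:', 'conclusion', 'conclusions', 'concluding'],
]
testtt = 'this may thod be in techniques ever material and methods'

merged = set(chain.from_iterable(allStrings))

# next() returns the first matching word, or None if no word matches.
first_match = next((x for x in testtt.split() if x in merged), None)
print(first_match)  # techniques
```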


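Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you reproduce this page, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM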