Prefix matching in Python
I have a string like:
" This is such an nice artwork"
and I have a tag_list:
["art","paint"]
Basically, I want to write a function that accepts this string and the tag list as inputs and returns the word "artwork", because "artwork" starts with "art", which is in the tag list.
How do I do this most efficiently? I mean efficient in terms of speed.
def prefix_match(string, taglist):
    # do something here
    return word_in_string
Try the following:
def prefix_match(sentence, taglist):
    taglist = tuple(taglist)
    for word in sentence.split():
        if word.startswith(taglist):
            return word
This works because str.startswith() can accept a tuple of prefixes as an argument. The function returns the first word that starts with any of the tags, or None if no word matches.
Note that I renamed string to sentence so there isn't any ambiguity with the string module.
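As a quick illustration of the tuple form of str.startswith() that the function above relies on, here is a minimal sketch. The sample words come from the question; the variable names are only illustrative.

```python
prefixes = ("art", "paint")  # must be a tuple (or a single str), not a list

# True if the word starts with any one of the prefixes
print("artwork".startswith(prefixes))   # True
print("painting".startswith(prefixes))  # True
print("nice".startswith(prefixes))      # False

# Passing a list instead of a tuple is rejected by str.startswith()
try:
    "artwork".startswith(["art", "paint"])
except TypeError:
    list_rejected = True
else:
    list_rejected = False
print(list_rejected)  # True
```

This is why the answer converts taglist with tuple() before the loop.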
Try this:
def prefix_match(s, taglist):
    words = s.split()
    return [w for t in taglist for w in words if w.startswith(t)]

s = "This is such an nice artwork"
taglist = ["art", "paint"]
prefix_match(s, taglist)
The above will return a list with all the words in the string that start with a prefix from the list of tags.
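Because the comprehension above loops over the tags first, the results come out grouped by tag rather than in sentence order, and a word that matches several tags is listed once per tag. If that matters, one possible variant is sketched below. It borrows the tuple form of str.startswith(), and the name prefix_match_all is hypothetical.

```python
def prefix_match_all(s, taglist):
    # One pass over the words; each word is tested against all prefixes at once
    prefixes = tuple(taglist)
    return [w for w in s.split() if w.startswith(prefixes)]

s = "This is such an nice artwork"
taglist = ["art", "paint"]
print(prefix_match_all(s, taglist))  # ['artwork']
```

Each word appears at most once per occurrence in the sentence, in the order it occurs.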
Here is a possible solution. I am using regex, because I can get rid of punctuation symbols easily this way. Also, I am using collections.Counter; this might add efficiency if your string has a lot of repeated words, since each distinct word is checked only once.
from collections import Counter
import re

tag_list = ["art", "paint"]
s = "This is such an nice artwork, very nice artwork. This is the best painting I've ever seen"

# Extract the words (dropping punctuation) and count each distinct word
words = re.findall(r'(\w+)', s)
dicto = Counter(words)

def found(s, tag):
    return s.startswith(tag)

words_found = []
for tag in tag_list:
    # items() works on Python 2 and 3; iteritems() exists only on Python 2
    for k, v in dicto.items():
        if found(k, tag):
            words_found.append((k, v))
The last part can be done with a list comprehension:
words_found = [(k, v) for tag in tag_list for k, v in dicto.items() if found(k, tag)]
Result:
>>> words_found
[('artwork', 2), ('painting', 1)]
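For reuse, the steps above can be wrapped in a single function. This is only a sketch: the name count_prefix_matches is hypothetical, and it uses the tuple form of str.startswith() in place of the nested loop over tags.

```python
import re
from collections import Counter

def count_prefix_matches(s, tag_list):
    prefixes = tuple(tag_list)
    # \w+ drops punctuation; note that it also splits "I've" into "I" and "ve"
    counts = Counter(re.findall(r"\w+", s))
    # Each distinct word is tested once, however often it occurs in s
    return [(word, n) for word, n in counts.items() if word.startswith(prefixes)]

s = "This is such an nice artwork, very nice artwork. This is the best painting I've ever seen"
matches = count_prefix_matches(s, ["art", "paint"])
print(matches)
```

The function returns (word, count) pairs, like words_found above.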