Search for any word or combination of words from one string in a list (python)

I have a string (for example: "alpha beta charlie, delta&epsilon foxtrot") and a list (for example ["zero","omega virginia","apple beta charlie"]). Is there a convenient way to iterate through every word and combination of words in the string in order to search for it in the list?

Purpose

You say combinations, but combinations are semantically unordered. What you mean is that you intend to find the intersection of a target list with all ordered permutations of the words, joined by spaces.
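As a quick aside, the difference between the two terms can be seen with a small sketch. The three-word sample list here is hypothetical and is used only to keep the output short:

```python
import itertools

sample = ['alpha', 'beta', 'charlie']  # hypothetical three-word sample

# Combinations ignore order: each pair of words appears once.
pairs_unordered = list(itertools.combinations(sample, 2))

# Permutations are ordered: each pair appears in both orders.
pairs_ordered = list(itertools.permutations(sample, 2))

print(len(pairs_unordered))  # 3
print(len(pairs_ordered))    # 6
```

Because 'beta alpha' and 'alpha beta' are different strings, the search needs the ordered variant.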

To begin with, we need to import the libraries we intend to use.

import re
import itertools

Splitting the string

Don't split on characters: you're doing a semantic search for words, exclusive of stray punctuation. Regular expressions, provided by the re module, are well suited to this. In a raw Python string, r'', we use the regular expression for the edge of a word, \b, around one or more (+) alphanumeric characters (including _), \w.

re.findall returns a list of every match.

re_pattern = r'\b\w+\b'
silly_string = 'alpha beta charlie, delta&epsilon foxtrot'
words = re.findall(re_pattern, silly_string)

Here, words is our wordlist:

>>> print(words)
['alpha', 'beta', 'charlie', 'delta', 'epsilon', 'foxtrot']
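To see what the pattern treats as a word, here is a minimal sketch on a hypothetical input (not from the question) that contains an apostrophe, an underscore, and digits:

```python
import re

re_pattern = r'\b\w+\b'

# Hypothetical input chosen to show the edge cases of \w.
sample = "don't stop_me 42"
tokens = re.findall(re_pattern, sample)

print(tokens)  # ['don', 't', 'stop_me', '42']
```

The apostrophe splits a word in two, while underscores and digits count as word characters. If that is not what you want for your data, the pattern would need adjusting.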

Creating the Permutations

Continuing, we prefer to manipulate our data with generators, so that we avoid materializing data before we need it and holding large datasets in memory. The itertools library has functions that neatly suit our needs for producing all permutations of the above words and chaining them into a single iterable:

_gen = (itertools.permutations(words, i + 1) for i in range(len(words)))
all_permutations_gen = itertools.chain.from_iterable(_gen)

Listing all_permutations_gen with list(all_permutations_gen) would give us:

[('alpha',), ('beta',), ('charlie',), ('delta',), ('epsilon',), ('foxtrot',), ('alpha', 'beta'), ('alpha', 'charlie'), ('alpha', 'delta'), ('alpha', 'epsilon'), ('alpha', 'foxtrot'), ('beta', 'alpha'), ('beta', 'charlie'), ('beta', 'delta'), ('beta', 'epsilon'), ('beta', 'foxtrot'), ('charlie', 'alpha'), ('charlie', 'beta'), ('charlie', 'delta'), ('charlie', 'epsilon'), ('charlie', 'foxtrot'), ('delta', 'alpha'), ('delta', 'beta'), ('delta', 'charlie'), ('delta', 'epsilon'), ('delta', 'foxtrot'), ('epsilon', 'alpha'), ('epsilon', 'beta'), ('epsilon', 'charlie'), ('epsilon', 'delta'), ('epsilon', 'foxtrot'), ('foxtrot', 'alpha'), ('foxtrot', 'beta'), ('foxtrot', 'charlie'), ('foxtrot', 'delta'), ('foxtrot', 'epsilon'), ('alpha', 'beta', 'charlie'), ('alpha', 'beta', 'delta'), ...
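One way to peek at the first few items without building the whole list is itertools.islice. This is only a sketch: it builds a separate generator under the hypothetical name peek_gen, because whatever islice consumes is gone from the generator it reads from.

```python
import itertools

words = ['alpha', 'beta', 'charlie', 'delta', 'epsilon', 'foxtrot']

# A separate generator (hypothetical name) so the peek does not disturb
# the one used for the actual search.
peek_gen = itertools.chain.from_iterable(
    itertools.permutations(words, i + 1) for i in range(len(words))
)

# Take only the first 8 tuples; nothing beyond them is computed.
first_eight = list(itertools.islice(peek_gen, 8))
print(first_eight[-2:])  # [('alpha', 'beta'), ('alpha', 'charlie')]

# The generator resumes where islice stopped, so those 8 items are consumed.
following = next(peek_gen)
print(following)  # ('alpha', 'delta')
```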

If we joined each permutation with spaces and materialized the result in a list instead of a set, printing the first 20 items would show us:

>>> print(all_permutations[:20])  # this only works if you build a list instead of a set
['alpha', 'beta', 'charlie', 'delta', 'epsilon', 'foxtrot', 'alpha beta', 'alpha charlie', 'alpha delta', 'alpha epsilon', 'alpha foxtrot', 'beta alpha', 'beta charlie', 'beta delta', 'beta epsilon', 'beta foxtrot', 'charlie alpha', 'charlie beta', 'charlie delta', 'charlie epsilon']

But that would exhaust the generator before we're ready. So instead, we now build the set of all permutations of those words, each joined by spaces:

all_permutations = set(' '.join(i) for i in all_permutations_gen)
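It is worth knowing how quickly this set grows. The sketch below counts the ordered selections of 1 to n words out of n distinct words. The helper name count_joined_permutations is hypothetical, and math.perm assumes Python 3.8 or later:

```python
import math

def count_joined_permutations(n):
    """Count ordered selections of 1..n items out of n distinct words."""
    return sum(math.perm(n, k) for k in range(1, n + 1))

six_words = count_joined_permutations(6)
ten_words = count_joined_permutations(10)

print(six_words)  # 1956
print(ten_words)  # 9864100
```

For the six-word example the set is small, but at ten words it already holds close to ten million strings, so this approach is only practical for short inputs.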

Checking for Membership of any Permutations in Target List

With this set, we can now search for an intersection with the target list:

>>> target_list = ["zero","omega virginia","apple beta charlie"]
>>> all_permutations.intersection(target_list)
set()

In this case, for the examples given, we get the empty set. But if the target list contains a string that is in our set of permutations:

>>> target_list_2 = ["apple beta charlie", "foxtrot alpha beta charlie"]
>>> all_permutations.intersection(target_list_2)
{'foxtrot alpha beta charlie'}
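For longer strings, generating every permutation may be too expensive. A different technique, not the one used above, is to test each target directly: a target matches if splitting it on single spaces yields only words from the source string, each used no more often than it occurs there. The function name find_targets is hypothetical, and this is a sketch under those assumptions rather than a drop-in replacement:

```python
import re
from collections import Counter

def find_targets(source, targets):
    """Return targets that equal some space-joined permutation of source words."""
    available = Counter(re.findall(r'\b\w+\b', source))
    found = set()
    for target in targets:
        needed = Counter(target.split(' '))
        # Counter subtraction keeps only positive counts, so an empty
        # result means every token is available often enough.
        if not (needed - available):
            found.add(target)
    return found

silly_string = 'alpha beta charlie, delta&epsilon foxtrot'
first = find_targets(silly_string, ["zero", "omega virginia", "apple beta charlie"])
second = find_targets(silly_string, ["apple beta charlie", "foxtrot alpha beta charlie"])

print(first)   # set()
print(second)  # {'foxtrot alpha beta charlie'}
```

The cost here depends on the size of the target list rather than on the factorial growth of the permutations.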

