I have two lists, query and line. My code checks whether a query such as:
["president" ,"publicly"]
is contained in a line (order matters) such as:
["president" ,"publicly", "told"]
And this is the code I'm currently using:
if ' '.join(query) in ' '.join(line):
The problem is that I want to match whole words only, so the query below should not pass the condition:
["president" ,"pub"]
How can I do that?
You could use a regex with \b word boundaries:
import re
# Escape each word, join the words with single spaces, and anchor both ends on word boundaries.
the_regex = re.compile(r'\b' + ' '.join(map(re.escape, ['president', 'pub'])) + r'\b')
if the_regex.search(' '.join(line)):
    print('matching')
else:
    print('not matching')
As an alternative you can write a function to check if a given list is a sublist of the line. Something like:
def find_sublist(sub, lst):
    """Return the index where sub starts in lst, or -1 if it does not occur."""
    if not sub:
        return 0
    cur_index = 0
    while cur_index < len(lst):
        try:
            # Jump to the next occurrence of the first query word.
            cur_index = lst.index(sub[0], cur_index)
        except ValueError:
            return -1
        if lst[cur_index:cur_index + len(sub)] == sub:
            return cur_index
        cur_index += 1
    return -1
Which you can use as:
if find_sublist(query, line) >= 0:
    print('matching')
else:
    print('not matching')
Just use the "in" operator:
mylist = ['foo', 'bar', 'baz']
'foo' in mylist  # -> returns True
'bar' in mylist  # -> returns True
'fo' in mylist   # -> returns False
'ba' in mylist   # -> returns False
Here is one way:
re.search(r'\b' + re.escape(' '.join(query)) + r'\b', ' '.join(line)) is not None
Just for fun you can also do:
a = ["president" ,"publicly", "told"]
b = ["president" ,"publicly"]
c = ["president" ,"pub"]
d = ["publicly", "president"]
e = ["publicly", "told"]
# zip stops at the shorter list, so each check compares the query with the start of the line
not [l for l, n in zip(a, b) if l != n]  ## True
not [l for l, n in zip(a, c) if l != n]  ## False
not [l for l, n in zip(a, d) if l != n]  ## False
## to support query in the middle of the line:
try:
    query_list = a[a.index(e[0]):]
    not [l for l, n in zip(query_list, e) if l != n]  ## True
except ValueError:
    pass
You can use the issubset method to achieve this. Simply do:
a = ["president" ,"publicly"]
b = ["president" ,"publicly", "told"]
if set(a).issubset(b):
    pass  # bla bla
The condition is True when every item of a also appears in b.
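One caveat is worth illustrating: a set has no order, so issubset cannot tell whether the words are adjacent or in the right sequence. The sketch below is only an illustration; the names reversed_query and scattered_query are made up for it.

```python
line = ["president", "publicly", "told"]
query = ["president", "publicly"]
reversed_query = ["publicly", "president"]  # hypothetical: same words, wrong order
scattered_query = ["president", "told"]     # hypothetical: words not adjacent in line

in_order = set(query).issubset(line)               # True
wrong_order = set(reversed_query).issubset(line)   # True, although the order differs
not_adjacent = set(scattered_query).issubset(line) # True, although a word sits between them
partial_word = set(["president", "pub"]).issubset(line)  # False, "pub" is not a whole word
```

So this approach handles the whole-word requirement, but not the "order matters" requirement from the question.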
You can use the built-in all function:
if all(word in b for word in a):
    """ all words in list"""
Note that this may not be efficient at run time for long lists. It is better to use a set instead of a list for b (the list of words to search in).
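The set suggestion could look like the sketch below. It assumes b is the list being searched, and the name b_words is introduced here purely for illustration. Like issubset, it checks membership only, not word order.

```python
a = ["president", "publicly"]
b = ["president", "publicly", "told"]

# Build the set once; each membership test is then a hash lookup
# (constant time on average) instead of a linear scan of the list.
b_words = set(b)

all_present = all(word in b_words for word in a)                 # True
partial = all(word in b_words for word in ["president", "pub"])  # False
```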
Here is a non-regex way of doing it. I'm sure a regex would be much faster than this:
>>> query = ['president', 'publicly']
>>> line = ['president', 'publicly', 'told']
>>> any(query == line[i:i+len(query)] for i in range(len(line) - len(query) + 1))
True
>>> query = ["president" ,"pub"]
>>> any(query == line[i:i+len(query)] for i in range(len(line) - len(query) + 1))
False
Explicit is better than implicit. And as ordering matters, I would write it down like this:
query = ['president','publicly']
query_false = ['president','pub']
line = ['president','publicly','told']
query_len = len(query)
blocks = [line[i:i+query_len] for i in range(len(line)-query_len+1)]
blocks holds all relevant combinations to check for:
[['president', 'publicly'], ['publicly', 'told']]
Now you can simply check if your query is in that list:
print(query in blocks)  # -> True
print(query_false in blocks)  # -> False
The code works the way you would probably explain the straightforward solution in words, which is usually a good sign to me. If you have long lines and performance becomes a problem, you can replace the generated list with a generator.
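The generator variant mentioned above might look like the following sketch. The function names iter_blocks and contains_block are made up for illustration and are not part of the original answer.

```python
def iter_blocks(line, size):
    """Yield each run of `size` consecutive words from `line`."""
    for i in range(len(line) - size + 1):
        yield line[i:i + size]


def contains_block(query, line):
    """Return True if `query` appears in `line` as consecutive whole words."""
    # any() stops at the first matching block, so no list of blocks is built.
    return any(block == query for block in iter_blocks(line, len(query)))


line = ['president', 'publicly', 'told']
print(contains_block(['president', 'publicly'], line))  # True
print(contains_block(['publicly', 'told'], line))       # True
print(contains_block(['president', 'pub'], line))       # False
```

Because any() short-circuits, a match near the start of a long line is found without slicing the rest of it.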