Check if string contains list item

I have the following script to check if a string contains a list item:

word = ['one',
        'two',
        'three']
string = 'my favorite number is two'
if any(word_item in string.split() for word_item in word):
    print 'string contains a word from the word list: %s' % (word_item)

This works, but I'm trying to print the list item(s) that the string contains. What am I doing wrong?

The problem is that you're using an if statement instead of a for statement, so your print runs at most once (when at least one word matches). By that point any has already finished, and it only returns True or False; the loop variable word_item of its generator expression is not available outside it.

This is the easiest way to do what you want:

words = ['one',
         'two',
         'three']
string = 'my favorite number is two'
for word in words:
    if word in string.split():
        print('string contains a word from the word list: %s' % (word))

If you want this to be functional for some reason, you could do it like this:

for word in filter(string.split().__contains__, words):
    print('string contains a word from the word list: %s' % (word))

Someone is bound to post a performance-related answer even though this question has nothing to do with performance, so: it would be more efficient to split the string once, and, depending on how many words you want to check, converting the result to a set might also be useful.
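A minimal sketch of that optimisation, assuming the same words and string as above. The helper name find_words is hypothetical and not part of the original answer. The string is split once, and the tokens are stored in a set so that each membership test is a hash lookup rather than a scan of a list.

```python
def find_words(words, string):
    """Return the entries of `words` that occur as whole tokens in `string`."""
    tokens = set(string.split())  # split once; average O(1) lookups afterwards
    # Iterate over `words` (not the set) so the result keeps the list's order.
    return [word for word in words if word in tokens]


words = ['one', 'two', 'three']
string = 'my favorite number is two'

for word in find_words(words, string):
    print('string contains a word from the word list: %s' % word)
```

For a three-word list and a short string the difference is negligible; it only starts to matter when either side grows large.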


Regarding your question in the comments, if you want multi-word "words", there are two easy options: adding whitespace and then searching for the words in the full string, or regular expressions with word boundaries.

The simplest way is to add a space character before and after the text being searched, and then search for ' ' + word + ' ':

phrases = ['one',
           'two',
           'two words']
text = "this has two words in it"

for phrase in phrases:
    # pad the text too, so a phrase at its very start or end still matches
    if " %s " % phrase in " %s " % text:
        print("text '%s' contains phrase '%s'" % (text, phrase))

For regular expressions, just use the \b word boundary:

import re

for phrase in phrases:
    if re.search(r"\b%s\b" % re.escape(phrase), text):
        print("text '%s' contains phrase '%s'" % (text, phrase))

Which one is "nicer" is hard to say, but the regular expression is probably significantly less efficient (if that matters to you).
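Efficiency aside, the two options do not always agree. The sketch below wraps each approach in a small function (contains_padded and contains_regex are hypothetical names chosen for illustration, and the sample text is made up) and runs both on a text where a phrase is followed by punctuation.

```python
import re


def contains_padded(phrase, text):
    """Whitespace approach: pad both sides so matches at the ends also count."""
    return " %s " % phrase in " %s " % text


def contains_regex(phrase, text):
    """Regex approach: a word boundary matches at any word/non-word transition."""
    return re.search(r"\b%s\b" % re.escape(phrase), text) is not None


text = "two words, then one"
for phrase in ['one', 'two words', 'word']:
    print(phrase, contains_padded(phrase, text), contains_regex(phrase, text))
```

Here 'one' is found by both, and 'word' by neither. 'two words' is found only by the regular expression, because the comma after it is not whitespace but does count as a word boundary.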


And if you don't care about word boundaries, you can just do:

phrases = ['one',
           'two',
           'two words']
text = "the word 'tone' will be matched, but so will 'two words'"

for phrase in phrases:
    if phrase in text:
        print("text '%s' contains phrase '%s'" % (text, phrase))

set(word).intersection(string.split())

If you have a multi-word entry like 'ninety five', you could split it and check that every one of its words is in a set of the words in the string:

words = ['one',
         'two',
         'three',
         'fifty ninety']
string = set('my favorite number is two fifty five'.split())

for word in words:
    spl = word.split()
    if len(spl) > 1:
        if all(string.intersection([w]) for w in spl):
            print(word)
    elif string.intersection([word]):
        print(word)

This only checks that each word of the entry occurs somewhere in the string, regardless of order or adjacency, so you need to decide whether that is workable for you. Using intersection for single words will work well. Make sure you wrap the word in a list or a tuple, or "foo" will become {"f", "o"}.
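The wrapping pitfall can be seen in isolation. This is a small illustrative sketch with made-up values: a bare string is itself an iterable of characters, so both set() and intersection() take it apart.

```python
tokens = set('my favorite number is two'.split())

# Passing a bare string iterates over its characters.
print(set('foo') == {'f', 'o'})      # True
# Wrapping the word in a list keeps it as a single element.
print(set(['foo']) == {'foo'})       # True

# intersection() accepts any iterable, so the same pitfall applies to it.
print(tokens.intersection('two'))    # set(): 't', 'w' and 'o' are not tokens
print(tokens.intersection(['two']))  # {'two'}
```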

You can also use set.issuperset instead of all:

for word in words:
    spl = word.split()
    if len(spl) > 1:
        if string.issuperset(spl):
            print(word)
    elif string.intersection([word]):
        print(word)
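Since a one-word entry splits into a one-element list, issuperset arguably covers both branches on its own. A sketch under that assumption: the helper entry_matches and the extra entry 'five fifty' are additions for illustration, the latter included to show that word order is ignored.

```python
def entry_matches(entry, tokens):
    """True if every word of `entry` is present in the set `tokens`."""
    return tokens.issuperset(entry.split())


words = ['one', 'two', 'three', 'fifty ninety', 'five fifty']
tokens = set('my favorite number is two fifty five'.split())

matches = [word for word in words if entry_matches(word, tokens)]
print(matches)  # ['two', 'five fifty']
```

Note that an empty entry would match everything, because an empty list is a subset of any set.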

You can use set intersection:

word = ['one', 'two', 'three']
string = 'my favorite number is two'
co_occurring_words = set(word) & set(string.split())
for word_item in co_occurring_words:
    print('string contains a word from the word list: %s' % (word_item))
