
Python regexes: return a list of words containing a given substring

What would a regex-based function f look like that, given an input text and a string, returns all the words in the text that contain that string? For example:

f("This is just a simple text to test some basic things", "si")

would return:

["simple", "basic"]

(because these are the only two words that contain the substring "si")

How can I do that?

For something like this I wouldn't use a regex; plain string operations are enough:

def f(string, match):
    string_list = string.split()
    match_list = []
    for word in string_list:
        if match in word:
            match_list.append(word)
    return match_list

print(f("This is just a simple text to test some basic things", "si"))
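The same filtering can also be written as a list comprehension. This is a minimal sketch rather than the answerer's code; it uses str.split() with no argument, which (unlike split(' ')) splits on any run of whitespace:

```python
def f(text, substring):
    """Return the whitespace-separated words of text that contain substring."""
    # str.split() with no argument splits on runs of whitespace and drops empty strings
    return [word for word in text.split() if substring in word]


print(f("This is just a simple text to test some basic things", "si"))
# ['simple', 'basic']
```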

I'm not convinced there isn't a better way to do this than my approach, but something like this:

import re

def f(s, pat):
    pat = r'(\w*%s\w*)' % re.escape(pat)   # escape so characters like "." in pat match literally
    return re.findall(pat, s)


print(f("This is just a simple text to test some basic things", "si"))

It works:

['simple', 'basic']
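The line flagged above deserves the unease: if pat is interpolated verbatim, regex metacharacters in the search string (such as ".") are interpreted rather than matched literally. The sketch below, with hypothetical helper names find_unescaped and find_escaped and a made-up sample string, compares raw interpolation with a pattern built through re.escape:

```python
import re


def find_unescaped(s, pat):
    # pat is pasted into the regex as-is, so "." means "any character"
    return re.findall(r'\w*%s\w*' % pat, s)


def find_escaped(s, pat):
    # re.escape makes every character of pat match literally
    return re.findall(r'\w*%s\w*' % re.escape(pat), s)


text = "a.b and cd"
print(find_unescaped(text, "."))  # ['a.b', ' and', ' cd']
print(find_escaped(text, "."))    # ['a.b']
```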

Here is my attempt at a solution. I split the input string on " " and then search each individual word for the pattern. If a match is found, the word is added to the result list.

import re

def f(s, pat):
    matches = []
    # Escape pat so it is matched literally, and compile it once outside the loop
    regex = re.compile(re.escape(pat))

    for word in s.split(' '):
        if regex.search(word):
            matches.append(word)
    return matches

print(f("This is just a simple text to test some basic things", "si"))
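One behavioural difference between splitting and a \w-based regex is punctuation: split(' ') leaves it attached to the word, while \w* stops at it. A small comparison sketch, with hypothetical function names and a made-up sample sentence:

```python
import re


def by_split(s, pat):
    # Only spaces separate words, so punctuation stays attached
    return [word for word in s.split(' ') if pat in word]


def by_regex(s, pat):
    # \w* cannot cross punctuation, so a trailing "," is not included
    return re.findall(r'\w*%s\w*' % re.escape(pat), s)


text = "A simple, basic test."
print(by_split(text, "si"))  # ['simple,', 'basic']
print(by_regex(text, "si"))  # ['simple', 'basic']
```

Which behaviour is preferable depends on whether punctuation should count as part of a word.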

import re

def func(s, pat):
    pat = r'\b\S*%s\S*\b' % re.escape(pat) 
    return re.findall(pat, s)


print(func("This is just a simple text to test some basic things", "si"))

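You need this. \b matches at a word boundary, so the match starts and ends at the edges of a word, and \S matches any non-whitespace character, so a match never extends across a space. re.escape(pat) makes sure characters such as "." or "+" in the search string are matched literally.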
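To see the effect of \b and \S together, the sketch below (the sample sentence is made up for illustration) runs the pattern from this answer next to a plain \w* pattern: \S* keeps the inner hyphen of "basic-level", while the trailing \b ends the match before the final ".".

```python
import re


def func(s, pat):
    # \S* may span inner punctuation such as "-", while the outer \b
    # anchors force the match to start and end at a word boundary
    return re.findall(r'\b\S*%s\S*\b' % re.escape(pat), s)


text = "a basic-level test, simple."
print(func(text, "si"))               # ['basic-level', 'simple']
print(re.findall(r'\w*si\w*', text))  # ['basic', 'simple']
```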

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.
