What would be a regex-based function f that, given an input text and a string, returns all the words in the text that contain this string? For example:
f("This is just a simple text to test some basic things", "si")
would return:
["simple", "basic"]
(because these two words contain the substring "si")
How to do that?
For something like this I wouldn't use a regex. I would use something like this:
def f(string, match):
    # Split on whitespace, then keep every word that contains the substring.
    string_list = string.split()
    match_list = []
    for word in string_list:
        if match in word:
            match_list.append(word)
    return match_list

print(f("This is just a simple text to test some basic things", "si"))
I'm not convinced there isn't a better way to do this than my approach, but something like:
import re

def f(s, pat):
    pat = r'(\w*%s\w*)' % pat  # Not thrilled about this line
    return re.findall(pat, s)

print(f("This is just a simple text to test some basic things", "si"))
Works:
['simple', 'basic']
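One reason to be wary of the interpolation line is that any regex metacharacter in the search string is treated as regex syntax rather than as literal text. Here is a minimal sketch of one way around that, assuming it is acceptable to pass the string through re.escape first. The name find_words is made up for this illustration, and so is the second sample sentence.

```python
import re

def find_words(s, pat):
    # find_words is a hypothetical name used only for this sketch.
    # re.escape makes every character of pat literal, so "." means
    # an actual dot rather than "any character".
    regex = r'\w*%s\w*' % re.escape(pat)
    return re.findall(regex, s)

print(find_words("This is just a simple text to test some basic things", "si"))
print(find_words("version 1.2 and version 12", "."))
```

Without the escaping, a search string of "." would match any character and so pick up every word in the text, which is probably not what the caller intended.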
Here is my attempt at a solution. I split the input string by " ", and then try to match each individual word to the pattern. If a match is found, the word is added to a result list.
import re

def f(text, pat):
    matches = []
    # Escape the pattern so it is searched for literally in each word.
    regex = re.escape(pat)
    for word in text.split(' '):
        if re.search(regex, word):
            matches.append(word)
    return matches

print(f("This is just a simple text to test some basic things", "si"))
import re

def func(s, pat):
    # \b anchors the match at word boundaries; \S* extends it over non-space characters.
    pat = r'\b\S*%s\S*\b' % re.escape(pat)
    return re.findall(pat, s)

print(func("This is just a simple text to test some basic things", "si"))
You need this. \b restricts each match to whole words by cutting at word boundaries, and \S matches any non-whitespace character, so a match never spans a space.
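To see what \b and \S do in practice, it may help to compare this pattern with a \w-based one on text that contains punctuation. The following is only a sketch: the helper names with_word_chars and with_non_space and the sample sentence are invented for the comparison.

```python
import re

def with_word_chars(text, pat):
    # \w* only extends over letters, digits and underscores.
    return re.findall(r'\w*%s\w*' % re.escape(pat), text)

def with_non_space(text, pat):
    # \S* extends over any non-space character; the \b anchors then
    # trim the match back to the nearest word boundary.
    return re.findall(r'\b\S*%s\S*\b' % re.escape(pat), text)

sample = "A simple, basic-ish test."
print(with_word_chars(sample, "si"))  # ['simple', 'basic']
print(with_non_space(sample, "si"))   # ['simple', 'basic-ish']
```

Both variants drop the trailing comma after "simple", but only the \S version keeps the hyphenated "basic-ish" together, because a hyphen is a non-space character and is not a word character. Which behavior counts as "a word" depends on the input you expect.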