I need a Python regex that will match all (non-empty) sequences of consecutive words in a string, where a word is an arbitrary non-empty sequence of non-whitespace characters.
Something that will work like this:
s = "ab cd efg"
re.findall(..., s)
# ['ab', 'cd', 'efg', 'ab cd', 'cd efg', 'ab cd efg']
The closest I got was with the regex module, but it is still not what I want:
regex.findall(r"\b\S.+\b", s, overlapped=True)
# ['ab cd efg', 'cd efg', 'efg']
Also, just to be clear, I don't want 'ab efg' in there.
Something like:
matches = "ab cd efg".split()
matches2 = [" ".join(matches[i:j])
            for i in range(len(matches))
            for j in range(i + 1, len(matches) + 1)]
print(matches2)
Outputs:
['ab', 'ab cd', 'ab cd efg', 'cd', 'cd efg', 'efg']
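The list above is ordered by starting word, while the question lists the shorter sequences first. As a minimal sketch (the function name `word_runs_by_length` is only illustrative, not from the original posts), a stable sort on the number of spaces reorders the same results by word count:

```python
def word_runs_by_length(text):
    """Return all runs of consecutive words, shortest runs first."""
    words = text.split()
    runs = [" ".join(words[i:j])
            for i in range(len(words))
            for j in range(i + 1, len(words) + 1)]
    # sorted() is stable, so runs of equal length keep their left-to-right order.
    # Each run is joined with single spaces, so the space count is word count - 1.
    return sorted(runs, key=lambda run: run.count(" "))

print(word_runs_by_length("ab cd efg"))
# ['ab', 'cd', 'efg', 'ab cd', 'cd efg', 'ab cd efg']
```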
What you can do is match each word together with its trailing whitespace, and then join contiguous slices of those matches. (This is similar to Maxim's approach, although I developed it independently; unlike his, it preserves the original whitespace between words.)
import regex
s = "ab cd efg"
subs = regex.findall(r"\S+\s*", s)
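def combos(parts):
    # Join every contiguous slice of the word+whitespace pieces.
    out = []
    for i in range(len(parts)):
        for j in range(i + 1, len(parts) + 1):
            # Drop the whitespace that trails the last word of the slice.
            out.append("".join(parts[i:j]).rstrip())
    return out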
print(combos(subs))
This first finds all matches of \S+\s*, which matches a word followed by any amount of whitespace (including none), and then takes every contiguous slice of those matches, joins it, and strips the trailing whitespace from the result.
If whitespace is always a single space, just use Maxim's approach; it's simpler and faster but doesn't preserve whitespace.
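To illustrate what "preserving whitespace" means here, the following is a minimal sketch using only the standard `re` module; it is a variation on the idea, not the code from either answer. It records where each word starts and ends and slices the original string, so runs of spaces or tabs between words survive unchanged. The name `contiguous_word_runs` is hypothetical.

```python
import re

def contiguous_word_runs(text):
    """Return every run of consecutive words, keeping the original whitespace."""
    # (start, end) offsets of each maximal run of non-whitespace characters.
    spans = [m.span() for m in re.finditer(r"\S+", text)]
    # Slice from the start of word i to the end of word j, for every i <= j.
    return [text[spans[i][0]:spans[j][1]]
            for i in range(len(spans))
            for j in range(i, len(spans))]

s = "ab  cd\tefg"
print(contiguous_word_runs(s))
# ['ab', 'ab  cd', 'ab  cd\tefg', 'cd', 'cd\tefg', 'efg']
```

With the split-and-join approach the same input would give 'ab cd' and 'cd efg', with every separator normalized to a single space.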
Without regex:
import itertools
def n_wise(iterable, n=2):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    iterables = itertools.tee(iterable, n)
    for k, it in enumerate(iterables):
        for _ in range(k):
            next(it, None)
    return zip(*iterables)

def foo(s):
    s = s.split()
    for n in range(1, len(s)+1):
        for thing in n_wise(s, n=n):
            yield ' '.join(thing)
s = "ab cd efg hj"
result = list(foo(s))
print(result)
Outputs:
['ab', 'cd', 'efg', 'hj', 'ab cd', 'cd efg', 'efg hj', 'ab cd efg', 'cd efg hj', 'ab cd efg hj']