I have the following pattern:
find_pattern = re.compile(r'(ga:country:\s)([a-zA-Z()\s]*)(.*users:\s)(\d+),')
This is what the input that should be matched looks like:
ga:country: (not set),Date range:0,ga:users:60,
ga:country: Albania,Date range:0,ga:users:7,
ga:country: Algeria,Date range:0,ga:users:10,
...
ga:country: Argentina,Date range:0,ga:users:61,
ga:country: Armenia,Date range:0,ga:users:2,
And this is how the output is going to be formatted (in case it adds any value to the question):
['(not set)', 60],
['Albania', 7],
And when I run a test:
matches = find_pattern.finditer(self.data)
print('matches:', matches)
for match in matches:
    print(match)
No matches are found.
Hope someone is able to help.
I would suggest using 2 capturing groups instead of 4, adding optional whitespace chars after ga: and making the whitespace after users: optional or dropping it. Your pattern requires one whitespace char there (\s), but the input has none, which is why nothing matches.
The .* could also be non-greedy, .*?, so that it stops at the first users: part in case a line contains more than one.
To prevent users: from being part of a larger word, you could make it more specific by matching :users:
\bga:\s*country:\s*([a-zA-Z()\s]*),.*?:users:(\d+)
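To see the cause in isolation, here is a minimal sketch that runs the pattern from the question against one input line and compares it with a variant where only the whitespace after users: is made optional. The names original, relaxed and line are just illustrative.

```python
import re

# Pattern from the question: note the mandatory \s after "users:".
original = re.compile(r'(ga:country:\s)([a-zA-Z()\s]*)(.*users:\s)(\d+),')
# Same pattern, but with that whitespace made optional (\s*).
relaxed = re.compile(r'(ga:country:\s)([a-zA-Z()\s]*)(.*users:\s*)(\d+),')

line = "ga:country: Albania,Date range:0,ga:users:7,"

print(original.search(line))             # None
print(relaxed.search(line).group(2, 4))  # ('Albania', '7')
```

So the single \s is enough to make every line fail; the rest of the suggestions are about making the pattern tidier and more specific.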
Example with re.findall that returns the values of the capturing groups:
import re
regex = r"\bga:\s*country:\s*([a-zA-Z()\s]*),.*?:users:(\d+)"
s = ("ga:country: (not set),Date range:0,ga:users:60,\n"
     "ga:country: Albania,Date range:0,ga:users:7,\n"
     "ga:country: Algeria,Date range:0,ga:users:10,\n"
     "ga:country: Argentina,Date range:0,ga:users:61,\n"
     "ga:country: Armenia,Date range:0,ga:users:2,")
print(re.findall(regex, s))
Output:
[('(not set)', '60'), ('Albania', '7'), ('Algeria', '10'), ('Argentina', '61'), ('Armenia', '2')]
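If you want the list-of-lists format from the question, with the user count as a number, one possible way is to convert each tuple in a list comprehension. This is only a sketch; rows is a name I made up, and the sample string is shortened to two lines.

```python
import re

regex = r"\bga:\s*country:\s*([a-zA-Z()\s]*),.*?:users:(\d+)"
s = ("ga:country: (not set),Date range:0,ga:users:60,\n"
     "ga:country: Albania,Date range:0,ga:users:7,")

# findall yields (country, users) string tuples; cast users to int.
rows = [[country, int(users)] for country, users in re.findall(regex, s)]
print(rows)  # [['(not set)', 60], ['Albania', 7]]
```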