Why is my Regular Expression not matching?

Question

I have the following pattern:

find_pattern = re.compile(r'(ga:country:\s)([a-zA-Z()\s]*)(.*users:\s)(\d+),')

This is what the input that should be matched looks like:

        ga:country: (not set),Date range:0,ga:users:60,
        ga:country: Albania,Date range:0,ga:users:7,
        ga:country: Algeria,Date range:0,ga:users:10,
        ...
        ga:country: Argentina,Date range:0,ga:users:61,
        ga:country: Armenia,Date range:0,ga:users:2,

And this is how the output is going to be formatted (in case it adds any value to the question):

        ['(not set)', 60],
        ['Albania', 7],

And when I run a test:

matches = find_pattern.finditer(self.data)
print('matches:', matches)
for match in matches:
    print(match)

No matches are found.

Hope someone is able to help.

Answer 1

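I would suggest using 2 capturing groups instead of 4, allowing optional whitespace chars after ga:, and making the whitespace chars after users: optional. Your pattern requires a whitespace char after users: (the \s), but in the input the digits follow the colon directly, which is why nothing matches.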
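
A minimal sketch of that last point, using one line from the sample input: with a mandatory \s after users: the original pattern finds nothing, and changing it to \s* is enough to get a match. The names line, original and relaxed are only illustrative.

```python
import re

line = "ga:country: Albania,Date range:0,ga:users:7,"

# Original pattern: requires exactly one whitespace char after "users:"
original = re.compile(r'(ga:country:\s)([a-zA-Z()\s]*)(.*users:\s)(\d+),')
# Same pattern with the whitespace after "users:" made optional
relaxed = re.compile(r'(ga:country:\s)([a-zA-Z()\s]*)(.*users:\s*)(\d+),')

print(original.search(line))   # None
m = relaxed.search(line)
print(m.group(2), m.group(4))  # Albania 7
```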

The .* could also be made non-greedy (.*?) so that the first users: part is matched in case a line contains more than one.

To prevent users: from being part of a larger word, you could make it more specific by matching :users:

\bga:\s*country:\s*([a-zA-Z()\s]*),.*?:users:(\d+)
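
A small sketch of the difference between .* and .*? in this pattern. It assumes a hypothetical line with two :users: parts, which does not occur in the sample input (there is only one per line).

```python
import re

# Hypothetical line with two ":users:" parts; not from the sample data
line = "ga:country: Albania,ga:users:7,ga:users:99,"

greedy = r"\bga:\s*country:\s*([a-zA-Z()\s]*),.*:users:(\d+)"
lazy = r"\bga:\s*country:\s*([a-zA-Z()\s]*),.*?:users:(\d+)"

print(re.findall(greedy, line))  # [('Albania', '99')]
print(re.findall(lazy, line))    # [('Albania', '7')]
```

The greedy version runs to the end of the line and backtracks to the last :users: part, while the non-greedy version stops at the first one.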


Example with re.findall that returns the values of the capturing groups:

import re

regex = r"\bga:\s*country:\s*([a-zA-Z()\s]*),.*?:users:(\d+)"

s = ("ga:country: (not set),Date range:0,ga:users:60,\n"
    "ga:country: Albania,Date range:0,ga:users:7,\n"
    "ga:country: Algeria,Date range:0,ga:users:10,\n"
    "ga:country: Argentina,Date range:0,ga:users:61,\n"
    "ga:country: Armenia,Date range:0,ga:users:2,")

print(re.findall(regex, s))

Output

[('(not set)', '60'), ('Albania', '7'), ('Algeria', '10'), ('Argentina', '61'), ('Armenia', '2')]
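
If you want the list format from the question, with the user counts as integers, one possible follow-up is to convert each tuple. This is a sketch that assumes the same regex and a shortened sample string; rows is an illustrative name.

```python
import re

regex = r"\bga:\s*country:\s*([a-zA-Z()\s]*),.*?:users:(\d+)"

s = ("ga:country: (not set),Date range:0,ga:users:60,\n"
     "ga:country: Albania,Date range:0,ga:users:7,")

# Convert each (country, users) tuple to [country, int(users)]
rows = [[country, int(users)] for country, users in re.findall(regex, s)]
print(rows)  # [['(not set)', 60], ['Albania', 7]]
```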
