Regular expression for simple patterns

Question

Problem

I have an image dataset in which each image shows a particular activity. Each image is named <activity>_<num>, for example educating_13.jpg, practicing_147.jpg, etc.

Now I want to select the images with the same activity, say "cooking", and I decided to do this with the re module in Python. The script I wrote looks like this:

import os
import re

pattern = r"^(\w+)_(\d+)$"
for filename in os.listdir("."):
    root, _ = os.path.splitext(filename)
    activity = re.match(pattern, root).group(1)
    if activity == "cooking":
        ...  # do something

However, even though many images were processed successfully, the script finally aborted with an AttributeError. It seems that some of the file names could not be matched with the specified pattern.

Did I make a mistake somewhere? Any input is appreciated.

EDIT:

After catching the exception in Python, it turns out that among the almost 150 thousand images there is a text file called temp.txt, and this is the one that violates the pattern.

Answer 1

You can do this without regex by using str.split.

For example:

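import os

for filename in os.listdir("."):
    root, _ = os.path.splitext(filename)
    if "_" in root:
        # Split on the last underscore so a name with several "_" still unpacks into two parts
        activity, num = root.rsplit("_", 1)
        if activity == "cooking":
            ...  # do something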
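The split-based approach above does not check that the part after the underscore is really a number. A minimal sketch of a stricter helper, assuming the <activity>_<num> naming from the question (the function name split_activity is made up for illustration):

```python
def split_activity(root):
    """Return (activity, num) for '<activity>_<num>', or None if the name does not fit."""
    # rpartition splits on the LAST underscore; sep is "" when there is no underscore
    activity, sep, num = root.rpartition("_")
    if not sep or not activity or not num.isdecimal():
        return None
    return activity, int(num)
```

A name such as temp (from temp.txt) has no underscore, so the helper returns None instead of raising.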

Answer 2

re.match(pattern, root) returns None when the name does not match.

You can check whether re.match(pattern, root) is None to find the offending file.
You can also use https://regex101.com/ to test your regex against the image names.
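As a rough sketch of that None check, using the pattern from the question (the function name files_for_activity and its parameters are assumptions for illustration, not part of the original script):

```python
import os
import re

PATTERN = re.compile(r"^(\w+)_(\d+)$")

def files_for_activity(filenames, wanted):
    """Return the file names whose activity part equals `wanted`."""
    selected = []
    for filename in filenames:
        root, _ext = os.path.splitext(filename)
        match = PATTERN.match(root)
        if match is None:
            # Name does not follow <activity>_<num> (e.g. temp.txt), so skip it
            continue
        if match.group(1) == wanted:
            selected.append(filename)
    return selected
```

It could be called as files_for_activity(os.listdir("."), "cooking"); files that do not match are skipped instead of aborting the loop.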

Answer 3

If re.match(pattern, root) returns None, then calling .group(1) on it raises the AttributeError. So at least one entry in your directory does not match the pattern.

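It's hard to know which ones are giving you problems. With ASCII matching (the default in Python 2, or re.ASCII in Python 3), \w matches only [a-zA-Z0-9_]; for str patterns in Python 3 it also matches Unicode word characters. So:

Do any file names contain punctuation characters (e.g. %)?
Do any file names contain non-ASCII characters (e.g. ñ)?
Are there files in the directory that are not part of the dataset?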

You could post the directory listing, then maybe we can spot the file.
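If the listing is too long to post, here is a small sketch for finding the offenders yourself. It assumes the pattern from the question; the helper name non_matching and the sample names are invented for illustration:

```python
import re

def non_matching(names, flags=0):
    """Return the names that do not match <activity>_<num>."""
    pattern = re.compile(r"^(\w+)_(\d+)$", flags)
    return [name for name in names if pattern.match(name) is None]

# Hypothetical root names (extensions already stripped)
names = ["cooking_1", "temp", "50%_off_3", "niño_7"]

default_misses = non_matching(names)          # Unicode-aware \w (Python 3 str pattern)
ascii_misses = non_matching(names, re.ASCII)  # \w limited to [a-zA-Z0-9_]
print(default_misses)
print(ascii_misses)
```

In Python 3 the name with ñ is only reported when re.ASCII is passed, while the names without an underscore or with % are reported either way.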

3 answers

solution1
3 ACCEPTED 2019-04-23 06:33:18

solution2
1 2019-04-23 06:34:22

solution3
1 2019-04-23 06:39:55
