How to extract substring with Python Regex Exact Match

Question

I'm learning Python regular expressions (the re module) to analyze Twitter text.

Let's say I have a tweet like the one below and I want to extract exactly '3/10' from txt.
Python returns an empty list [] in this case.

txt = "my mood is low 3/10. 05/01/2021 Tuesday"
re.findall('^\d+\/\d{2}$', txt)

What's wrong with my code?

Answer 1

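Instead of using anchors, which require the pattern to match the whole string, you can use negative lookarounds: (?<!\S) asserts a whitespace boundary to the left (no non-whitespace character immediately before the match), and (?!/) asserts that no / follows on the right. Together they match 3/10 only and skip the date.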

(?<!\S)\d+\/\d{2}(?!/)

import re
txt = "my mood is low 3/10. 05/01/2021 Tuesday"
print(re.findall(r'(?<!\S)\d+\/\d{2}(?!/)', txt))

Output

['3/10']
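To make the role of each piece easier to see, here is a minimal sketch of the same pattern written with re.VERBOSE so that every part carries a comment. The name MOOD_SCORE and the extra sample strings are assumptions added for illustration; they are not part of the original answer.

```python
import re

# Same pattern as in the answer, spread over several lines with re.VERBOSE.
# MOOD_SCORE is a hypothetical name chosen for this sketch.
MOOD_SCORE = re.compile(r"""
    (?<!\S)   # left side: no non-whitespace character directly before
    \d+       # one or more digits, e.g. "3"
    /         # a literal slash
    \d{2}     # exactly two digits, e.g. "10"
    (?!/)     # right side: must not be followed by another slash
""", re.VERBOSE)

txt = "my mood is low 3/10. 05/01/2021 Tuesday"
scores = MOOD_SCORE.findall(txt)
print(scores)  # ['3/10']

# A score glued to a preceding letter is rejected by the lookbehind.
glued = MOOD_SCORE.findall("x3/10")
print(glued)   # []
```

The lookarounds consume no characters, so the returned match is just the score itself, without the surrounding space or period.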

Answer 2

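Remove the ^ and $ anchors, which force the pattern to span the whole string, and use word boundaries (\b) instead. Note that on the sample text this also returns '05/01' from the date, because \b does not rule out an adjacent slash.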

re.findall(r'\b\d+/\d{2}\b', txt)
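The following sketch runs the word-boundary pattern on the question's sample text and compares it with a variant that also forbids a slash on either side. The second pattern and the variable names are assumptions added here for illustration, not something the original answer proposed.

```python
import re

txt = "my mood is low 3/10. 05/01/2021 Tuesday"

# Word boundaries alone: the leading "05/01" of the date also qualifies,
# since there is a boundary between "1" and the following "/".
with_boundaries = re.findall(r'\b\d+/\d{2}\b', txt)
print(with_boundaries)    # ['3/10', '05/01']

# Hypothetical variant: additionally reject a slash right before or after.
with_slash_guards = re.findall(r'(?<!/)\b\d+/\d{2}\b(?!/)', txt)
print(with_slash_guards)  # ['3/10']
```

Whether the extra guards are needed depends on the data: if tweets never contain dates or other slash-separated numbers, the shorter pattern may be enough.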

Answer 3

According to the re docs:

^ (Caret.) Matches the start of the string, and in MULTILINE mode also matches immediately after each newline.

$ Matches the end of the string or just before the newline at the end of the string, and in MULTILINE mode also matches before a newline. foo matches both 'foo' and 'foobar', while the regular expression foo$ matches only 'foo'. More interestingly, searching for foo.$ in 'foo1\nfoo2\n' matches 'foo2' normally, but 'foo1' in MULTILINE mode; searching for a single $ in 'foo\n' will find two (empty) matches: one just before the newline, and one at the end of the string.

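This is not the case in your example: '3/10' sits in the middle of the string, so ^ and $ cannot both match around it. You would need to use other zero-width assertions, such as \b or lookarounds.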
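The anchor behaviour quoted above can be checked with a short sketch. The variable names and the multi-line sample string are assumptions made for this example; the pattern is the one from the question, written as a raw string.

```python
import re

pattern = r'^\d+/\d{2}$'
txt = "my mood is low 3/10. 05/01/2021 Tuesday"

# With both anchors, the whole string has to be the score, so nothing is found.
whole_text = re.findall(pattern, txt)
print(whole_text)  # []

# The same pattern succeeds when the string contains only the score.
score_only = re.findall(pattern, "3/10")
print(score_only)  # ['3/10']

# In MULTILINE mode, ^ and $ match at line boundaries, so a score that
# sits alone on its own line is found.
per_line = re.findall(pattern, "my mood is low\n3/10\nTuesday", re.MULTILINE)
print(per_line)    # ['3/10']
```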

3 answers

solution1
2 ACCEPTED 2021-01-05 08:21:58

solution2
0 2021-01-05 08:21:38

solution3
0 2021-01-05 08:26:35
