If the text is 'Job 45, job 32 and then job 15', I'd like to get a result of either ['job 45', 'job 32', 'job 15'] or ['45', '32', '15'].
I tried r'[job]\d+', which returns an empty list:
re.findall(r'[job]\d+', 'Job 45, job 32 and then job 15'.lower())
[]
I experimented with splitting on job.
re.split(r'job','Job 45, job 32 and then job 15'.lower())
['', ' 45, ', ' 32 and then ', ' 15']
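For what it's worth, the split result can still be turned into numbers. Here is a rough sketch. It assumes every "job" is followed by optional whitespace and then digits. Each piece after the first starts with whatever followed a "job", so a leading-digits match on each piece recovers the number, and pieces that don't start with digits are skipped.

```python
import re

text = 'Job 45, job 32 and then job 15'

# Split on "job"; drop the first piece, which is whatever preceded the first "job".
pieces = re.split(r'job', text.lower())[1:]

numbers = []
for piece in pieces:
    # re.match anchors at the start of the piece: optional spaces, then digits.
    m = re.match(r'\s*(\d+)', piece)
    if m:
        numbers.append(m.group(1))

print(numbers)  # ['45', '32', '15']
```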
I tried splitting on words.
re.findall(r'\w+','Job 45, job 32 and then job 15'.lower())
['job', '45', 'job', '32', 'and', 'then', 'job', '15']
This is workable: I can check whether an element is 'job' and whether the following element can be converted to a number.
What would be a regular expression to get either ['job 45', 'job 32', 'job 15'] or ['45', '32', '15'] from 'Job 45, job 32 and then job 15'?
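The word-pair check described above can be sketched like this. It walks over adjacent pairs with zip. As a stand-in for "can be converted to a number", it uses str.isdigit() rather than a try/int() conversion, which is an assumption about what counts as a number here.

```python
import re

text = 'Job 45, job 32 and then job 15'
words = re.findall(r'\w+', text.lower())

results = []
# Pair each word with the word that follows it.
for word, following in zip(words, words[1:]):
    # Keep the number when "job" is directly followed by an all-digit token.
    if word == 'job' and following.isdigit():
        results.append(following)

print(results)  # ['45', '32', '15']
```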
Your regex [job]\d+ has a couple of problems.
First, [job] is a character class, so it matches exactly one character: j, o, or b.
Second, there is nothing in your regex to match the space between job and the number.
Third, your input contains both Job and job, so for a case-insensitive match you need the (?i) flag.
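These three problems can be observed directly. The short inputs 'job45' and 'job 45' below are invented for illustration.

```python
import re

# [job] is a character class: it matches ONE character that is j, o or b,
# so against "job45" only "b45" can match.
one_char = re.findall(r'[job]\d+', 'job45')

# With a space between the word and the number, nothing matches at all.
with_space = re.findall(r'[job]\d+', 'job 45')

# Without a case-insensitive flag, "Job" is not matched by "job".
case_sensitive = re.findall(r'job', 'Job 45, job 32')

print(one_char)        # ['b45']
print(with_space)      # []
print(case_sensitive)  # ['job']
```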
So the corrected regex is:
(?i)job\s+\d+
Sample Python code:
import re

s = 'Job 45, job 32 and then job 15'
# Raw string so that \s and \d reach the regex engine unchanged.
matches = re.findall(r'(?i)job\s+\d+', s)
print(matches)
This gives the following output:
['Job 45', 'job 32', 'job 15']
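If you want both forms in one pass, one option is a capture group around the digits combined with re.finditer. Each match object then yields the full match and the number. This sketch passes the re.IGNORECASE flag instead of the inline (?i); the two are equivalent here.

```python
import re

s = 'Job 45, job 32 and then job 15'
pattern = re.compile(r'job\s+(\d+)', re.IGNORECASE)

full_matches = []
numbers = []
for m in pattern.finditer(s):
    full_matches.append(m.group(0))  # whole match, original case kept
    numbers.append(m.group(1))       # only the captured digits

print(full_matches)  # ['Job 45', 'job 32', 'job 15']
print(numbers)       # ['45', '32', '15']
```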
Or, more simply, use the expression job (\d+). Because it contains a capturing group, findall returns only the digits:
>>> re.findall(r'job (\d+)', s.lower())
['45', '32', '15']
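The literal single space in job (\d+) only matches exactly one space. As a sketch, here is a variant that uses \s+ and the re.IGNORECASE flag instead of lowercasing the input. The input string, with a double space and a tab, is an invented example.

```python
import re

s = 'Job 45, job  32 and then job\t15'

# One capture group, so findall returns just the digits; \s+ accepts any run of whitespace.
flexible = re.findall(r'job\s+(\d+)', s, re.IGNORECASE)

# The single literal space misses the double-space and tab cases.
single_space = re.findall(r'job (\d+)', s.lower())

print(flexible)      # ['45', '32', '15']
print(single_space)  # ['45']
```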
One approach would be to use the following pattern, which uses a positive lookbehind:
(?<=\bjob )\d+
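This matches any run of digits that is immediately preceded by the word job followed by a single space. Because the lookbehind is not part of the match, findall returns only the digits. The \b word boundary keeps it from matching inside a longer word such as "myjob", and case insensitivity comes from passing the re.I flag.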
import re

text = "Job 45, job 32 and then job 15"
res = re.findall(r'(?<=\bjob )\d+', text, re.I)
print(res)
['45', '32', '15']
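Python's re module only supports fixed-width lookbehind. That is why this pattern uses a literal single space and not \s+. The sketch below shows that the fixed-width version works and that a variable-width lookbehind is rejected with re.error when the pattern is compiled.

```python
import re

text = "Job 45, job 32 and then job 15"

# Fixed-width lookbehind: "job " is always exactly 4 characters (\b is zero-width).
fixed = re.findall(r'(?<=\bjob )\d+', text, re.I)
print(fixed)  # ['45', '32', '15']

# A variable-width lookbehind such as \s+ is not accepted by the re module.
try:
    re.compile(r'(?<=\bjob\s+)\d+')
    variable_width_ok = True
except re.error as exc:
    variable_width_ok = False
    print('re.error:', exc)
```

If you need to allow varying whitespace, a capture group such as job\s+(\d+) avoids the restriction. The third-party regex package also accepts variable-width lookbehind.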