How to extract a specific digit in each row of a pandas series containing text

Question

I have a pd.Series that looks as follows:

0    some texts...final exam marks:50 next level:10
1    some texts....final exam marks he has got:54 next level:15
2    some texts...final marks ...some texts: 45 next best level:20

I want to extract the numbers 50, 54, 45 from that Series. Please note that there are multiple numbers in the text of each row. I have tried regex, but instead of giving only those specific numbers, it picks up all the numbers in each row. Essentially, I want the number right after the word 'marks'. Any help would be appreciated.

P.S. I have now updated the problem. I tried the solutions given here. In fact, I tried:

 pd.Series.str.findall('?<=specific text *(\\d{2})')

But I am getting an empty list. The example shown here is very similar to the actual problem, hence I edited the post.

Many many thanks in advance.

Answer 1

Try

s.str.extract(r'.*marks:\s?(\d+)', expand = False)


0    50
1    54
2    45

With the update:

s.str.extract(r'.*marks.*?(\d+)', expand = False)

This regex allows for any characters (or none) between marks and the number: the lazy .*? consumes as little as possible, so the group captures the first run of digits after marks.

You get

0    50
1    54
2    45
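
As a minimal, self-contained sketch of the pattern above: the sample rows below are reconstructed from the question and the variable names s and marks are chosen here for illustration, so treat them as assumptions rather than the asker's real data. It also converts the extracted strings to integers, which is likely what you want for marks.

```python
import pandas as pd

# Sample rows reconstructed from the question (an assumption; the real texts are longer).
s = pd.Series([
    "some texts...final exam marks:50 next level:10",
    "some texts....final exam marks he has got:54 next level:15",
    "some texts...final marks ...some texts: 45 next best level:20",
])

# The lazy .*? skips as few characters as possible after "marks",
# so the capture group takes the first run of digits that follows.
marks = s.str.extract(r'.*marks.*?(\d+)', expand=False)

# extract returns strings; convert to integers (this assumes every row matched).
marks = marks.astype(int)
print(marks.tolist())  # [50, 54, 45]
```

Note that because the leading .* is greedy, a row containing the word marks more than once would yield the number after the last occurrence.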

Answer 2

You need the look-behind syntax (?<=...), which asserts that the desired pattern is preceded by another pattern. (?<=marks:) *([0-9]+) extracts the digits after the word marks:, allowing optional spaces in between:

s
#0         some texts...final exam marks:50 next lev...
#1         some texts....final exam marks:54 next le...
#2         some texts...final marks: 45 next best le...
#Name: 1, dtype: object

s.str.extract("(?<=marks:) *([0-9]+)", expand=False)

#0    50
#1    54
#2    45
#Name: 1, dtype: object
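
For reference, here is a runnable sketch of the look-behind approach. The sample rows are reconstructed from the original form of the question and are an assumption. Python's re module only accepts fixed-width look-behinds, which is why the optional spaces are placed outside the (?<=...) group. The same pattern can also be passed to str.findall, which is closer to what the question attempted.

```python
import pandas as pd

# Rows in the original form of the question (reconstructed for illustration).
s = pd.Series([
    "some texts...final exam marks:50 next level:10",
    "some texts....final exam marks:54 next level:15",
    "some texts...final marks: 45 next best level:20",
])

# The look-behind must be fixed-width in Python's re module,
# so the optional spaces sit outside of it.
pattern = r"(?<=marks:) *([0-9]+)"

# extract returns the first match's capture group per row (as strings).
extracted = s.str.extract(pattern, expand=False)
print(extracted.tolist())  # ['50', '54', '45']

# findall with a single capture group returns a list of captured strings per row.
found = s.str.findall(pattern)
print(found.tolist())  # [['50'], ['54'], ['45']]
```

This pattern requires the literal text marks: directly before the number, so on the updated rows, where other words sit between marks and the digits, it would not match and extract would return NaN for those rows.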

solution1
2 ACCEPTED 2017-05-30 01:55:13

solution2
1 2017-05-30 01:53:18
