How to extract a specific number from each row of a pandas Series containing text

Question

I have a pd.Series that looks as follows:

 0 some texts...final exam marks:50 next level:10
 1 some texts....final exam marks he has got:54 next level:15
 2 some texts...final marks ...some texts: 45 next best level:20

I want to extract the numbers 50, 54, 45 from that Series. Please note that there are multiple numbers in the text of each row. I have tried regex, but instead of returning only those specific numbers, it picks up all the numbers in each row. Essentially, I want the number that comes right after the word 'marks'. Any help would be appreciated.

P.S. I have updated the problem now. I tried the solutions given here. In fact, I tried:

 pd.Series.str.findall('?<=specific text *(\\d{2})')

But I am getting an empty list. The example shown here closely resembles the actual problem, hence I edited the post.

Many thanks in advance.

Answer 1

Try

s.str.extract(r'.*marks:\s?(\d+)', expand=False)


0    50
1    54
2    45

With the update:

s.str.extract(r'.*marks.*?(\d+)', expand=False)

This regex allows for any characters (or none) between marks and the number.

You get

0    50
1    54
2    45
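
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The Series is rebuilt by hand from the rows in the question, and the final `astype(int)` step is my own addition (an assumption that integer values are wanted), since `str.extract` returns the captured digits as strings.

```python
import pandas as pd

# Sample rows copied from the question's updated example.
s = pd.Series([
    "some texts...final exam marks:50 next level:10",
    "some texts....final exam marks he has got:54 next level:15",
    "some texts...final marks ...some texts: 45 next best level:20",
])

# Greedy .* runs to the last "marks"; the lazy .*? then skips ahead
# to the first run of digits, which the group captures.
marks = s.str.extract(r'.*marks.*?(\d+)', expand=False)

# The captured values are strings; convert if numbers are needed
# (assumption: every row matched, otherwise NaN would break astype(int)).
marks_int = marks.astype(int)
print(marks_int.tolist())
```

If some rows might not contain a match, `pd.to_numeric(marks)` is probably a safer conversion, since it keeps the missing values as NaN.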

Answer 2

You need the lookbehind syntax (?<=), which asserts that the desired pattern is preceded by another pattern. (?<=marks:) *([0-9]+) extracts the digits after the word marks:, allowing for optional spaces in between:

s
#0         some texts...final exam marks:50 next lev...
#1         some texts....final exam marks:54 next le...
#2         some texts...final marks: 45 next best le...
#Name: 1, dtype: object

s.str.extract("(?<=marks:) *([0-9]+)", expand=False)

#0    50
#1    54
#2    45
#Name: 1, dtype: object
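
A runnable sketch of the same idea follows. The rows in `s2` are assumed, rebuilt from the question's text to match the shape of the sample shown in this answer (the number directly follows `marks:`). The second part is a hedged observation: Python's `re` module only accepts fixed-width lookbehind, so this pattern relies on the literal `marks:` and would not match a row from the updated question where other words sit between "marks" and the colon.

```python
import pandas as pd

# Assumed rows: the number directly follows the literal "marks:".
s2 = pd.Series([
    "some texts...final exam marks:50 next level:10",
    "some texts....final exam marks:54 next level:15",
    "some texts...final marks: 45 next best level:20",
])

# Lookbehind anchors the match right after "marks:", then optional spaces.
lookbehind = s2.str.extract(r"(?<=marks:) *([0-9]+)", expand=False)
print(lookbehind.tolist())

# A row from the updated question never contains the literal "marks:",
# so the same pattern finds nothing and extract yields a missing value.
updated = pd.Series(["some texts....final exam marks he has got:54 next level:15"])
missing = updated.str.extract(r"(?<=marks:) *([0-9]+)", expand=False)
print(missing.isna().tolist())
```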
