Extracting a substring from a Pandas column using a regular expression

Question

I have a column with values like this:

Col1

1/1/100 'BA1
1/1/102Packe
1/1/102 'to_

I need to extract just 1/1/100 from the first row, 1/1/102 from the next, and so on.

I am using:

df['col1'] = df['col1'].str.extract('(\d+)/(\d+)/(\d+)', expand=True)

But I'm getting only 1.

I'm not sure why this isn't working. Is there a problem with the regex, or do I need some kind of mapping?

Answer 1

You only need a single capturing group:

df['col1'] = df['col1'].str.extract(r'(\d+/\d+/\d+)', expand=True)
                                      ^           ^

The str.extract method returns one column per capturing group. Your regex has three groups, and the first one captures only the leading 1, which is the value that ends up in the column.

Test:

>>> import pandas as pd
>>> df = pd.DataFrame({"col1":["1/1/100 'BA1", "1/1/102Packe", "1/1/102 'to_"]})
>>> df['col1'].str.extract(r'(\d+/\d+/\d+)', expand=True)
         0
0  1/1/100
1  1/1/102
2  1/1/102
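To see why the original pattern produced only 1, it may help to compare the two patterns side by side. This is a minimal sketch that assumes the same three sample values; the variable names (three_groups, one_group, as_series) are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["1/1/100 'BA1", "1/1/102Packe", "1/1/102 'to_"]})

# Three capturing groups: the result has three columns (0, 1, 2),
# and column 0 holds only the first number of each value.
three_groups = df["col1"].str.extract(r"(\d+)/(\d+)/(\d+)", expand=True)

# One capturing group around the whole pattern: a single-column DataFrame.
one_group = df["col1"].str.extract(r"(\d+/\d+/\d+)", expand=True)

# With one group, expand=False returns a Series, which is convenient
# when assigning the result back to a single column.
as_series = df["col1"].str.extract(r"(\d+/\d+/\d+)", expand=False)

print(three_groups.shape)
print(one_group.shape)
print(as_series.tolist())
```

Passing expand=False is a design choice rather than a requirement; it simply avoids assigning a DataFrame to a single column.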

Answer 2

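You can also try str.replace, keeping the leading digits/digits/digits part and dropping everything after it:

df['Col1'] = df['Col1'].str.replace(r'^(\d+/\d+/\d+).*$', r'\1', regex=True)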

Note: a regex-based str.extract is more powerful than .str.replace for this kind of task.

Answer 3

I'd suggest this regex:

df['col1'].str.extract(r'\b((?:\d+/?)+)', expand=True)
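
One thing to watch with patterns that repeat a capturing group: the group keeps only its last repetition. Separately, '\b' in a non-raw Python string is a backspace character rather than a word boundary, so a raw string is needed. The sketch below uses the standard re module on one sample value from the question; the variable names are illustrative.

```python
import re

s = "1/1/100 'BA1"

# In a normal (non-raw) string literal, "\b" is the backspace character.
backspace_is_not_boundary = ("\b" == "\x08")

# Repeated capturing group: only the final repetition is kept in group 1.
last_only = re.search(r"\b(\d/?)+", s).group(1)

# Wrapping the repetition in an outer group captures the whole run,
# while the inner group is made non-capturing with (?:...).
whole = re.search(r"\b((?:\d+/?)+)", s).group(1)

print(backspace_is_not_boundary, last_only, whole)
```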
