Use regex to extract substring from pandas column
I have a column with values like this:
Col1
1/1/100 'BA1
1/1/102Packe
1/1/102 'to_
I need to extract just 1/1/100 from the first row, 1/1/102 from the next rows, and so on.
I am using:
df['col1'] = df['col1'].str.extract('(\d+)/(\d+)/(\d+)', expand=True)
But I'm getting only 1.
I'm not sure why this is not working. Is there a problem with the regex, or do I need some kind of mapping?
You need to use only a single capturing group:
df['col1'] = df['col1'].str.extract(r'(\d+/\d+/\d+)', expand=True)
                                      ^           ^
The str.extract method returns one column per capturing group. Your regex has three groups, so the leading 1 lands in the first group, and that is the value you end up with. Wrapping the whole pattern in a single group captures the full 1/1/100 instead.
Test:
>>> import pandas as pd
>>> df = pd.DataFrame({"col1":["1/1/100 'BA1", "1/1/102Packe", "1/1/102 'to_"]})
>>> df['col1'].str.extract(r'(\d+/\d+/\d+)', expand=True)
         0
0  1/1/100
1  1/1/102
2  1/1/102
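To see why the original three-group pattern gave only 1, it may help to compare the two patterns side by side. This is a minimal sketch that assumes pandas is installed; the variable names three and one are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"col1": ["1/1/100 'BA1", "1/1/102Packe", "1/1/102 'to_"]})

# Three capturing groups: str.extract produces one column per group,
# so the first column holds only the leading "1" of each value.
three = df["col1"].str.extract(r"(\d+)/(\d+)/(\d+)", expand=True)

# One capturing group around the whole pattern: a single result per row.
# expand=False returns a Series, which is convenient for assigning to a column.
one = df["col1"].str.extract(r"(\d+/\d+/\d+)", expand=False)

print(three.shape)   # number of rows and columns in the three-group result
print(one.tolist())
```

Using expand=False here is a design choice for the sketch: it yields a Series rather than a one-column DataFrame, so the result can be assigned straight back to df['col1'].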
You can also try str.replace, matching the whole value and keeping only the leading part:
df['Col1'] = df['Col1'].str.replace(r'^(\d+/\d+/\d+).*$', r'\1', regex=True)
Note: Regex is more powerful than .str.replace.
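Keep in mind that str.replace removes whatever the pattern matches. A pattern that matches the digits and slashes themselves therefore deletes the part you want to keep, while a pattern that matches the whole value and substitutes a backreference keeps it. The sketch below assumes pandas is installed and passes regex=True explicitly; the names removed and kept are illustrative.

```python
import pandas as pd

s = pd.Series(["1/1/100 'BA1", "1/1/102Packe", "1/1/102 'to_"])

# Matching digits and slashes deletes them, leaving the trailing text
# (and also stripping the digit inside 'BA1).
removed = s.str.replace(r"\d+|/", "", regex=True)

# Matching the whole string and substituting group 1 keeps the prefix.
kept = s.str.replace(r"^(\d+/\d+/\d+).*$", r"\1", regex=True)

print(removed.tolist())
print(kept.tolist())
```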
I suggest this regex:
df['col1'].str.extract(r'\b((?:\d/?)+)', expand=True)
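Two details matter for a pattern of this shape: in a non-raw Python string '\b' is a backspace character rather than a word boundary, and a repeated capturing group keeps only its last repetition. The sketch below illustrates both with the standard re module on one sample value; the variable names are illustrative.

```python
import re

value = "1/1/100 'BA1"

# Without the r prefix, "\b" is the backspace character (0x08).
assert "\b" == "\x08"

# Repeating a capturing group: group(1) holds only the last repetition,
# even though the overall match covers the full prefix.
repeated = re.search(r"\b(\d/?)+", value)
print(repeated.group(0))  # whole match
print(repeated.group(1))  # last repetition only

# Wrapping a non-capturing repeat in one outer group captures everything.
wrapped = re.search(r"\b((?:\d/?)+)", value)
print(wrapped.group(1))
```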