
Use regex to extract substring from pandas column

I have a column with values like this:

Col1

1/1/100 'BA1
1/1/102Packe
1/1/102 'to_

I need to extract just 1/1/100 from the first row, 1/1/102 from the second, and so on.

I am using:

df['col1'] = df['col1'].str.extract('(\d+)/(\d+)/(\d+)', expand=True)

But I'm getting only 1.

I'm not sure why this isn't working. Is there a problem with the regex, or do I need some kind of mapping?

You need to use only a single capturing group (and a raw string, so the backslashes reach the regex engine unchanged):

df['col1'] = df['col1'].str.extract(r'(\d+/\d+/\d+)', expand=True)
                                      ^           ^

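The str.extract method returns one column per capturing group. Your regex has three groups, and the first of them captures only the leading 1, which is the value you ended up with. Wrapping the whole pattern in a single group captures the full 1/1/100.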

Test:

>>> import pandas as pd
>>> df = pd.DataFrame({"col1":["1/1/100 'BA1", "1/1/102Packe", "1/1/102 'to_"]})
>>> df['col1'].str.extract(r'(\d+/\d+/\d+)', expand=True)
         0
0  1/1/100
1  1/1/102
2  1/1/102
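
To make the difference between the two patterns visible, here is a minimal sketch on the same sample data. The variable names `parts` and `whole` are only illustrative, and `expand=False` is used for the single-group case on the assumption that a Series is more convenient to assign back to a column:

```python
import pandas as pd

df = pd.DataFrame({"col1": ["1/1/100 'BA1", "1/1/102Packe", "1/1/102 'to_"]})

# Three capturing groups: the result is a DataFrame with three columns (0, 1, 2).
parts = df["col1"].str.extract(r"(\d+)/(\d+)/(\d+)", expand=True)

# One capturing group with expand=False: the result is a Series.
whole = df["col1"].str.extract(r"(\d+/\d+/\d+)", expand=False)

print(parts.shape)        # one column per group
print(parts[0].tolist())  # the first group holds only the leading number
print(whole.tolist())     # the single group holds the full value
```

Column 0 of `parts` contains only the first number of each row, which matches the "only 1" result in the question.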

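You can also do this with str.replace, by matching the whole string and keeping only the captured part:

df['col1'] = df['col1'].str.replace(r'^(\d+/\d+/\d+).*', r'\1', regex=True)

Note: str.extract is the more direct tool for this; with .str.replace the pattern has to consume everything you want to drop.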
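
For the replace route to give the wanted value, the pattern has to match the entire string and substitute back only the captured part. The sketch below assumes a pandas version that accepts `regex=True`; the third row is a made-up value added to show how the two methods differ when nothing matches:

```python
import pandas as pd

# The last entry is a hypothetical row with no date-like prefix.
s = pd.Series(["1/1/100 'BA1", "1/1/102Packe", "no match here"])

# Match the whole string, then put back only group 1.
replaced = s.str.replace(r"^(\d+/\d+/\d+).*$", r"\1", regex=True)

# For comparison: extract the same group directly.
extracted = s.str.extract(r"(\d+/\d+/\d+)", expand=False)

print(replaced.tolist())
print(extracted.isna().tolist())
```

Rows that do not match are left unchanged by str.replace, whereas str.extract returns NaN for them, so the two approaches are only interchangeable when every row matches.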

I suggest this regex:

df['col1'].str.extract(r'\b((?:\d+/?)+)', expand=True)
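
Two details matter for a pattern of this shape: in a non-raw Python string `\b` is a backspace character rather than a word boundary, and a capturing group with the quantifier outside it keeps only its last repetition. A small sketch with the standard `re` module (which pandas uses for str.extract) illustrates both; `m1` and `m2` are just illustrative names:

```python
import re

text = "1/1/100 'BA1"

# Without the r prefix, "\b" is the backspace character, not a word boundary.
print("\b" == "\x08")

# Quantifier outside the group: the group is overwritten on every repetition.
m1 = re.search(r"\b(\d/?)+", text)

# Quantifier inside the group: the group spans the whole run.
m2 = re.search(r"\b((?:\d+/?)+)", text)

print(m1.group(0), m1.group(1))
print(m2.group(1))
```

Since str.extract returns the capture groups rather than the overall match, the repetition should sit inside the captured group.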
