
Remove non-numeric values in column using Python

I have read the various other questions regarding similar examples; however, I think my example is different enough to warrant a new question.

I have a DataFrame in the following format:

'Bob, Dole, 00001' '0.4'
'John, Smith, 00002' '0.2'

I would like to remove the name and just keep the ID in the first column:

'00001' '0.4'
'00002' '0.2'

I am new to Python, and I stumbled onto this code snippet, which works. First I convert the DataFrame Dat into a NumPy array with A = Dat.to_numpy(). Then I can remove the name portion of a single cell using:

import re
print(re.sub("[^0-9]", "", A[1,0]))

I just don't know how to apply it to the entire Data Frame (without using a loop). Is there a simpler way to do this? Or should I just use a for loop?

Sounds like you could use apply: write a function that does what you want to a single row, for example, then use this syntax (pass axis=1 to have the function called once per row; by default apply passes it each column):

df.apply(func)

Hope this helps, if not let me know.
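
To make that concrete, here is a minimal sketch that applies the re.sub call from the question to a single column. The column names x and y and the helper keep_digits are assumptions made for illustration; neither appears in the question.

```python
import re
import pandas as pd

# Column names 'x' and 'y' are assumed; the question does not give them.
df = pd.DataFrame({
    'x': ['Bob, Dole, 00001', 'John, Smith, 00002'],
    'y': ['0.4', '0.2']})

def keep_digits(value):
    # Hypothetical helper: drop every character that is not a digit.
    return re.sub(r'[^0-9]', '', value)

# Series.apply calls keep_digits once for each element of the column.
df['x'] = df['x'].apply(keep_digits)
print(df)
```

Note that apply still calls the Python function once per element, so it mostly saves you from writing the loop yourself rather than avoiding it.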

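You could use .str.extract() to pull the ID out of the column. The pattern matches a word followed by a comma and a space, then captures the run of digits after it: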

import pandas as pd

df = pd.DataFrame({
    'x': ['Bob, Dole, 00001', 'John, Smith, 00002'], 
    'y': ['0.4', '0.2']})

# expand=False returns a Series rather than a one-column DataFrame
df['x'] = df['x'].str.extract(r'\w+, (\d+)', expand=False)
print(df)

       x    y
0  00001  0.4
1  00002  0.2
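
If you would rather keep the "remove everything that is not a digit" logic from the question, .str.replace() with regex=True is a close column-wide equivalent of the re.sub call. This is only a sketch, using the same assumed column names as above. Unlike the extract pattern, it strips every non-digit character, so any digits that happened to appear in a name would end up in the ID.

```python
import pandas as pd

# Same illustrative data as above; column names are assumed.
df = pd.DataFrame({
    'x': ['Bob, Dole, 00001', 'John, Smith, 00002'],
    'y': ['0.4', '0.2']})

# \D matches any non-digit character; replace each one with nothing.
df['x'] = df['x'].str.replace(r'\D', '', regex=True)
print(df)
```

The IDs stay as strings here, which keeps the leading zeros; converting them to integers would drop them.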
