Remove non-numeric values in column using Python

Question

I have read the various other questions regarding similar examples; however, I think my example is different enough to warrant a new question.

I have a DataFrame in the following format:

'Bob, Dole, 00001' '0.4'
'John, Smith, 00002' '0.2'

I would like to remove the name and just keep the ID in the first column:

'00001' '0.4'
'00002' '0.2'

I am new to Python, and I stumbled onto a code snippet that works. First I convert the DataFrame Dat into a NumPy array with A = Dat.to_numpy(). Then I can remove the name portion of a single cell using:

import re
print(re.sub("[^0-9]", "", A[1,0]))

I just don't know how to apply it to the entire Data Frame (without using a loop). Is there a simpler way to do this? Or should I just use a for loop?

Answer 1

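Sounds like you could use apply: write a function that does what you want for a single value, then apply it to the column. Note that calling apply on the whole DataFrame passes each column (not each row or cell) to the function by default, so for one column it is simpler to call apply on that column, where 'col' stands for your column name:

df['col'] = df['col'].apply(func)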

Hope this helps, if not let me know.
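To make the apply suggestion concrete, here is a minimal sketch that reuses the re.sub call from the question. The column names 'x' and 'y' and the helper keep_digits are assumptions made for illustration; they do not appear in the question.

```python
import re
import pandas as pd

# Sample data copied from the question; the column names are assumed.
df = pd.DataFrame({
    'x': ['Bob, Dole, 00001', 'John, Smith, 00002'],
    'y': ['0.4', '0.2'],
})

def keep_digits(value):
    """Hypothetical helper: drop every character that is not a digit."""
    return re.sub(r'[^0-9]', '', value)

# Series.apply calls keep_digits once per cell of column 'x'.
df['x'] = df['x'].apply(keep_digits)
print(df)
```

This still runs a Python function per cell, so it is effectively a loop in disguise, but it avoids the round trip through a NumPy array.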

Answer 2

You could use .str.extract() with a capture group to pull the ID out of the column:

import pandas as pd

df = pd.DataFrame({
    'x': ['Bob, Dole, 00001', 'John, Smith, 00002'], 
    'y': ['0.4', '0.2']})

df['x'] = df['x'].str.extract(r'\w+, (\d+)')
print(df)

       x    y
0  00001  0.4
1  00002  0.2
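As a possible variation, the regex from the question can also be applied to the whole column with .str.replace(), and the extract pattern can be anchored to the end of the string. This is only a sketch under the assumption that the ID is always the trailing run of digits; the variable names are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question; the column names are assumed.
df = pd.DataFrame({
    'x': ['Bob, Dole, 00001', 'John, Smith, 00002'],
    'y': ['0.4', '0.2'],
})

# Option 1: delete every non-digit character, like re.sub("[^0-9]", "", ...).
ids_by_replace = df['x'].str.replace(r'\D', '', regex=True)

# Option 2: capture only the digits at the end of the string.
# expand=False returns a Series rather than a one-column DataFrame.
ids_by_extract = df['x'].str.extract(r'(\d+)\s*$', expand=False)

print(ids_by_replace.tolist())
print(ids_by_extract.tolist())
```

The two options differ if a name itself contains digits: the replace version would keep those digits too, while the anchored extract keeps only the trailing ID.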
