I have read the various other questions regarding similar examples; however, I think my example is different enough to warrant a new question.
I have a DataFrame in the following format:
'Bob, Dole, 00001' '0.4'
'John, Smith, 00002' '0.2'
I would like to remove the name and just keep the ID in the first column:
'00001' '0.4'
'00002' '0.2'
I am new to Python, and I stumbled onto a code snippet that works. First I convert the DataFrame Dat into a NumPy array with A = Dat.to_numpy(). Then I can remove the name portion using:
import re
print(re.sub("[^0-9]", "", A[1,0]))
I just don't know how to apply it to the entire Data Frame (without using a loop). Is there a simpler way to do this? Or should I just use a for loop?
It sounds like you could use apply: write a function that does what you want for a single row, then apply it along the rows with axis=1 (by default, apply passes each column to the function rather than each row):
df.apply(func, axis=1)
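As a minimal sketch of the apply approach on a single column, reusing the re.sub call from the question: the column names 'x' and 'y' and the helper name keep_digits are assumptions made for illustration, not names from the original post.

```python
import re
import pandas as pd

# Sample data modeled on the question; the column names are assumed.
df = pd.DataFrame({
    'x': ['Bob, Dole, 00001', 'John, Smith, 00002'],
    'y': ['0.4', '0.2'],
})

def keep_digits(value):
    # Hypothetical helper: drop every character that is not a digit,
    # exactly like the re.sub call in the question.
    return re.sub(r"[^0-9]", "", value)

# Series.apply calls keep_digits once per element of column 'x',
# so no explicit for loop is needed.
df['x'] = df['x'].apply(keep_digits)
print(df)
```

Since only one column needs changing, calling apply on that column (a Series) is simpler than calling it on the whole DataFrame with axis=1.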
Hope this helps; if not, let me know.
You could use .extract() to pull the ID out of the column:
import pandas as pd
df = pd.DataFrame({
'x': ['Bob, Dole, 00001', 'John, Smith, 00002'],
'y': ['0.4', '0.2']})
df['x'] = df['x'].str.extract(r'\w+, (\d+)', expand=False)
print(df)
       x    y
0  00001  0.4
1  00002  0.2
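If you would rather keep the pattern from the question, a vectorized equivalent of the re.sub call is .str.replace(). This is only a sketch under the same assumed column names ('x' and 'y'); note that it strips every non-digit character, so it would also keep any digits that happen to appear inside a name, whereas .extract() only captures the digits after the last comma.

```python
import pandas as pd

# Same sample data as above; the column names are assumed.
df = pd.DataFrame({
    'x': ['Bob, Dole, 00001', 'John, Smith, 00002'],
    'y': ['0.4', '0.2'],
})

# regex=True is passed explicitly so the pattern is treated as a regular
# expression; recent pandas versions default to a literal string match.
df['x'] = df['x'].str.replace(r'[^0-9]', '', regex=True)
print(df)
```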