How do I quickly make new columns that hold the three chunks contained in the column 'File'?
I received messy data like this:
import pandas as pd

d = {'File': pd.Series(['firstname lastname 05/31/1996 9999999999 ',
                        'FN SometimesMiddileInitial. LN 05/31/1996 9999999999 ']),
     'Status': pd.Series([0., 0.]),
     'Error': pd.Series([2., 2.])}
df = pd.DataFrame(d)
UPDATE: In reality, I'm starting from a very messy Excel file, and my data has '\\xa0 \\xa0' between string characters, so my first attempt looks like this:
from pandas import DataFrame, ExcelFile
import pandas as pd
location = r'c:/users/meinzerc/Desktop/table.xlsx'
xls = ExcelFile(location)
table = xls.parse('Sheet1')
splitdf = df['File'].str.split('\s*')
My attempt doesn't work at all. WHY?
You could use a regex to pick up at least two spaces:
In [11]: df.File.str.split('\s\s+')
Out[11]:
0 [firstname lastname, 05/31/1996, 9999999999, ]
1 [FN SometimesMiddileInitial. LN, 05/31/1996, 9...
Name: File, dtype: object
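To get actual columns rather than a Series of lists, the same split can be expanded into a DataFrame. This is a minimal sketch, assuming the chunks are separated by two or more spaces (the exact spacing in the sample below is a guess) and that each row has exactly three chunks; the column names name, date and number are chosen here for illustration.

```python
import pandas as pd

# Assumed sample: chunks separated by two spaces (the real spacing may differ).
df = pd.DataFrame({
    "File": [
        "firstname lastname  05/31/1996  9999999999  ",
        "FN SometimesMiddileInitial. LN  05/31/1996  9999999999  ",
    ]
})

# Strip surrounding whitespace first so no empty trailing chunk is produced,
# then split on runs of two or more whitespace characters.
# expand=True returns one column per chunk instead of a list per row.
parts = df["File"].str.strip().str.split(r"\s{2,}", expand=True)
parts.columns = ["name", "date", "number"]  # illustrative column names

# Attach the new columns to the original frame.
df = df.join(parts)
```

Single spaces inside the name survive because the pattern only matches runs of at least two whitespace characters.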
Perhaps a better option is to use extract (and perhaps there is a neater regex!!):
In [12]: df.File.str.extract('\s*(?P<name>.*?)\s+(?P<date>\d+/\d+/\d+)\s+(?P<number>\w+)\s*')
Out[12]:
                             name        date      number
0              firstname lastname  05/31/1996  9999999999
1  FN SometimesMiddileInitial. LN  05/31/1996  9999999999
[2 rows x 3 columns]
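For the Excel case from the update, one option is to normalise the non-breaking spaces before extracting. The following is a rough sketch, assuming the cells contain real '\xa0' characters; the sample strings are invented to imitate that, and raw, cleaned and chunks are illustrative names only.

```python
import pandas as pd

# Invented sample imitating cells that contain non-breaking spaces (\xa0).
raw = pd.Series([
    "firstname\xa0lastname\xa0 \xa005/31/1996\xa0 \xa09999999999 ",
    "FN SometimesMiddileInitial. LN\xa0 \xa005/31/1996\xa0 \xa09999999999 ",
])

# Replace each non-breaking space with a plain space, then trim the ends.
cleaned = raw.str.replace("\xa0", " ", regex=False).str.strip()

# Same idea as the extract call above: a lazy name, a date, then a number.
pattern = r"(?P<name>.*?)\s+(?P<date>\d+/\d+/\d+)\s+(?P<number>\w+)"

# Named groups become the column names of the resulting DataFrame.
chunks = cleaned.str.extract(pattern, expand=True)
```

The resulting chunks frame can then be joined back onto the original table with DataFrame.join if the other columns are needed alongside it.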