I have a list of strings containing spaces, and I need to split each one at the last space, or at the second-to-last space when the value is a date-time. I tried split() on the main string, but the data descriptions themselves contain a lot of spaces, so I fell back to split('\n') instead. See below for a sample list.
['Origin Time 2016/04/16 01:25:00',
'Lat. 32.753',
'Long. 130.762',
'Depth. (km) 12',
'Mag. 7.3',
'Station Code AIC001',
'Station Lat. 35.2976',
'Station Long. 136.7500',
'Station Height(m) 6',
'Record Time 2016/04/16 01:28:06',
'Sampling Freq(Hz) 100Hz',
'Duration Time(s) 120',
'Dir. N-S',
'Scale Factor 7845(gal)/8223790',
'Max. Acc. (gal) 2.327',
'Last Correction 2016/04/16 01:28:08']
I'm not sure what the best approach is for splitting the first and last elements of this list. I would like to separate them so that I can create a pandas DataFrame from the result.
That looks a lot like a fixed-width format file, not one formatted using a delimiter. If your pre-split string is in original, using pd.read_fwf with the default 'guess the columns' inference engine will actually work on your sample:
import io, pandas as pd
df = pd.read_fwf(io.StringIO(original), header=None)
But I think it's safer -- or at least more explicit -- to specify what the column widths are directly, whether via widths or colspecs.
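As a rough sketch of the colspecs variant: the snippet below builds a small stand-in for original (an assumption, with labels padded to 18 characters, since the real file isn't shown) and checks that half-open (start, end) column ranges give the same result as the equivalent widths.

```python
import io
import pandas as pd

# Assumed stand-in for `original`: labels padded to 18 characters,
# which is a guess at how the real fixed-width file is laid out.
rows = [
    ("Origin Time", "2016/04/16 01:25:00"),
    ("Lat.", "32.753"),
    ("Station Height(m)", "6"),
    ("Max. Acc. (gal)", "2.327"),
]
original = "\n".join(f"{label:<18}{value}" for label, value in rows)

# widths lists the size of each field in order ...
by_widths = pd.read_fwf(io.StringIO(original), header=None, widths=[17, 100])

# ... while colspecs gives explicit half-open (start, end) character ranges.
by_colspecs = pd.read_fwf(
    io.StringIO(original), header=None, colspecs=[(0, 17), (17, 100)]
)

print(by_colspecs)
```

colspecs is the more flexible of the two when fields are not adjacent, because characters that fall between two ranges are simply skipped.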
In [55]: pd.read_fwf(io.StringIO(original), header=None, widths=[17, 100])
Out[55]:
                    0                    1
0         Origin Time  2016/04/16 01:25:00
1                Lat.               32.753
2               Long.              130.762
3         Depth. (km)                   12
4                Mag.                  7.3
5        Station Code               AIC001
6        Station Lat.              35.2976
7       Station Long.             136.7500
8   Station Height(m)                    6
9         Record Time  2016/04/16 01:28:06
10  Sampling Freq(Hz)                100Hz
11   Duration Time(s)                  120
12               Dir.                  N-S
13       Scale Factor    7845(gal)/8223790
14    Max. Acc. (gal)                2.327
15    Last Correction  2016/04/16 01:28:08
Of course, if your file is inconsistently formatted, you might not be so lucky and may have to include some workarounds.
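One possible workaround, if the columns don't line up at all (as in the single-spaced list in the question), is to split on the shape of the value instead of its position. This is only a sketch: the pattern and the helper name split_label_value are my own, and it assumes every value is either a YYYY/MM/DD HH:MM:SS timestamp or a single token with no spaces in it.

```python
import re
import pandas as pd

# A few lines taken from the list in the question (single spaces only).
lines = [
    "Origin Time 2016/04/16 01:25:00",
    "Depth. (km) 12",
    "Scale Factor 7845(gal)/8223790",
    "Max. Acc. (gal) 2.327",
]

# Label = shortest prefix such that the rest of the line is either a
# timestamp or one whitespace-free token (assumption about the values).
pattern = re.compile(r"^(.*?)\s+(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}|\S+)$")

def split_label_value(line):
    """Hypothetical helper: return [label, value] for one line."""
    match = pattern.match(line.strip())
    if match is None:
        # No separable value found; keep the whole line as the label.
        return [line.strip(), None]
    return [match.group(1), match.group(2)]

df = pd.DataFrame(
    [split_label_value(line) for line in lines], columns=["field", "value"]
)
print(df)
```

This would break on values that contain spaces in any other form, so it is worth checking against the full file before relying on it.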
FWIW, this is just a glorified version of
df = pd.DataFrame([[row[:17].strip(), row[17:].strip()] for row in original.splitlines()])
in this case.
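To try the slicing version on its own, here is a self-contained sketch. The sample text is an assumption (labels padded to 18 characters), and the column names field and value are just illustrative. The last row shows that a line shorter than the split point yields an empty value rather than raising an error.

```python
import pandas as pd

# Assumed stand-in for `original`; the real file's padding may differ.
rows = [
    ("Origin Time", "2016/04/16 01:25:00"),
    ("Station Height(m)", "6"),
    ("Dir.", "N-S"),
]
original = "\n".join(f"{label:<18}{value}" for label, value in rows)
original += "\nShort"  # a line with no value at all

# Slice each line at character 17 and trim the padding on both halves.
df = pd.DataFrame(
    [[row[:17].strip(), row[17:].strip()] for row in original.splitlines()],
    columns=["field", "value"],
)
print(df)
```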