
Parsing a list of strings by the last occurring space in Python

I have a list of strings containing spaces, and I need to split each one at the last space, or at the second-to-last space when the value is a date-time. I tried split() on the main string, but the field descriptions themselves contain many spaces, so I fell back to split('\n') instead. A sample of the resulting list is below.

['Origin Time       2016/04/16 01:25:00',
 'Lat.              32.753',
 'Long.             130.762',
 'Depth. (km)       12',
 'Mag.              7.3',
 'Station Code      AIC001',
 'Station Lat.      35.2976',
 'Station Long.     136.7500',
 'Station Height(m) 6',
 'Record Time       2016/04/16 01:28:06',
 'Sampling Freq(Hz) 100Hz',
 'Duration Time(s)  120',
 'Dir.              N-S',
 'Scale Factor      7845(gal)/8223790',
 'Max. Acc. (gal)   2.327',
 'Last Correction   2016/04/16 01:28:08']

I'm not sure whether the best approach is to split off the first and last elements of each string. I would like to separate the labels from the values so that I can create a pandas DataFrame from them.

That looks a lot like a fixed-width format file, not one formatted with a delimiter. If your pre-split string is in original, pd.read_fwf with its default 'guess the columns' inference engine will actually work on your sample:

import io, pandas as pd
df = pd.read_fwf(io.StringIO(original), header=None)

But I think it's safer, or at least more explicit, to specify the column widths directly, via either widths or colspecs.

In [55]: pd.read_fwf(io.StringIO(original), header=None, widths=[17, 100])
Out[55]: 
                    0                    1
0         Origin Time  2016/04/16 01:25:00
1                Lat.               32.753
2               Long.              130.762
3         Depth. (km)                   12
4                Mag.                  7.3
5        Station Code               AIC001
6        Station Lat.              35.2976
7       Station Long.             136.7500
8   Station Height(m)                    6
9         Record Time  2016/04/16 01:28:06
10  Sampling Freq(Hz)                100Hz
11   Duration Time(s)                  120
12               Dir.                  N-S
13       Scale Factor    7845(gal)/8223790
14    Max. Acc. (gal)                2.327
15    Last Correction  2016/04/16 01:28:08
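
For comparison, here is a rough sketch of the colspecs form of the same call. The three-line sample string, the column names "field" and "value", and the end offset of 100 are assumptions made for illustration; they simply mirror the widths=[17, 100] call above. dtype=str is passed so that values such as 6 stay as text rather than being type-inferred.

```python
import io

import pandas as pd

# Assumed sample: three lines taken from the list in the question.
original = "\n".join([
    "Origin Time       2016/04/16 01:25:00",
    "Station Height(m) 6",
    "Scale Factor      7845(gal)/8223790",
])

# colspecs are half-open [start, end) character ranges, one per column.
# (0, 17) covers the label; (17, 100) covers the rest of the line.
df = pd.read_fwf(
    io.StringIO(original),
    header=None,
    colspecs=[(0, 17), (17, 100)],
    names=["field", "value"],  # hypothetical column names
    dtype=str,                 # keep every value as a string
)
print(df)
```

Because the split is by character position, the space inside the date-time value is left alone.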

Of course, if your file is inconsistently formatted, you might not be so lucky and may have to include some workarounds.

FWIW, this is just a glorified version of

df = pd.DataFrame([[row[:17].strip(), row[17:].strip()] for row in original.splitlines()])

in this case.
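
If you already have the list of strings rather than the original text, the same slicing idea works without pandas. This is a minimal sketch: the four-line sample, the helper name split_fixed, and the constant LABEL_WIDTH are made up for illustration, and the width of 17 assumes every label fits in the first 17 characters, as it does in the sample.

```python
# Assumed sample: a few entries from the list in the question.
lines = [
    "Origin Time       2016/04/16 01:25:00",
    "Lat.              32.753",
    "Station Height(m) 6",
    "Max. Acc. (gal)   2.327",
]

LABEL_WIDTH = 17  # assumption: labels occupy the first 17 characters


def split_fixed(line, width=LABEL_WIDTH):
    """Split one line into (label, value) at a fixed character offset."""
    return line[:width].strip(), line[width:].strip()


pairs = [split_fixed(line) for line in lines]
record = dict(pairs)  # label -> value lookup

for label, value in pairs:
    print(f"{label!r}: {value!r}")
```

The resulting pairs can be passed straight to pd.DataFrame(pairs, columns=[...]) if a DataFrame is the end goal.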

