
Separating columns using pandas.read_csv

I am trying to read one table from a larger .txt file into Python.

An extract of the data is:

2 Network magnitudes:
    MLv       2.05 +/- 1.34   7            
    M         2.05            7 preferred  

7 Phase arrivals:
    sta  net   dist azi  phase   time         res     wt  sta
    BMOR  EC    0.0 226  P       00:22:31.385  -0.6 M  1.0  BMOR 
    BREF  EC    0.0 347  P       00:22:31.543  -0.5 M  1.0  BREF 
    BTAM  EC    0.0  58  P       00:22:31.796  -0.3 M  1.0  BTAM 
    BVC2  EC    0.0  26  P       00:22:33.061   0.8 M  1.0  BVC2 
    BNAS  EC    0.1 294  P       00:22:32.871  -0.1 M  1.0  BNAS 
    SUCR  EC    0.1 314  P       00:22:34.610   0.6 M  1.0  SUCR 
    BRRN  EC    0.1 207  P       00:22:34.768   0.4 M  1.0  BRRN 

7 Station magnitudes:
    sta  net   dist azi  type   value   res        amp per
    BMOR  EC    0.0 226  MLv     1.48 -0.57    1.20076    

I only want the phase arrivals table, and np.loadtxt and np.genfromtxt both fall short for various reasons (they can't deal with a mix of numbers and strings, or hit a bug unless you specify a single-space (' ') delimiter, which I can't do here).

I've been trying the pandas.read_csv function, but it isn't recognising the delimiters:

a = pd.read_csv(datafileloc, sep='\+s', skiprows=5, skipfooter=3)

produces:

a
Out[90]: 
  sta  net   dist azi  phase   time         res     wt  sta
0  BMOR  EC    0.0 226  P       00:22:31.385  -0....       
1  BREF  EC    0.0 347  P       00:22:31.543  -0....       
2  BTAM  EC    0.0  58  P       00:22:31.796  -0....       
3  BVC2  EC    0.0  26  P       00:22:33.061   0....       
4  BNAS  EC    0.1 294  P       00:22:32.871  -0....       
5  SUCR  EC    0.1 314  P       00:22:34.610   0....       
6  BRRN  EC    0.1 207  P       00:22:34.768   0.... 

which looks good, except that each row is a single string and the whitespace delimiters have been ignored:

a.values
Out[89]: 
array([['BMOR  EC    0.0 226  P       00:22:31.385  -0.6 M  1.0  BMOR'],
       ['BREF  EC    0.0 347  P       00:22:31.543  -0.5 M  1.0  BREF'],
       ['BTAM  EC    0.0  58  P       00:22:31.796  -0.3 M  1.0  BTAM'],
       ['BVC2  EC    0.0  26  P       00:22:33.061   0.8 M  1.0  BVC2'],
       ['BNAS  EC    0.1 294  P       00:22:32.871  -0.1 M  1.0  BNAS'],
       ['SUCR  EC    0.1 314  P       00:22:34.610   0.6 M  1.0  SUCR'],
       ['BRRN  EC    0.1 207  P       00:22:34.768   0.4 M  1.0  BRRN']], dtype=object)

Each line can be split with list(a.values[0])[0].split(), but the result then needs reorganising to get individual columns. I would like pandas.read_csv to recognise the separate columns itself so I can extract them individually (being reasonably efficient is going to be important once I scale this up).

Where am I going wrong?

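As pointed out by DSM, it is a typo in the delimiter: it should be \s+, not \+s.

As a regular expression, \+s matches a literal plus sign followed by the letter "s", which never occurs in the table rows, so each row is read as a single field. \s+ matches one or more whitespace characters, which is what is needed here.

The mistake came from a typo in the documentation, under the delim_whitespace parameter heading.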
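As a minimal sketch of the corrected separator, the snippet below parses the phase arrivals rows from the extract in the question. It makes a few assumptions that are not in the original post: the block is located by searching for the "Phase arrivals:" heading rather than by skiprows/skipfooter, and the column names are supplied by hand. In the extract the header row has 9 labels while each data row has 10 whitespace-separated tokens (the "res" value is followed by a separate "M"), so letting pandas infer names from the header would likely misalign the columns. The names res_flag and sta2 are hypothetical labels chosen for illustration.

```python
import io

import pandas as pd

# Sample text copied from the extract in the question.
raw = """2 Network magnitudes:
    MLv       2.05 +/- 1.34   7
    M         2.05            7 preferred

7 Phase arrivals:
    sta  net   dist azi  phase   time         res     wt  sta
    BMOR  EC    0.0 226  P       00:22:31.385  -0.6 M  1.0  BMOR
    BREF  EC    0.0 347  P       00:22:31.543  -0.5 M  1.0  BREF
    BTAM  EC    0.0  58  P       00:22:31.796  -0.3 M  1.0  BTAM
    BVC2  EC    0.0  26  P       00:22:33.061   0.8 M  1.0  BVC2
    BNAS  EC    0.1 294  P       00:22:32.871  -0.1 M  1.0  BNAS
    SUCR  EC    0.1 314  P       00:22:34.610   0.6 M  1.0  SUCR
    BRRN  EC    0.1 207  P       00:22:34.768   0.4 M  1.0  BRRN

7 Station magnitudes:
    sta  net   dist azi  type   value   res        amp per
    BMOR  EC    0.0 226  MLv     1.48 -0.57    1.20076
"""

lines = [line.strip() for line in raw.splitlines()]

# Find the "N Phase arrivals:" heading; the leading N is the number of data rows.
start = next(i for i, line in enumerate(lines) if line.endswith("Phase arrivals:"))
n_rows = int(lines[start].split()[0])

# Skip the heading and the header row, and keep only the N data rows.
block = "\n".join(lines[start + 2 : start + 2 + n_rows])

# Hypothetical column names: "res_flag" holds the "M" token after the residual,
# and "sta2" is the repeated station code at the end of each row.
names = ["sta", "net", "dist", "azi", "phase", "time", "res", "res_flag", "wt", "sta2"]

# sep=r"\s+" splits on runs of whitespace (the corrected form of '\+s').
arrivals = pd.read_csv(io.StringIO(block), sep=r"\s+", header=None, names=names)

print(arrivals.dtypes)
print(arrivals[["sta", "time", "res"]])
```

Individual columns are then available directly, for example arrivals["res"] or arrivals["time"]. For a real file, the same approach should work by reading the file's text in place of the raw string.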
