Separating columns using pandas.read_csv

I am trying to read one table from a larger .txt file into Python.

An extract of the data is:

2 Network magnitudes:
    MLv       2.05 +/- 1.34   7            
    M         2.05            7 preferred  

7 Phase arrivals:
    sta  net   dist azi  phase   time         res     wt  sta
    BMOR  EC    0.0 226  P       00:22:31.385  -0.6 M  1.0  BMOR 
    BREF  EC    0.0 347  P       00:22:31.543  -0.5 M  1.0  BREF 
    BTAM  EC    0.0  58  P       00:22:31.796  -0.3 M  1.0  BTAM 
    BVC2  EC    0.0  26  P       00:22:33.061   0.8 M  1.0  BVC2 
    BNAS  EC    0.1 294  P       00:22:32.871  -0.1 M  1.0  BNAS 
    SUCR  EC    0.1 314  P       00:22:34.610   0.6 M  1.0  SUCR 
    BRRN  EC    0.1 207  P       00:22:34.768   0.4 M  1.0  BRRN 

7 Station magnitudes:
    sta  net   dist azi  type   value   res        amp per
    BMOR  EC    0.0 226  MLv     1.48 -0.57    1.20076    

I only want the phase arrivals table, so np.loadtxt and np.genfromtxt both fall short for various reasons (they can't deal with mixed numbers and strings, or they contain a bug unless you specify a single-space (' ') delimiter, which I can't do here).

I've been trying the pandas.read_csv function, but it isn't recognising the delimiters:

a = pd.read_csv(datafileloc, sep='\+s', skiprows=5, skipfooter=3)

produces:

a
Out[90]: 
  sta  net   dist azi  phase   time         res     wt  sta
0  BMOR  EC    0.0 226  P       00:22:31.385  -0....       
1  BREF  EC    0.0 347  P       00:22:31.543  -0....       
2  BTAM  EC    0.0  58  P       00:22:31.796  -0....       
3  BVC2  EC    0.0  26  P       00:22:33.061   0....       
4  BNAS  EC    0.1 294  P       00:22:32.871  -0....       
5  SUCR  EC    0.1 314  P       00:22:34.610   0....       
6  BRRN  EC    0.1 207  P       00:22:34.768   0.... 

which looks good, except that each row is a single string and the whitespace delimiters have been ignored:

a.values
Out[89]: 
array([['BMOR  EC    0.0 226  P       00:22:31.385  -0.6 M  1.0  BMOR'],
       ['BREF  EC    0.0 347  P       00:22:31.543  -0.5 M  1.0  BREF'],
       ['BTAM  EC    0.0  58  P       00:22:31.796  -0.3 M  1.0  BTAM'],
       ['BVC2  EC    0.0  26  P       00:22:33.061   0.8 M  1.0  BVC2'],
       ['BNAS  EC    0.1 294  P       00:22:32.871  -0.1 M  1.0  BNAS'],
       ['SUCR  EC    0.1 314  P       00:22:34.610   0.6 M  1.0  SUCR'],
       ['BRRN  EC    0.1 207  P       00:22:34.768   0.4 M  1.0  BRRN']], dtype=object)

Each line can be split with list(a.values[0])[0].split(), but the result would then need reorganising to get individual columns. I would like pandas.read_csv to recognise the separate columns so that I can extract them individually (being reasonably efficient will be important once I scale this up).

Where am I going wrong?

As pointed out by DSM, it is a typo in the delimiter:

\s+, not \+s

The typo came from the documentation, under the delim_whitespace parameter heading.
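To see why the typo matters: as a regular expression, \+s matches a literal "+" followed by "s", which never occurs in the data, so every line comes back as a single field. \s+ matches one or more whitespace characters, which is what the table needs. The snippet below is a minimal sketch of reading the phase arrivals table with the corrected separator, not the asker's exact call. Instead of skiprows/skipfooter it locates the "Phase arrivals:" heading and reads until the next blank line, which is an assumption about the file layout based on the extract. The helper read_phase_arrivals and the column names res_flag and sta_repeat are hypothetical: the header in the file has 9 labels while each data row has 10 fields, so explicit names are supplied.

```python
import io

import pandas as pd

# A shortened copy of the extract from the question.
SAMPLE = """\
2 Network magnitudes:
    MLv       2.05 +/- 1.34   7
    M         2.05            7 preferred

7 Phase arrivals:
    sta  net   dist azi  phase   time         res     wt  sta
    BMOR  EC    0.0 226  P       00:22:31.385  -0.6 M  1.0  BMOR
    BREF  EC    0.0 347  P       00:22:31.543  -0.5 M  1.0  BREF
    BTAM  EC    0.0  58  P       00:22:31.796  -0.3 M  1.0  BTAM

7 Station magnitudes:
    sta  net   dist azi  type   value   res        amp per
    BMOR  EC    0.0 226  MLv     1.48 -0.57    1.20076
"""

# Hypothetical names: "res_flag" (the "M" field) and "sta_repeat" are
# invented here because the file's header has one label fewer than the rows.
COLUMNS = ["sta", "net", "dist", "azi", "phase", "time",
           "res", "res_flag", "wt", "sta_repeat"]


def read_phase_arrivals(text):
    """Return the phase arrivals table as a DataFrame (illustrative helper)."""
    lines = text.splitlines()
    # Find the section heading, e.g. "7 Phase arrivals:".
    start = next(i for i, line in enumerate(lines)
                 if line.strip().endswith("Phase arrivals:"))
    rows = []
    # Skip the heading and the column header, then read up to the blank line.
    for line in lines[start + 2:]:
        if not line.strip():
            break
        rows.append(line.strip())
    # The corrected separator: one or more whitespace characters.
    return pd.read_csv(io.StringIO("\n".join(rows)), sep=r"\s+",
                       header=None, names=COLUMNS)


phases = read_phase_arrivals(SAMPLE)
print(phases[["sta", "dist", "azi", "res"]])
```

Individual columns are then available directly, for example phases["azi"] or phases["time"]. For a file on disk, the same function could be given the contents returned by open(datafileloc).read().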
