Separating columns with pandas.read_csv

Question

I am trying to read one table from a larger .txt file into Python.

An extract of the data is:

2 Network magnitudes:
    MLv       2.05 +/- 1.34   7            
    M         2.05            7 preferred  

7 Phase arrivals:
    sta  net   dist azi  phase   time         res     wt  sta
    BMOR  EC    0.0 226  P       00:22:31.385  -0.6 M  1.0  BMOR 
    BREF  EC    0.0 347  P       00:22:31.543  -0.5 M  1.0  BREF 
    BTAM  EC    0.0  58  P       00:22:31.796  -0.3 M  1.0  BTAM 
    BVC2  EC    0.0  26  P       00:22:33.061   0.8 M  1.0  BVC2 
    BNAS  EC    0.1 294  P       00:22:32.871  -0.1 M  1.0  BNAS 
    SUCR  EC    0.1 314  P       00:22:34.610   0.6 M  1.0  SUCR 
    BRRN  EC    0.1 207  P       00:22:34.768   0.4 M  1.0  BRRN 

7 Station magnitudes:
    sta  net   dist azi  type   value   res        amp per
    BMOR  EC    0.0 226  MLv     1.48 -0.57    1.20076

I only want the phase arrivals table, so np.loadtxt and np.genfromtxt both fall short for various reasons (they can't deal with a mix of numbers and strings, or they contain a bug unless you specify a single-space (' ') delimiter, which I can't do here).

I've been trying the pandas.read_csv function, but it isn't recognising the delimiters:

a = pd.read_csv(datafileloc, sep='\+s', skiprows=5, skipfooter=3)

produces:

a
Out[90]: 
  sta  net   dist azi  phase   time         res     wt  sta
0  BMOR  EC    0.0 226  P       00:22:31.385  -0....       
1  BREF  EC    0.0 347  P       00:22:31.543  -0....       
2  BTAM  EC    0.0  58  P       00:22:31.796  -0....       
3  BVC2  EC    0.0  26  P       00:22:33.061   0....       
4  BNAS  EC    0.1 294  P       00:22:32.871  -0....       
5  SUCR  EC    0.1 314  P       00:22:34.610   0....       
6  BRRN  EC    0.1 207  P       00:22:34.768   0....

which looks good, except that each row is a single string and it hasn't paid attention to the whitespace delimiters:

a.values
Out[89]: 
array([['BMOR  EC    0.0 226  P       00:22:31.385  -0.6 M  1.0  BMOR'],
       ['BREF  EC    0.0 347  P       00:22:31.543  -0.5 M  1.0  BREF'],
       ['BTAM  EC    0.0  58  P       00:22:31.796  -0.3 M  1.0  BTAM'],
       ['BVC2  EC    0.0  26  P       00:22:33.061   0.8 M  1.0  BVC2'],
       ['BNAS  EC    0.1 294  P       00:22:32.871  -0.1 M  1.0  BNAS'],
       ['SUCR  EC    0.1 314  P       00:22:34.610   0.6 M  1.0  SUCR'],
       ['BRRN  EC    0.1 207  P       00:22:34.768   0.4 M  1.0  BRRN']], dtype=object)

Lines can be separated with list(a.values[0])[0].split(), but the result would then need reorganising to get individual columns. I would like pandas.read_csv to recognise that the fields are separate so I can extract individual columns (being reasonably efficient is going to be important once I scale it up).

Where am I going wrong?

Answer 1

As pointed out by DSM, it is a typo in the delimiter:

\s+, not \+s

which came from a typo in the documentation, under the delim_whitespace parameter heading.
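For illustration, here is a minimal sketch of the corrected call, run against the extract from the question (embedded as a string so it is self-contained). A few things here are assumptions and not part of the original post: the header row is skipped and column names are passed explicitly, because the header has 9 labels while each data row has 10 whitespace-separated fields (the "M" after the residual), and the names res_flag and sta_repeat are made up for this example.

```python
import io
import pandas as pd

# The extract from the question, embedded so the sketch runs on its own.
raw = """2 Network magnitudes:
    MLv       2.05 +/- 1.34   7
    M         2.05            7 preferred

7 Phase arrivals:
    sta  net   dist azi  phase   time         res     wt  sta
    BMOR  EC    0.0 226  P       00:22:31.385  -0.6 M  1.0  BMOR
    BREF  EC    0.0 347  P       00:22:31.543  -0.5 M  1.0  BREF
    BTAM  EC    0.0  58  P       00:22:31.796  -0.3 M  1.0  BTAM
    BVC2  EC    0.0  26  P       00:22:33.061   0.8 M  1.0  BVC2
    BNAS  EC    0.1 294  P       00:22:32.871  -0.1 M  1.0  BNAS
    SUCR  EC    0.1 314  P       00:22:34.610   0.6 M  1.0  SUCR
    BRRN  EC    0.1 207  P       00:22:34.768   0.4 M  1.0  BRRN

7 Station magnitudes:
    sta  net   dist azi  type   value   res        amp per
    BMOR  EC    0.0 226  MLv     1.48 -0.57    1.20076
"""

# Hypothetical column names (an assumption, not from the original file):
# "res_flag" holds the "M" that follows the residual, and "sta_repeat"
# holds the repeated station code at the end of each row.
names = ["sta", "net", "dist", "azi", "phase", "time",
         "res", "res_flag", "wt", "sta_repeat"]

df = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",       # one or more whitespace characters (the corrected regex)
    skiprows=6,       # the 5 rows skipped in the question plus the header row
    skipfooter=3,     # drop the "Station magnitudes" block at the end
    header=None,
    names=names,
    engine="python",  # skipfooter is only supported by the python engine
)

# Individual columns can now be pulled out directly.
print(df[["sta", "time", "res"]])
```

If the file's own header row is kept instead, pandas sees one more field per row than there are header labels and may treat the first field as the index, which shifts the remaining columns by one, so it is worth checking the result before relying on it.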

Answer 1 (score 2, accepted), 2016-05-13 00:27:27
