Separating columns using pandas.read_csv
I am trying to read one table from a larger .txt file into Python.
An extract of the data is:
2 Network magnitudes:
MLv 2.05 +/- 1.34 7
M 2.05 7 preferred
7 Phase arrivals:
sta net dist azi phase time res wt sta
BMOR EC 0.0 226 P 00:22:31.385 -0.6 M 1.0 BMOR
BREF EC 0.0 347 P 00:22:31.543 -0.5 M 1.0 BREF
BTAM EC 0.0 58 P 00:22:31.796 -0.3 M 1.0 BTAM
BVC2 EC 0.0 26 P 00:22:33.061 0.8 M 1.0 BVC2
BNAS EC 0.1 294 P 00:22:32.871 -0.1 M 1.0 BNAS
SUCR EC 0.1 314 P 00:22:34.610 0.6 M 1.0 SUCR
BRRN EC 0.1 207 P 00:22:34.768 0.4 M 1.0 BRRN
7 Station magnitudes:
sta net dist azi type value res amp per
BMOR EC 0.0 226 MLv 1.48 -0.57 1.20076
I only want the phase arrivals table, so np.loadtxt and np.genfromtxt both fall short for various reasons (can't handle a mix of numbers and strings / contains a bug unless you specify a single-space (' ') delimiter, which I can't do here).
I've been trying the pandas.read_csv function, but it isn't recognising the delimiters:
a = pd.read_csv(datafileloc, sep='\+s', skiprows=5, skipfooter=3)
produces:
a
Out[90]:
sta net dist azi phase time res wt sta
0 BMOR EC 0.0 226 P 00:22:31.385 -0....
1 BREF EC 0.0 347 P 00:22:31.543 -0....
2 BTAM EC 0.0 58 P 00:22:31.796 -0....
3 BVC2 EC 0.0 26 P 00:22:33.061 0....
4 BNAS EC 0.1 294 P 00:22:32.871 -0....
5 SUCR EC 0.1 314 P 00:22:34.610 0....
6 BRRN EC 0.1 207 P 00:22:34.768 0....
which looks good, except that each row is a single string and the whitespace delimiters have been ignored:
a.values
Out[89]:
array([['BMOR EC 0.0 226 P 00:22:31.385 -0.6 M 1.0 BMOR'],
['BREF EC 0.0 347 P 00:22:31.543 -0.5 M 1.0 BREF'],
['BTAM EC 0.0 58 P 00:22:31.796 -0.3 M 1.0 BTAM'],
['BVC2 EC 0.0 26 P 00:22:33.061 0.8 M 1.0 BVC2'],
['BNAS EC 0.1 294 P 00:22:32.871 -0.1 M 1.0 BNAS'],
['SUCR EC 0.1 314 P 00:22:34.610 0.6 M 1.0 SUCR'],
['BRRN EC 0.1 207 P 00:22:34.768 0.4 M 1.0 BRRN']], dtype=object)
Lines can be split with list(a.values[0])[0].split(), but the result would then need reorganising to get individual columns. I would like pandas.read_csv to recognise the fields as separate so that I can extract individual columns (being reasonably efficient will matter once I scale this up).
Where am I going wrong?
As pointed out by DSM, it is a typo in the delimiter: it should be \s+, not \+s. As a regular expression, \s+ matches one or more whitespace characters, whereas \+s matches a literal "+" followed by "s", which never occurs in the data, so each row is left as a single field. The typo came from the documentation, under the delim_whitespace parameter heading.
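A minimal sketch of the corrected call, run against the extract from the question rather than the real file. It makes a few assumptions that are not in the original post: the number in front of "Phase arrivals:" is taken to be the row count, and because the header names 9 columns while each row has 10 fields, the column names are passed explicitly. The labels "flag" and "sta2" are hypothetical names chosen for illustration.

```python
import io

import pandas as pd

# Extract copied from the question; it stands in for the real .txt file.
raw = """2 Network magnitudes:
MLv 2.05 +/- 1.34 7
M 2.05 7 preferred
7 Phase arrivals:
sta net dist azi phase time res wt sta
BMOR EC 0.0 226 P 00:22:31.385 -0.6 M 1.0 BMOR
BREF EC 0.0 347 P 00:22:31.543 -0.5 M 1.0 BREF
BTAM EC 0.0 58 P 00:22:31.796 -0.3 M 1.0 BTAM
BVC2 EC 0.0 26 P 00:22:33.061 0.8 M 1.0 BVC2
BNAS EC 0.1 294 P 00:22:32.871 -0.1 M 1.0 BNAS
SUCR EC 0.1 314 P 00:22:34.610 0.6 M 1.0 SUCR
BRRN EC 0.1 207 P 00:22:34.768 0.4 M 1.0 BRRN
7 Station magnitudes:
sta net dist azi type value res amp per
BMOR EC 0.0 226 MLv 1.48 -0.57 1.20076
"""

lines = raw.splitlines()

# Find the "N Phase arrivals:" line. ASSUMPTION: N is the number of data rows.
start = next(i for i, ln in enumerate(lines)
             if ln.strip().endswith("Phase arrivals:"))
n_rows = int(lines[start].split()[0])

# The file's header has 9 names but each row has 10 fields, so supply names.
# "flag" and "sta2" are hypothetical labels, not taken from the source data.
names = ["sta", "net", "dist", "azi", "phase",
         "time", "res", "flag", "wt", "sta2"]

arrivals = pd.read_csv(
    io.StringIO(raw),
    sep=r"\s+",            # one or more whitespace characters (not '\+s')
    skiprows=start + 2,    # skip everything up to and including the header line
    nrows=n_rows,          # stop before the "Station magnitudes" block
    header=None,
    names=names,
)

print(arrivals.dtypes)
print(arrivals[["sta", "azi", "res"]])
```

With a file on disk, the same call should work by reading the lines from the file to locate the table and passing the path to pd.read_csv in place of io.StringIO(raw). Individual columns are then available directly, for example arrivals["res"].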