Read file with missing data with loadtxt (numpy)
When I try to read the data below with:
loadtxt('RSTN')
I get an error.
Then I tried to handle the missing data with:
genfromtxt('RSTN',delimiter=' ')
But I got this error:
Line #31112 (got 7 columns instead of 8)
I want to fill the missing data with nan or something similar.
I have data like this in an ASCII file named RSTN:
20120127165126 19 42 54 91 113 147 188 284
20120127165127 19 42 54 91 113 147 188 284
20120127165128 19 42 54 90 113 147 188 284
20120127165129 19 42 54 90 113 147 188 284
20120127165130 19 42 54 88 107 131 155 235
20120127165131 19 42 54 72 79 79 92 154
20120127165132 19 42 54 45 43 42 50 97
20120127165133 19 42 54 24 21 21 25 65
20120127165134 19 42 54 11 8 9 12 46
20120127165135 19 42 54 5 2 3 7 35
20120127165136 18 42 54 2 0 1 4 29
20120127165137 19 42 54 0 0 2 25
20120127165138 19 42 53 0 0 1 22
20120127165139 19 42 54 0 0 1 19
20120127165140 19 42 54 0 0 0 17
20120127165141 19 42 54 0 0 0 14
20120127165142 19 42 54 0 0 0 14
20120127165143 19 42 54 0 0 0 14
20120127165144 19 42 54 0 0 13
20120127165145 19 42 54 0 0 14
20120127165146 19 42 54 0 0 0 14
20120127165147 19 42 54 0 0 1 15
20120127165148 19 42 54 0 0 1 15
20120127165149 19 42 54 0 0 1 15
20120127165150 20 42 53 0 1 15
20120127165151 20 42 53 0 1 17
20120127165152 20 42 53 0 1 17
20120127165153 19 42 53 0 0 1 17
20120127165154 20 42 53 0 1 17
20120127165155 20 42 53 0 1 17
20120127165156 20 42 53 0 0 1 17
20120127165157 19 42 54 0 0 1 17
20120127165158 19 42 55 0 0 1 17
20120127165159 19 42 55 0 0 1 17
20120127165200 20 42 56 0 0 1 17
20120127165201 21 42 56 0 0 1 17
When I do this:
from pandas import *
data=read_fwf('26JAN12.K7O', colspecs='infer', header=None)
I get this error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 429, in read_fwf
    return _read(filepath_or_buffer, kwds)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 198, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 479, in __init__
    self._make_engine(self.engine)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 592, in _make_engine
    self._engine = klass(self.f, **self.options)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1954, in __init__
    PythonParser.__init__(self, f, **kwds)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1237, in __init__
    self._make_reader(f)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1957, in _make_reader
    self.data = FixedWidthReader(f, self.colspecs, self.delimiter)
  File "C:\Anaconda\lib\site-packages\pandas\io\parsers.py", line 1933, in __init__
    raise AssertionError()
AssertionError
If you have pandas, you can parse the file with pd.read_fwf:
import pandas as pd
df = pd.read_fwf('data', colspecs='infer', header=None, parse_dates=[[0]])
print(df)
yields
0 1 2 3 4 5 6 7 8
0 2012-01-27 16:51:26 19 42 54 91 113 147 188 284
1 2012-01-27 16:51:27 19 42 54 91 113 147 188 284
...
11 2012-01-27 16:51:37 19 42 54 0 NaN 0 2 25
12 2012-01-27 16:51:38 19 42 53 0 NaN 0 1 22
13 2012-01-27 16:51:39 19 42 54 0 NaN 0 1 19
[36 rows x 9 columns]
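If colspecs='infer' does not guess the columns correctly, you can also pass the field widths explicitly. Here is a minimal sketch of that idea on a small in-memory sample. The layout (an 18-character timestamp field followed by 7-character value fields) and the make_row helper are assumptions made up for illustration, so adjust the widths to match your actual file.

```python
import io

import pandas as pd

# Assumed layout for illustration: an 18-character timestamp field
# followed by three 7-character value fields.
WIDTHS = [18, 7, 7, 7]


def make_row(timestamp, values):
    """Hypothetical helper: format one fixed-width row; None becomes a blank field."""
    cells = ["" if v is None else str(v) for v in values]
    return f"{timestamp:<18}" + "".join(f"{c:>7}" for c in cells)


sample = "\n".join([
    make_row("20120127165136", [2, 0, 1]),
    make_row("20120127165137", [0, None, 2]),
    make_row("20120127165138", [None, 0, 1]),
])

# Blank fixed-width fields are read as NaN.
df = pd.read_fwf(io.StringIO(sample), widths=WIDTHS, header=None)

# Convert the integer timestamps to datetimes with an explicit format.
df[0] = pd.to_datetime(df[0].astype(str), format="%Y%m%d%H%M%S")

print(df)
```

Passing widths (or colspecs) pins each value to its column position, so a blank field stays in its own column instead of shifting the rest of the row to the left.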
Or, thanks to DSM, with np.genfromtxt you can parse fixed-width data by passing a list of field widths as the delimiter parameter:
import numpy as np
np.set_printoptions(formatter={'float':'{:g}'.format})
arr = np.genfromtxt('data', delimiter=[18]+[7]*8)
print(arr)
yields
[[2.01201e+13 19 42 54 91 113 147 188 284]
[2.01201e+13 19 42 54 91 113 147 188 284]
[2.01201e+13 19 42 54 90 113 147 188 284]
...
[2.01201e+13 19 42 54 0 nan 0 2 25]
[2.01201e+13 19 42 53 0 nan 0 1 22]
[2.01201e+13 19 42 54 0 nan 0 1 19]
...]
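To try the width-list approach without the original file, here is a small self-contained sketch. The sample rows, the widths and the make_row helper are invented for illustration (they assume an 18-character timestamp field followed by 7-character value fields), so treat it as a sketch rather than the exact layout of RSTN.

```python
import io

import numpy as np

# Assumed layout for illustration: an 18-character timestamp field
# followed by three 7-character value fields.
WIDTHS = [18, 7, 7, 7]


def make_row(timestamp, values):
    """Hypothetical helper: format one fixed-width row; None becomes a blank field."""
    cells = ["" if v is None else str(v) for v in values]
    return f"{timestamp:<18}" + "".join(f"{c:>7}" for c in cells)


sample = "\n".join([
    make_row("20120127165136", [2, 0, 1]),
    make_row("20120127165137", [0, None, 2]),
    make_row("20120127165138", [None, 0, 1]),
])

# A list of integers as delimiter tells genfromtxt to slice each line
# into fixed-width fields; blank fields come back as nan.
arr = np.genfromtxt(io.StringIO(sample), delimiter=WIDTHS)

# Rows that contain at least one missing value.
rows_with_gaps = np.isnan(arr).any(axis=1)

print(arr)
print(rows_with_gaps)
```

Once the gaps are nan, functions such as np.nanmean or a boolean mask like rows_with_gaps can be used to skip or select the incomplete rows.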
I had a similar problem reading a tab-delimited file with missing data. If you can get the data in tab-delimited format, you can use the following:
import pandas as pd
df = pd.read_csv('RSTN', sep='\t', header = None)
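As a quick check of how this behaves, here is a minimal sketch with a made-up two-row sample (the values are invented for illustration). An empty field between two consecutive tabs is kept as its own column and read as NaN.

```python
import io

import pandas as pd

# Invented tab-delimited sample; the second row has an empty third field.
sample = (
    "20120127165136\t2\t0\t1\n"
    "20120127165137\t0\t\t2\n"
)

df = pd.read_csv(io.StringIO(sample), sep="\t", header=None)

print(df)
print(df.isna().sum())  # number of missing values per column
```

This works because every tab counts as a field boundary, so the position of the missing value is not lost the way it is when runs of spaces are treated as a single separator.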