Pandas.read_csv error tokenizing data
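I'm having trouble with Pandas.read_csv.
I want to read this text file (see below). When I copy the data into Excel and use Text to Columns with "Space" as the delimiter, I get exactly the output I want.
I have tried many different approaches. I thought a regex that matches multiple spaces would solve the problem, but I couldn't get it to work.
I tried the following code: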
petrelTxt = pd.read_csv(petrelfile, sep = ' ', header = None)
This gives me the error:
CParserError: Error tokenizing data. C error: Expected 6 fields in line 2, saw 17
When I change the separator to sep='\s+', the parser gets further into the file, but it still fails:
petrelTxt = pd.read_csv(petrelfile, sep = '\s+', header = None)
CParserError: Error tokenizing data. C error: Expected 5 fields in line 3, saw 6
Here is the original txt file:
# WELL TRACE FROM PETREL
# WELL NAME: ZZ-0113
# WELL HEAD X-COORDINATE: 9999999.00000000 (m)
# WELL HEAD Y-COORDINATE: 9999999.00000000 (m)
# WELL KB: 159.00000000 (ft)
# WELL TYPE: OIL
# MD AND TVD ARE REFERENCED (=0) AT KB AND INCREASE DOWNWARDS
# ANGLES ARE GIVEN IN DEGREES
# XYZ TRACE IS GIVEN IN COORDINATE SYSTEM WGS_1924_UTM_Zone_42N
# AZIMUTH REFERENCE TRUE NORTH
# DX DY ARE GIVEN IN GRID NORTH IN m-UNITS
# DEPTH (Z, TVD) GIVEN IN ft-UNITS
#======================================================================================================================================
MD X Y Z TVD DX DY AZIM INCL DLS
#======================================================================================================================================
0.0000000000 999999.00000 9999999.0000 159.00000000 0.0000000000 0.0000005192 -0.000000000 1.3487006929 0.0000000000 0.0000000000
132.00000000 999999.08032 9999999.9116 27.000774702 131.99922530 0.0803153923 -0.088388779 139.08870069 0.3400000000 0.2575757504
221.00000000 999999.19115 9999999.8017 -61.99775149 220.99775149 0.1911487882 -0.198290891 132.93870069 0.3200000000 0.0456726104
Try comment="#".
Here is an example that uses the io module to simulate the file:
data = '''# WELL TRACE FROM PETREL
# WELL NAME: ZZ-0113
# WELL HEAD X-COORDINATE: 9999999.00000000 (m)
# WELL HEAD Y-COORDINATE: 9999999.00000000 (m)
# WELL KB: 159.00000000 (ft)
# WELL TYPE: OIL
# MD AND TVD ARE REFERENCED (=0) AT KB AND INCREASE DOWNWARDS
# ANGLES ARE GIVEN IN DEGREES
# XYZ TRACE IS GIVEN IN COORDINATE SYSTEM WGS_1924_UTM_Zone_42N
# AZIMUTH REFERENCE TRUE NORTH
# DX DY ARE GIVEN IN GRID NORTH IN m-UNITS
# DEPTH (Z, TVD) GIVEN IN ft-UNITS
#======================================================================================================================================
MD X Y Z TVD DX DY AZIM INCL DLS
#======================================================================================================================================
0.0000000000 999999.00000 9999999.0000 159.00000000 0.0000000000 0.0000005192 -0.000000000 1.3487006929 0.0000000000 0.0000000000
132.00000000 999999.08032 9999999.9116 27.000774702 131.99922530 0.0803153923 -0.088388779 139.08870069 0.3400000000 0.2575757504
221.00000000 999999.19115 9999999.8017 -61.99775149 220.99775149 0.1911487882 -0.198290891 132.93870069 0.3200000000 0.0456726104'''
import pandas as pd
import io
f = io.StringIO(data)
# Skip '#' lines and split on runs of whitespace (raw string avoids an invalid-escape warning)
df = pd.read_csv(f, comment="#", sep=r'\s+')
print(df.columns)
print(df.head())
Result:
Index(['MD', 'X', 'Y', 'Z', 'TVD', 'DX', 'DY', 'AZIM', 'INCL', 'DLS'], dtype='object')
MD X Y Z TVD DX \
0 0.0 999999.00000 9.999999e+06 159.000000 0.000000 5.192000e-07
1 132.0 999999.08032 1.000000e+07 27.000775 131.999225 8.031539e-02
2 221.0 999999.19115 1.000000e+07 -61.997751 220.997751 1.911488e-01
DY AZIM INCL DLS
0 -0.000000 1.348701 0.00 0.000000
1 -0.088389 139.088701 0.34 0.257576
2 -0.198291 132.938701 0.32 0.045673
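The second error in the question ("Expected 5 fields in line 3, saw 6") is consistent with the parser treating the `#` lines as data: with `header=None`, the first line sets the expected width, and a later line with more fields is rejected. The sketch below counts whitespace-separated fields for a few lines of the file in plain Python. It is only an illustration, not pandas' actual tokenizer, and the names `sample`, `all_counts`, and `kept_counts` are made up for this example.

```python
# A few lines of the trace file from the question: three header comments,
# the column-name row, and the first data row.
sample = """# WELL TRACE FROM PETREL
# WELL NAME: ZZ-0113
# WELL HEAD X-COORDINATE: 9999999.00000000 (m)
MD X Y Z TVD DX DY AZIM INCL DLS
0.0000000000 999999.00000 9999999.0000 159.00000000 0.0000000000 0.0000005192 -0.000000000 1.3487006929 0.0000000000 0.0000000000"""

lines = sample.splitlines()

# Fields per line when splitting on runs of whitespace, similar to sep=r'\s+'.
all_counts = [len(line.split()) for line in lines]
print(all_counts)  # [5, 4, 6, 10, 10]

# Same count after dropping lines that start with '#'
# (a simplification of what comment="#" does).
kept_counts = [len(line.split()) for line in lines if not line.startswith("#")]
print(kept_counts)  # [10, 10]
```

Line 1 has 5 fields, line 2 has fewer (which the parser tolerates), and line 3 has 6, which matches the error message. Once the comment lines are skipped, every remaining row has the same 10 fields, so the column-name row can serve as the header.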