
Pandas.read_csv error tokenizing data

I am having trouble with pandas.read_csv.

I would like to read the text file shown below. When I copy the data into Excel and use Text to Columns > Delimited by "Space", it gives me exactly the output I am looking for.

I have tried several different approaches. I thought a regex that accounts for multiple spaces would do the trick, but I could not get it to work.

I tried this code:

petrelTxt = pd.read_csv(petrelfile, sep = ' ', header = None)

and it gives me this error:

CParserError: Error tokenizing data. C error: Expected 6 fields in line 2, saw 17

When I change the separator to sep='\s+', it gets farther into the file, but it still fails:

petrelTxt = pd.read_csv(petrelfile, sep = r'\s+', header = None)


CParserError: Error tokenizing data. C error: Expected 5 fields in line 3, saw 6

This is the original txt file:

# WELL TRACE FROM PETREL 
# WELL NAME:              ZZ-0113
# WELL HEAD X-COORDINATE: 9999999.00000000 (m)
# WELL HEAD Y-COORDINATE: 9999999.00000000 (m)
# WELL KB:                159.00000000 (ft)
# WELL TYPE:              OIL
# MD AND TVD ARE REFERENCED (=0) AT KB AND INCREASE DOWNWARDS
# ANGLES ARE GIVEN IN DEGREES
# XYZ TRACE IS GIVEN IN COORDINATE SYSTEM WGS_1924_UTM_Zone_42N
# AZIMUTH REFERENCE TRUE NORTH
# DX DY ARE GIVEN IN GRID NORTH IN m-UNITS
# DEPTH (Z, TVD) GIVEN IN ft-UNITS
#======================================================================================================================================
      MD              X              Y             Z           TVD           DX           DY          AZIM          INCL          DLS
#======================================================================================================================================
 0.0000000000   999999.00000 9999999.0000 159.00000000 0.0000000000 0.0000005192 -0.000000000 1.3487006929 0.0000000000 0.0000000000
 132.00000000   999999.08032 9999999.9116 27.000774702 131.99922530 0.0803153923 -0.088388779 139.08870069 0.3400000000 0.2575757504
 221.00000000   999999.19115 9999999.8017 -61.99775149 220.99775149 0.1911487882 -0.198290891 132.93870069 0.3200000000 0.0456726104
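Both error messages come from the # metadata lines at the top of the file, not from the data. The sketch below approximates how each separator splits the first three lines, using Python's str.split as a stand-in for the pandas tokenizer (an assumption, not the parser itself). The lines are retyped by hand, with line 2 padded by 14 spaces so its value lines up with the other header values.

```python
# Hand-retyped copies of the first three lines of the Petrel export.
lines = [
    "# WELL TRACE FROM PETREL ",  # note the trailing space
    "# WELL NAME:" + " " * 14 + "ZZ-0113",
    "# WELL HEAD X-COORDINATE: 9999999.00000000 (m)",
]

counts = []
for number, line in enumerate(lines, start=1):
    # Roughly sep=' ': every single space separates a field, so runs of
    # spaces (and a trailing space) produce empty fields.
    single_space = len(line.split(" "))
    # Roughly sep=r'\s+': a run of whitespace counts as one separator.
    whitespace_run = len(line.split())
    counts.append((single_space, whitespace_run))
    print(f"line {number}: sep=' ' -> {single_space} fields, "
          f"sep=r'\\s+' -> {whitespace_run} fields")
```

With single spaces, line 1 has 6 fields and line 2 has 17, matching the first error. With whitespace runs, line 1 has 5 fields and line 3 has 6, matching the second. Line 2 (4 fields) does not trigger an error because a short row is padded with NaN.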

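Try comment="#". Every line that starts with # is then skipped entirely, so the metadata lines, which all have different field counts, never reach the tokenizer. Keep sep='\s+' so that runs of spaces act as a single delimiter, and leave out header=None so that the MD X Y ... line becomes the column header.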

Here is an example that uses the io module to emulate the file:

data = '''# WELL TRACE FROM PETREL 
# WELL NAME:              ZZ-0113
# WELL HEAD X-COORDINATE: 9999999.00000000 (m)
# WELL HEAD Y-COORDINATE: 9999999.00000000 (m)
# WELL KB:                159.00000000 (ft)
# WELL TYPE:              OIL
# MD AND TVD ARE REFERENCED (=0) AT KB AND INCREASE DOWNWARDS
# ANGLES ARE GIVEN IN DEGREES
# XYZ TRACE IS GIVEN IN COORDINATE SYSTEM WGS_1924_UTM_Zone_42N
# AZIMUTH REFERENCE TRUE NORTH
# DX DY ARE GIVEN IN GRID NORTH IN m-UNITS
# DEPTH (Z, TVD) GIVEN IN ft-UNITS
#======================================================================================================================================
      MD              X              Y             Z           TVD           DX           DY          AZIM          INCL          DLS
#======================================================================================================================================
 0.0000000000   999999.00000 9999999.0000 159.00000000 0.0000000000 0.0000005192 -0.000000000 1.3487006929 0.0000000000 0.0000000000
 132.00000000   999999.08032 9999999.9116 27.000774702 131.99922530 0.0803153923 -0.088388779 139.08870069 0.3400000000 0.2575757504
 221.00000000   999999.19115 9999999.8017 -61.99775149 220.99775149 0.1911487882 -0.198290891 132.93870069 0.3200000000 0.0456726104'''

import pandas as pd
import io

f = io.StringIO(data)

df = pd.read_csv(f, comment="#", sep=r'\s+')  # raw string avoids the invalid '\s' escape warning

print(df.columns)
print(df.head())

Result:

Index(['MD', 'X', 'Y', 'Z', 'TVD', 'DX', 'DY', 'AZIM', 'INCL', 'DLS'], dtype='object')

      MD             X             Y           Z         TVD            DX  \
0    0.0  999999.00000  9.999999e+06  159.000000    0.000000  5.192000e-07   
1  132.0  999999.08032  1.000000e+07   27.000775  131.999225  8.031539e-02   
2  221.0  999999.19115  1.000000e+07  -61.997751  220.997751  1.911488e-01   

         DY        AZIM  INCL       DLS  
0 -0.000000    1.348701  0.00  0.000000  
1 -0.088389  139.088701  0.34  0.257576  
2 -0.198291  132.938701  0.32  0.045673  
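The question reads from a path (petrelfile) rather than a string. Below is a sketch of the same call against a file on disk, with a quick check that all ten columns are parsed as floats. The temporary file, its name well_trace.txt, and the trimmed two-row sample are hypothetical stand-ins for the real export.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Trimmed copy of the Petrel export (two data rows, for illustration only).
sample = """# WELL TRACE FROM PETREL
# WELL NAME:              ZZ-0113
#======================================
      MD              X              Y             Z           TVD           DX           DY          AZIM          INCL          DLS
#======================================
 0.0000000000   999999.00000 9999999.0000 159.00000000 0.0000000000 0.0000005192 -0.000000000 1.3487006929 0.0000000000 0.0000000000
 132.00000000   999999.08032 9999999.9116 27.000774702 131.99922530 0.0803153923 -0.088388779 139.08870069 0.3400000000 0.2575757504
"""

with tempfile.TemporaryDirectory() as tmp:
    petrelfile = Path(tmp) / "well_trace.txt"  # hypothetical file name
    petrelfile.write_text(sample)
    # Skip '#' lines and treat any run of whitespace as one delimiter.
    petrelTxt = pd.read_csv(petrelfile, comment="#", sep=r"\s+")

print(petrelTxt.shape)
print(petrelTxt.dtypes)
```

If any column comes back as object instead of float64, a stray non-numeric token slipped past the comment filter and is worth inspecting.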
