
Pandas: error when reading CSV file using `sep` and `comment` arguments

Situation

I have to create a pandas dataframe from a CSV-like file that has the following characteristics:

  • The delimiter used by the file can be either comma or space, and I don't know in advance which one the file will have.
  • At the top of the file there can be one or more comment lines, which start with #.

Problem

I have tried to tackle this with the pd.read_csv function, using the arguments sep=None and comment='#'. As I understand it, sep=None tells pandas to auto-detect the delimiter character, and comment='#' tells pandas that all lines starting with # are comment lines that should be ignored.
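To make the intended behaviour concrete, here is a minimal standard-library sketch of "skip the comment lines, then guess the delimiter". The helper name detect_delimiter and its parameters are hypothetical and not part of pandas; the sketch assumes comments occupy whole lines and that the only candidate delimiters are comma and space.

```python
import csv


def detect_delimiter(text, comment='#', candidates=', '):
    """Guess the delimiter of `text`, ignoring lines that start with `comment`.

    Hypothetical helper for illustration only; not a pandas function.
    """
    # Drop whole-line comments before sniffing.
    data_lines = [ln for ln in text.splitlines() if not ln.startswith(comment)]
    sample = '\n'.join(data_lines)
    # Restrict the sniffer to the delimiters we expect (comma or space).
    return csv.Sniffer().sniff(sample, delimiters=candidates).delimiter


comma_text = '# a comment\ncol1,col2,col3\n5.9,7.8,3.2\n7.1,0.4,8.1\n'
space_text = comma_text.replace(',', ' ')

print(repr(detect_delimiter(comma_text)))
print(repr(detect_delimiter(space_text)))
```

The detected character could then be passed to pd.read_csv as sep, together with comment='#'.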

Each argument works fine on its own. However, when I use both together, I get the error message TypeError: expected string or bytes-like object. The following code example demonstrates this:

from io import StringIO
import pandas as pd

# Simulated data file contents
tabular_data = (
    '# Data generated on 04 May 2017\n'
    'col1,col2,col3\n'
    '5.9,7.8,3.2\n'
    '7.1,0.4,8.1\n'
    '9.4,5.4,1.9\n'
)

# This works
df1 = pd.read_csv(StringIO(tabular_data), sep=None)
print(df1)

# This also works
df2 = pd.read_csv(StringIO(tabular_data), comment='#')
print(df2)

# This will give an error
df3 = pd.read_csv(StringIO(tabular_data), sep=None, comment='#')
print(df3)

Unfortunately I don't really understand what is triggering the error. Would anyone here be able to give me some help to resolve this problem?

Try passing a regular expression as the separator, together with the Python parsing engine:

In [186]: df = pd.read_csv(StringIO(tabular_data), sep=r'(?:,|\s+)',
                           comment='#', engine='python')

In [187]: df
Out[187]:
   col1  col2  col3
0   5.9   7.8   3.2
1   7.1   0.4   8.1
2   9.4   5.4   1.9

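r'(?:,|\s+)' is a regular expression that matches either a single comma or one or more consecutive whitespace characters (spaces or tabs). A regex separator like this requires engine='python', which is why that argument is passed explicitly.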
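As a quick check, the same call can be run against both a comma-delimited and a space-delimited version of the sample data. This is only a sketch: read_table is a hypothetical wrapper name, and the space-delimited text is simulated by replacing the commas in the sample.

```python
from io import StringIO

import pandas as pd

# Comma-delimited sample with a leading comment line.
comma_data = (
    '# Data generated on 04 May 2017\n'
    'col1,col2,col3\n'
    '5.9,7.8,3.2\n'
    '7.1,0.4,8.1\n'
    '9.4,5.4,1.9\n'
)
# Simulated space-delimited variant of the same file.
space_data = comma_data.replace(',', ' ')


def read_table(text):
    """Hypothetical wrapper: parse text delimited by commas or whitespace."""
    return pd.read_csv(
        StringIO(text),
        sep=r'(?:,|\s+)',   # a comma, or a run of whitespace
        comment='#',        # ignore comment lines
        engine='python',    # regex separators need the Python engine
    )


df_comma = read_table(comma_data)
df_space = read_table(space_data)
print(df_comma.equals(df_space))
```

One caveat: because the pattern treats a comma and a run of whitespace as separate alternatives, a file that uses a comma followed by a space (for example "5.9, 7.8") would probably be split into an extra empty field. A pattern such as r'\s*,\s*|\s+' may be a better fit in that case.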
