I have to create a pandas dataframe from a CSV-like file whose delimiter is not known in advance and which contains comment lines starting with #. I have tried to tackle this with the pd.read_csv method, using the arguments sep=None and comment='#'. To my understanding, the sep=None argument tells pandas to auto-detect the delimiter character, and the comment='#' argument tells pandas that all lines starting with # are comment lines that should be ignored.
These arguments work fine when used individually. However, when I use both together, I get the error message TypeError: expected string or bytes-like object. The following code example demonstrates this:
from io import StringIO
import pandas as pd
# Simulated data file contents
tabular_data = (
'# Data generated on 04 May 2017\n'
'col1,col2,col3\n'
'5.9,7.8,3.2\n'
'7.1,0.4,8.1\n'
'9.4,5.4,1.9\n'
)
# This works
df1 = pd.read_csv(StringIO(tabular_data), sep=None)
print(df1)
# This also works
df2 = pd.read_csv(StringIO(tabular_data), comment='#')
print(df2)
# This will give an error
df3 = pd.read_csv(StringIO(tabular_data), sep=None, comment='#')
print(df3)
Unfortunately I don't really understand what is triggering the error. Would anyone here be able to give me some help to resolve this problem?
Try this:
In [186]: df = pd.read_csv(StringIO(tabular_data), sep=r'(?:,|\s+)',
comment='#', engine='python')
In [187]: df
Out[187]:
   col1  col2  col3
0   5.9   7.8   3.2
1   7.1   0.4   8.1
2   9.4   5.4   1.9
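sep=r'(?:,|\s+)' is a regular expression that matches either a single comma or one or more consecutive whitespace characters (spaces or tabs). A regex separator like this needs engine='python', because the C engine does not support regex separators.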
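As a rough sketch of how that separator behaves, the snippet below runs the same read_csv call on a comma-delimited sample and on a whitespace-delimited one. The helper name read_table and the second sample are made up for illustration; only the sep, comment and engine arguments come from the answer above.

```python
from io import StringIO

import pandas as pd

# Regex separator from the answer: a comma, or a run of whitespace.
SEP = r'(?:,|\s+)'


def read_table(text):
    """Hypothetical helper: parse CSV-like text, skipping '#' comment lines."""
    # engine='python' is required for a regex separator.
    return pd.read_csv(StringIO(text), sep=SEP, comment='#', engine='python')


# Comma-delimited sample (same data as in the question).
comma_data = (
    '# Data generated on 04 May 2017\n'
    'col1,col2,col3\n'
    '5.9,7.8,3.2\n'
    '7.1,0.4,8.1\n'
    '9.4,5.4,1.9\n'
)

# Assumed variant of the same data, delimited by spaces and tabs.
space_data = (
    '# Data generated on 04 May 2017\n'
    'col1 col2\tcol3\n'
    '5.9 7.8\t3.2\n'
    '7.1 0.4\t8.1\n'
    '9.4 5.4\t1.9\n'
)

df_comma = read_table(comma_data)
df_space = read_table(space_data)

print(df_comma)
print(df_space)
```

One caveat: the pattern treats a comma and a following space as two separate delimiters, so a file that uses ", " between fields would likely need a different pattern.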