
Can pandas read_csv parse space delimited data with quotes?

I have a space-delimited text file with a quoted field, and I can't figure out how to get read_csv in pandas to read it correctly. The regex works when used directly, but not in pandas.read_csv.

I think this should work without a regex, using the default quoting=0:

import pandas as pd
from io import StringIO

s = "  \"Random Text\"  1234.00  5678.00  9876.00 1   Z5     2   0   1   1.500   35.3   1.00  389 0.096000  10.00  15000.0  0.102  0.199  0.040  1    0       0    2900             N/A     N/A          N/A\n"
print(s)

df = pd.read_csv(StringIO(s), engine='python', header=None, delim_whitespace=True, quoting=0)
display(df)

But this puts "Random" and "Text" in separate columns.


Attempt 2 with regex:

sep_regex = '\s+(?=([^\"]*\"[^\"]*\")*[^\"]*$)' # regex to find spaces except within quotes
df = pd.read_csv(StringIO(s), header=None, sep=sep_regex, engine='python', warn_bad_lines=True)
display(df)

This correctly keeps the quoted text together but puts NaN between each column.
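A likely explanation for the NaN columns, assuming the python engine splits each line with the regex much as re.split does: the pattern contains a capturing group, and re.split returns the text of every capturing group alongside the split pieces. Here the group matches zero times after the quoted field, so each separator contributes a None. The sketch below uses only the standard library and a shortened, made-up sample line; the variable names are illustrative.

```python
import re

# Shortened sample line (illustrative, not the full row from the question).
line = '"Random Text"  1234.00  5678.00 N/A'

# Pattern from the question: the lookahead contains a capturing group (...)
capturing = r'\s+(?=([^"]*"[^"]*")*[^"]*$)'
# Same pattern with the group made non-capturing (?:...)
non_capturing = r'\s+(?=(?:[^"]*"[^"]*")*[^"]*$)'

# re.split inserts the group's value (None here) after every separator.
with_groups = re.split(capturing, line)
# With a non-capturing group only the fields themselves come back.
without_groups = re.split(non_capturing, line)

print(with_groups)
print(without_groups)
```

If that assumption holds, switching the group to (?:...) in sep_regex would be one way to avoid the extra columns, although the plain whitespace separator is simpler.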

This should work:

df = pd.read_csv(StringIO(s), header=None, sep=r'\s+', quotechar='"')
print(df)

            0       1       2       3   4   5   6   7   8    9     10   11   12     13    14       15     16     17    18  19  20  21    22  23  24  25
0  Random Text  1234.0  5678.0  9876.0   1  Z5   2   0   1  1.5  35.3  1.0  389  0.096  10.0  15000.0  0.102  0.199  0.04   1   0   0  2900 NaN NaN NaN

This worked for me:

df = pd.read_csv(StringIO(s), sep=None, engine='python',
                 header=None, quoting=0, skipinitialspace=True)

Output:

            0       1       2       3   4   5   6   7   8    9   ...     16     17    18  19  20  21    22  23  24  25
0  Random Text  1234.0  5678.0  9876.0   1  Z5   2   0   1  1.5  ...  0.102  0.199  0.04   1   0   0  2900 NaN NaN NaN

[1 rows x 26 columns]
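To see why skipinitialspace helps, here is a minimal sketch with the standard csv module, on the assumption that sep=None ends up detecting a single space as the delimiter. With a space delimiter, skipinitialspace=True drops the extra spaces at the start of each field, and the quote character keeps "Random Text" in one field. The sample line is shortened for illustration.

```python
import csv
from io import StringIO

# Shortened sample line with leading spaces, a quoted field and runs of spaces.
s = '  "Random Text"  1234.00  5678.00 1   Z5   N/A\n'

reader = csv.reader(
    StringIO(s),
    delimiter=' ',          # a single space separates fields
    quotechar='"',          # spaces inside quotes are kept
    skipinitialspace=True,  # spaces at the start of a field are skipped
)
rows = list(reader)
print(rows)
```

Without skipinitialspace=True, every extra space would start a new empty field.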

