
How to set a space as separator for to_csv()? - python

I need to save a dataframe as a CSV file that another script will then read with the delim_whitespace=True option.

Here's an example of what I am trying to do. The dataframe df I'm working with is the following:

    id  var1        var2
0   0   0.000000    0.000000
1   1   0.000000    0.000000
2   2   0.000000    0.000000

I want to save it with whitespace as the delimiter, so I tried:

df.to_csv('df_file.csv', delim_whitespace=True) #does not work
df.to_csv('df_file.csv', sep=r"\s+")            #cannot be opened with a pd.read_csv('df_file.csv', delim_whitespace=True)
df.to_csv('df_file.csv', sep='\t')              #cannot be opened with a pd.read_csv('df_file.csv', delim_whitespace=True)
df.to_csv('df_file.csv', sep=' ')               #cannot be opened with a pd.read_csv('df_file.csv', delim_whitespace=True)
df.to_csv('df_file.csv', sep='    ')            #cannot be saved because sep needs one character apparently

What separator can I use so I can then read that file with the delim_whitespace=True option?

Here is a full save/read example:

Sample data:

import pandas as pd
d = {'id': {0: 0, 1: 1, 2: 2}, 'var1': {0: 0.0, 1: 0.0, 2: 0.0}, 'var2': {0: 0.0, 1: 0.0, 2: 0.0}}
df_save = pd.DataFrame(data=d)

Code:

Pass index=False; otherwise the index is written to the file and comes back as a separate extra column after loading.

p = r'C:\test.csv'
df_save.to_csv(p, sep=' ', index=False)
df_read = pd.read_csv(p, sep=' ')

Output:

   id  var1  var2
0   0   0.0   0.0
1   1   0.0   0.0
2   2   0.0   0.0
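
The same round trip can be checked without touching the disk. The sketch below is only an illustration: the in-memory io.StringIO buffer is an assumption made here so that no file path is needed, and the file is read back with sep=r"\s+", which splits on any run of whitespace in the same way delim_whitespace=True does.

```python
import io

import pandas as pd

df_save = pd.DataFrame(
    {"id": [0, 1, 2], "var1": [0.0, 0.0, 0.0], "var2": [0.0, 0.0, 0.0]}
)

# Write with a single space as the separator and leave the index out.
buffer = io.StringIO()
df_save.to_csv(buffer, sep=" ", index=False)

# Rewind the buffer and read it back, splitting on any run of whitespace.
buffer.seek(0)
df_read = pd.read_csv(buffer, sep=r"\s+")

# True when values, dtypes and labels all survived the round trip.
print(df_read.equals(df_save))
```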

If you experience the error:

ParserError: Error tokenizing data. C error: Expected 66 fields in line 16080, saw 67

This means that line contains at least one more whitespace character than it should. You can either inspect the file in an editor, e.g. PyCharm or even Excel, and clean that line.

Or you can simply skip bad lines like this:

# pandas >= 1.3; older versions used error_bad_lines=False, which was removed in pandas 2.0
df = pd.read_csv('df_file.csv', sep=' ', on_bad_lines='skip')
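
To see what skipping does, here is a small sketch on made-up data. The text below is hypothetical, not the asker's file; its third line carries one field too many. It assumes pandas 1.3 or later, where the keyword is on_bad_lines.

```python
import io

import pandas as pd

# Hypothetical file content: the row for id 1 has four fields instead of three.
text = "id var1 var2\n0 0.0 0.0\n1 0.0 0.0 9.9\n2 0.0 0.0\n"

# The malformed row is dropped instead of raising ParserError.
df = pd.read_csv(io.StringIO(text), sep=" ", on_bad_lines="skip")

print(df)
```

Note that the skipped row is lost silently, so this is best kept for files where a few missing lines do not matter.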

Try using:

df.to_csv("output.csv",sep=' ')

to save the file.

To read the file:

df=pd.read_csv("output.csv",sep=' ')

You will get 'Unnamed: 0' as a column name, because the index was written to the file as well. To deal with that, just run:

df.drop(columns=['Unnamed: 0'],inplace=True)
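
Another option, sketched here as an alternative rather than as part of the answer above, is to tell read_csv that the first column is the index, so the 'Unnamed: 0' column never appears. The io.StringIO buffer stands in for "output.csv" and is an assumption made to keep the example self-contained.

```python
import io

import pandas as pd

df = pd.DataFrame(
    {"id": [0, 1, 2], "var1": [0.0, 0.0, 0.0], "var2": [0.0, 0.0, 0.0]}
)

# The index is written as an unnamed first column.
buffer = io.StringIO()
df.to_csv(buffer, sep=" ")
buffer.seek(0)

# index_col=0 turns that first column back into the index on load.
df_back = pd.read_csv(buffer, sep=" ", index_col=0)

print(df_back.columns.tolist())
```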

First of all, you can't use delim_whitespace in to_csv. Check the documentation.
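
A quick way to confirm this, as a sketch: to_csv does not accept that keyword, so the call fails with a TypeError before anything is written. The small frame and the in-memory buffer are placeholders chosen for this illustration.

```python
import io

import pandas as pd

df = pd.DataFrame({"id": [0, 1, 2], "var1": [0.0, 0.0, 0.0]})

try:
    # delim_whitespace is a read_csv option; to_csv has no such parameter.
    df.to_csv(io.StringIO(), delim_whitespace=True)
    rejected = False
except TypeError as exc:
    rejected = True
    print(exc)
```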


import pandas as pd
entries=[[0,1,2],[0.,0.,0.],[0.,0.,0.]]
df=pd.DataFrame(entries,columns=['id', 'var1', 'var2'])
df
Output:
    id  var1    var2
0   0.0     1.0     2.0
1   0.0     0.0     0.0
2   0.0     0.0     0.0

Save using sep=' ' and check the resulting file with cat:

df.to_csv('temp.csv',sep=' ')
!cat temp.csv
Output:
 id var1 var2
0 0.0 1.0 2.0
1 0.0 0.0 0.0
2 0.0 0.0 0.0

You can then read it using delim_whitespace=True:

pd.read_csv('temp.csv',delim_whitespace=True)
Output:
    id  var1    var2
0   0.0     1.0     2.0
1   0.0     0.0     0.0
2   0.0     0.0     0.0
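
On recent pandas versions (delim_whitespace has been deprecated since 2.2), the equivalent spelling is sep=r"\s+". The sketch below repeats the save and load steps with that spelling; the in-memory buffer replaces temp.csv and is an assumption made for the sake of a self-contained example. Because the header line has one field fewer than the data lines, pandas treats the first column as the index.

```python
import io

import pandas as pd

entries = [[0, 1, 2], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
df = pd.DataFrame(entries, columns=["id", "var1", "var2"])

# Save with a single space as separator; the index is written too.
buffer = io.StringIO()
df.to_csv(buffer, sep=" ")
buffer.seek(0)

# r"\s+" splits on any run of whitespace, like delim_whitespace=True.
df_back = pd.read_csv(buffer, sep=r"\s+")

print(df_back)
```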
