I need to save a dataframe as a CSV file that another script will read with the delim_whitespace=True option.
Here's an example of what I am trying to do. The dataframe df I'm working with is the following:
id var1 var2
0 0 0.000000 0.000000
1 1 0.000000 0.000000
2 2 0.000000 0.000000
I want to save it with whitespace as the delimiter, so I tried:
df.to_csv('df_file.csv', delim_whitespace=True) #does not work
df.to_csv('df_file.csv', sep=r"\s+") #cannot be opened with a pd.read_csv('df_file.csv', delim_whitespace=True)
df.to_csv('df_file.csv', sep='\t') #cannot be opened with a pd.read_csv('df_file.csv', delim_whitespace=True)
df.to_csv('df_file.csv', sep=' ') #cannot be opened with a pd.read_csv('df_file.csv', delim_whitespace=True)
df.to_csv('df_file.csv', sep='  ') #cannot be saved because sep needs one character apparently
What separator can I use so that I can then read that file with the delim_whitespace=True option?
Here is a full save/read example:
Sample data:
import pandas as pd
d = {'id': {0: 0, 1: 1, 2: 2}, 'var1': {0: 0.0, 1: 0.0, 2: 0.0}, 'var2': {0: 0.0, 1: 0.0, 2: 0.0}}
df_save = pd.DataFrame(data=d)
Code:
Pass index=False when saving; otherwise, after loading, the index will be added as another separate column.
p = r'C:\test.csv'
df_save.to_csv(p, sep=' ', index=False)
df_read = pd.read_csv(p, sep=' ')
Output:
id var1 var2
0 0 0.0 0.0
1 1 0.0 0.0
2 2 0.0 0.0
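The same round trip can be checked against whitespace splitting, which is what the question asks for. The sketch below is only an illustration: it writes to an in-memory buffer instead of C:\test.csv (an assumption made for portability) and reads with sep=r"\s+", which the pandas documentation describes as equivalent to delim_whitespace=True.

```python
import io

import pandas as pd

df_save = pd.DataFrame(
    {"id": [0, 1, 2], "var1": [0.0, 0.0, 0.0], "var2": [0.0, 0.0, 0.0]}
)

# Write to an in-memory buffer rather than a file on disk (illustrative choice).
buffer = io.StringIO()
df_save.to_csv(buffer, sep=" ", index=False)

# sep=r"\s+" splits on any run of whitespace, like delim_whitespace=True.
df_read = pd.read_csv(io.StringIO(buffer.getvalue()), sep=r"\s+")

# True when values, dtypes, and column names survived the round trip.
round_trip_ok = df_save.equals(df_read)
```

With index=False there is no extra leading column, so the header and the data rows have the same number of fields.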
If you experience the error:
ParserError: Error tokenizing data. C error: Expected 66 fields in line 16080, saw 67
this means that line contains at least one more whitespace separator than it should. You can either inspect the file with some editor, e.g. PyCharm or even Excel, and clean that line.
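Or you can simply skip bad lines like this (on_bad_lines='skip' requires pandas 1.3 or later; older versions used error_bad_lines=False instead):
df = pd.read_csv('df_file.csv', sep=' ', on_bad_lines='skip')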
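Before skipping lines, it may help to see which ones are malformed. The helper below is a hypothetical sketch (find_bad_lines is not a pandas function): it counts whitespace-separated fields per line with plain Python and reports the lines whose count differs from the first line's. It ignores quoting, so it is only a rough check.

```python
def find_bad_lines(lines, expected=None):
    """Return (line_number, field_count) pairs for lines whose number of
    whitespace-separated fields differs from `expected`.

    If `expected` is None, the first non-blank line sets the expected count.
    Quoted fields are NOT handled; this is a rough diagnostic only.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        fields = line.split()  # splits on any run of whitespace
        if not fields:
            continue  # ignore blank lines
        if expected is None:
            expected = len(fields)
        if len(fields) != expected:
            bad.append((number, len(fields)))
    return bad


# Small in-memory sample; the third line has one field too many.
sample = ["id var1 var2", "0 0.0 0.0", "1 0.0 0.0 9.9", "2 0.0 0.0"]
bad = find_bad_lines(sample)
```

For a real file, the same function can be given an open file handle, e.g. `with open('df_file.csv') as fh: find_bad_lines(fh)`.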
Try using:
df.to_csv("output.csv",sep=' ')
to save the file.
To read the file:
df=pd.read_csv("output.csv",sep=' ')
You will get 'Unnamed: 0' as a column name. To deal with that, just run:
df.drop(columns=['Unnamed: 0'],inplace=True)
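As an alternative to dropping the column afterwards, the extra column can be avoided altogether. A minimal sketch, using an in-memory buffer in place of output.csv (an assumption for illustration), shows two options: reading the first column back as the index with index_col=0, or not writing the index at all.

```python
import io

import pandas as pd

df = pd.DataFrame(
    {"id": [0, 1, 2], "var1": [0.0, 0.0, 0.0], "var2": [0.0, 0.0, 0.0]}
)

# The index is written as an unnamed first column.
buffer = io.StringIO()
df.to_csv(buffer, sep=" ")

# Option 1: treat the first column as the index when reading.
restored = pd.read_csv(io.StringIO(buffer.getvalue()), sep=" ", index_col=0)

# Option 2: do not write the index in the first place.
buffer_no_index = io.StringIO()
df.to_csv(buffer_no_index, sep=" ", index=False)
plain = pd.read_csv(io.StringIO(buffer_no_index.getvalue()), sep=" ")
```

Either way, no 'Unnamed: 0' column appears in the result.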
First of all, you can't use delim_whitespace in to_csv. Check the documentation.
entries=[[0,1,2],[0.,0.,0.],[0.,0.,0.]]
df=pd.DataFrame(entries,columns=['id', 'var1', 'var2'])
df
Output:
id var1 var2
0 0.0 1.0 2.0
1 0.0 0.0 0.0
2 0.0 0.0 0.0
Save using sep=' ' and check the resulting file with cat:
df.to_csv('temp.csv',sep=' ')
!cat temp.csv
Output:
 id var1 var2
0 0.0 1.0 2.0
1 0.0 0.0 0.0
2 0.0 0.0 0.0
You can then read it using delim_whitespace=True:
pd.read_csv('temp.csv',delim_whitespace=True)
Output:
id var1 var2
0 0.0 1.0 2.0
1 0.0 0.0 0.0
2 0.0 0.0 0.0
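To illustrate that the choice of single-character separator matters little for a whitespace-splitting reader, here is a small sketch, assuming purely numeric data and index=False. It writes the question's dataframe once with a space and once with a tab, then reads both back with sep=r"\s+" (the regex form that the pandas documentation gives as equivalent to delim_whitespace=True). Buffers stand in for files here.

```python
import io

import pandas as pd

df = pd.DataFrame(
    {"id": [0, 1, 2], "var1": [0.0, 0.0, 0.0], "var2": [0.0, 0.0, 0.0]}
)

results = {}
for name, sep in [("space", " "), ("tab", "\t")]:
    buffer = io.StringIO()
    df.to_csv(buffer, sep=sep, index=False)
    # Read back by splitting on any run of whitespace.
    results[name] = pd.read_csv(io.StringIO(buffer.getvalue()), sep=r"\s+")
```

Both entries of results should match the original dataframe. Values that themselves contain whitespace are a different matter and are not covered by this sketch.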