Pandas read_csv adds unnecessary “ ” to each row

Question

I have a csv file

(I am showing the first three rows here)

HEIGHT,WEIGHT,AGE,GENDER,SMOKES,ALCOHOL,EXERCISE,TRT,PULSE1,PULSE2,YEAR
173,57,18,2,2,1,2,2,86,88,93
179,58,19,2,2,1,2,1,82,150,93

I am using pandas read_csv to read the file and put them into columns.

Here is my code:

import pandas as pd
import os
path='~/Desktop/pulse.csv'

path=os.path.expanduser(path)
my_data=pd.read_csv(path, index_col=False, header=None, quoting = 3, delimiter=',')
print my_data

The problem is the first and last columns have " before and after the values.

Additionally I can't get rid of the indexes.

I might be making a silly mistake, but thank you in advance for your help.

Answer 1

Final solution: read the file with quoting=3, use replace to remove the " characters and convert the values to int, and use str.strip to remove " from the column names:

df = pd.read_csv('pulse.csv', quoting=3)

df = df.replace('"','', regex=True).astype(int)
df.columns = df.columns.str.strip('"')
print (df.head())

   HEIGHT  WEIGHT  AGE  GENDER  SMOKES  ALCOHOL  EXERCISE  TRT  PULSE1  \
0     173      57   18       2       2        1         2    2      86   
1     179      58   19       2       2        1         2    1      82   
2     167      62   18       2       2        1         1    1      96   
3     195      84   18       1       2        1         1    2      71   
4     173      64   18       2       2        1         3    2      90   

   PULSE2  YEAR  
0      88    93  
1     150    93  
2     176    93  
3      73    93  
4      88    93
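To try this without the original file, here is a minimal sketch that runs on an in-memory sample. The layout of the sample (every line wrapped in one pair of double quotes) is an assumption inferred from the symptoms in the question, and the variable name raw and the shortened column list are only for illustration:

```python
import csv
import io

import pandas as pd

# Assumed layout: each line of the file is wrapped in one pair of double quotes.
raw = (
    '"HEIGHT,WEIGHT,AGE"\n'
    '"173,57,18"\n'
    '"179,58,19"\n'
)

# csv.QUOTE_NONE is the named constant for quoting=3: quotes are kept as
# ordinary characters, so they end up inside the first and last fields.
df = pd.read_csv(io.StringIO(raw), quoting=csv.QUOTE_NONE)

# Remove the stray quotes from the values, then convert everything to int.
df = df.replace('"', '', regex=True).astype(int)

# Remove the stray quotes from the column names.
df.columns = df.columns.str.strip('"')

print(df)
```

Using csv.QUOTE_NONE instead of the bare number 3 does the same thing and makes the intent easier to read.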

index_col=False forces pandas not to use the first column as the index, but a DataFrame always needs some index, so the default one (0, 1, 2, ...) is added. The parameter can therefore be omitted here.

header=None should be removed, because it forces pandas not to read the first row (the header of the csv) as the column names of the DataFrame. With it, the header row becomes the first row of data, and the numeric values are converted to strings.

delimiter=',' should be removed too, because it is the same as sep=',', which is the default.
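The effect of these three parameters can be checked on a small in-memory sample. This is only a sketch: the two-column sample and the variable names are made up for illustration, and the data has no stray quotes so that only the parameters differ:

```python
import io

import pandas as pd

# Small clean sample (hypothetical), header row followed by two data rows.
raw = "HEIGHT,WEIGHT\n173,57\n179,58\n"

# Defaults: the first row becomes the column names, values are parsed as integers.
with_header = pd.read_csv(io.StringIO(raw))

# header=None: the header row is read as data, columns are labelled 0 and 1,
# and every value in a column is kept as a string.
no_header = pd.read_csv(io.StringIO(raw), header=None)

# index_col=False and delimiter=',' do not change the result for this sample.
explicit = pd.read_csv(io.StringIO(raw), index_col=False, delimiter=',')

print(with_header)
print(no_header)
print(with_header.equals(explicit))
```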

Answer 2

@jezrael is right - a pandas dataframe will always add indices. It's necessary.

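Try something like df[0] = df[0].str.strip('"'), and do the same for the last column (replace 0 with its label). Note that str.strip() with no argument only removes whitespace, so the quote character has to be passed explicitly.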

Before you do so, read your csv into a dataframe with pd.read_csv(path) (pd.DataFrame.from_csv(path) also did this in older pandas, but it has since been deprecated and removed).
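A minimal sketch of this per-column approach, again on an assumed sample where each line is wrapped in double quotes. It also shows one way to print the frame without the index column, since the index itself cannot be removed from the DataFrame; the names raw, first, last and text are only for illustration:

```python
import csv
import io

import pandas as pd

# Assumed layout: each line wrapped in one pair of double quotes.
raw = '"HEIGHT,WEIGHT,AGE"\n"173,57,18"\n"179,58,19"\n'

df = pd.read_csv(io.StringIO(raw), quoting=csv.QUOTE_NONE)

# Clean the column names first so the columns can be addressed by name.
df.columns = df.columns.str.strip('"')

# Only the first and last columns carry a stray quote, so strip just those.
first, last = df.columns[0], df.columns[-1]
for col in (first, last):
    df[col] = df[col].str.strip('"').astype(int)

# The index stays part of the DataFrame, but it can be left out when printing.
text = df.to_string(index=False)
print(text)
```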

2 answers

solution1
2 ACCEPTED 2017-09-19 05:16:39

solution2
0 2017-09-19 05:08:53
