Difference between csv.reader and pandas - python

Question

I am importing a CSV file using both csv.reader and pandas. However, the two methods report a different number of rows for the same file.

import csv

reviews = []
# newline='' lets the csv module handle line endings itself (Python 3)
with open("reviews.csv", newline='') as openfile:
    r = csv.reader(openfile)
    for row in r:
        reviews.append(row)
print(len(reviews))

The result is 10,000 (which is the correct value). However, pandas returns a different value.

import pandas as pd

df = pd.read_csv("reviews.csv", header=None)
df.info()

This reports 9,985 rows.

Does anyone know why there is a difference between the two methods of importing data?

I just tried this:

reviews_df = pd.DataFrame(reviews)
reviews_df.info()

This returns 10,000.

Answer 1

pandas.read_csv has an argument named skip_blank_lines whose default value is True, so unless you set it to False, pandas does not read blank lines. csv.reader, by contrast, returns every blank line as an empty row, which is why its count is higher.

Consider the following example file, test_data.csv, which contains two blank rows among its eight data rows (their placement below is illustrative):

    A,B,C,D
    0.07,-0.71,1.42,-0.37
    0.08,0.36,0.99,0.11
    1.06,1.55,-0.93,-0.90
    -0.33,0.13,-0.11,0.89

    1.91,-0.74,0.69,0.83
    -0.28,0.14,1.28,-0.40
    0.35,1.75,-1.10,1.23

    -0.09,0.32,0.91,-0.08

Read it with skip_blank_lines=False:

    df = pd.read_csv('test_data.csv', skip_blank_lines=False)
    len(df)  # 10

Read it with skip_blank_lines=True:

    df = pd.read_csv('test_data.csv', skip_blank_lines=True)
    len(df)  # 8
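
To check whether blank lines account for the gap in your own file, you can count them on the csv.reader side. The following is a minimal sketch using only the standard library; the helper name count_rows and the sample string are made up for illustration and stand in for reviews.csv. It relies on csv.reader yielding an empty list for a completely empty line.

```python
import csv
import io

# Hypothetical sample standing in for reviews.csv:
# three data rows separated by two blank lines.
sample = "good,5\n\nbad,1\n\nfine,3\n"


def count_rows(f):
    """Return (total rows, blank rows) as seen by csv.reader."""
    total = 0
    blank = 0
    for row in csv.reader(f):
        total += 1
        if not row:  # csv.reader yields [] for a completely empty line
            blank += 1
    return total, blank


total, blank = count_rows(io.StringIO(sample, newline=""))
print(total, blank, total - blank)  # 5 2 3
```

For the real file, pass an open file object instead, for example count_rows(open("reviews.csv", newline="")). If total - blank matches the row count pandas reports (10,000 - 15 = 9,985 in the question), blank lines most likely explain the difference.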

1 answer

solution1
5 ACCEPTED 2016-04-29 08:24:51
