Difference between csv.reader and pandas - python

Question

I am importing a CSV file using both csv.reader and pandas. However, the number of rows read from the same file differs between the two.

import csv

reviews = []
# newline='' lets the csv module handle line endings itself
with open("reviews.csv", newline='') as openfile:
    r = csv.reader(openfile)
    for row in r:
        reviews.append(row)
print(len(reviews))

The result is 10,000 (which is the correct value). However, pandas returns a different value.

import pandas as pd

df = pd.read_csv("reviews.csv", header=None)
df.info()

This returns 9,985.

Does anyone know why there is a difference between the two methods of importing data?

I just tried this:

reviews_df = pd.DataFrame(reviews)
reviews_df.info()

This returns 10,000.

Answer 1

According to the pandas.read_csv documentation, there is an argument named skip_blank_lines whose default value is True. Unless you set it to False, pandas will not read the blank lines, whereas csv.reader returns an empty row for each of them.
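
One way to check whether blank lines account for the gap is to count the empty rows that csv.reader yields. The sketch below is only an illustration: count_rows is a hypothetical helper, and the in-memory sample stands in for reviews.csv, whose actual contents are not shown in the question.

```python
import csv
import io


def count_rows(lines):
    """Return (total_rows, blank_rows) as seen by csv.reader."""
    total = blank = 0
    for row in csv.reader(lines):
        total += 1
        if not row:  # csv.reader yields [] for a completely blank line
            blank += 1
    return total, blank


# Hypothetical sample standing in for reviews.csv: three records, two blank lines
sample = io.StringIO("good,5\n\nbad,1\n\nokay,3\n", newline="")
total, blank = count_rows(sample)
print(total, blank, total - blank)
```

If the blank count for the real file equals the difference between the two row counts (15 in the question), blank lines are the likely explanation.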

Consider the following example file (test_data.csv), which contains two blank rows:

A,B,C,D
0.07,-0.71,1.42,-0.37

0.08,0.36,0.99,0.11
1.06,1.55,-0.93,-0.90
-0.33,0.13,-0.11,0.89

1.91,-0.74,0.69,0.83
-0.28,0.14,1.28,-0.40
0.35,1.75,-1.10,1.23
-0.09,0.32,0.91,-0.08

Read it with skip_blank_lines=False:

df = pd.read_csv('test_data.csv', skip_blank_lines=False)
len(df)
10

Read it with skip_blank_lines=True:

df = pd.read_csv('test_data.csv', skip_blank_lines=True)
len(df)
8
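
The same behaviour can be reproduced without a file on disk. This is a minimal sketch that assumes a small made-up sample (not the data from the question) and feeds it to read_csv through io.StringIO:

```python
import io

import pandas as pd

# Hypothetical sample: a header, three data rows, and two blank lines
text = "A,B\n1,2\n\n3,4\n\n5,6\n"

skipped = pd.read_csv(io.StringIO(text))  # skip_blank_lines defaults to True
kept = pd.read_csv(io.StringIO(text), skip_blank_lines=False)

print(len(skipped))  # blank lines are dropped
print(len(kept))     # each blank line becomes a row of NaN values

# Dropping the all-NaN rows brings the count back to the default result
print(len(kept.dropna(how="all")))
```

With skip_blank_lines=False, the blank lines appear as rows filled with NaN, which is why the length grows by the number of blank lines in the input.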

Solution 1
5 Accepted 2016-04-29 08:24:51