
python csv reader not reading all rows

So I've got about 5008 rows in a CSV file, 5009 in total with the header. I'm creating and writing this file all within the same script. But when I read it at the end, with either pandas' pd.read_csv or python3's csv module, and print the len, it outputs 4967. I checked the file for any weird characters that may be confusing python, but don't see any. All the data is delimited by commas.

I also opened it in Sublime Text and it shows 5009 rows, not 4967.
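A quick way to see where the counts diverge (a diagnostic sketch; 'out.csv' is assumed from the code further down) is to compare the raw line count, the records the csv module parses, and the rows pandas parses:

import csv
import pandas as pd

with open('out.csv', newline='') as f:
    raw_lines = sum(1 for _ in f)                 # physical lines in the file

with open('out.csv', newline='') as f:
    parsed_rows = sum(1 for _ in csv.reader(f))   # records the csv module sees

df = pd.read_csv('out.csv')

print(raw_lines)    # what an editor like Sublime counts
print(parsed_rows)  # drops when quoted fields span several physical lines
print(len(df))      # data rows pandas sees (header excluded)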

I could try other methods from pandas like merge or concat, but if python won't read the CSV correctly, that's no use.

This is one method I tried:

import csv
import pandas as pd

# 'xlsfile' and 'target' come from earlier in the script (not shown)
df1 = pd.read_csv('out.csv', quoting=csv.QUOTE_NONE, error_bad_lines=False)
df2 = pd.read_excel(xlsfile)

print(len(df1))  # 4967
print(len(df2))  # 5008

# copy the columns read from the CSV onto the Excel data
df2['Location'] = df1['Location']
df2['Sublocation'] = df1['Sublocation']
df2['Zone'] = df1['Zone']
df2['Subnet Type'] = df1['Subnet Type']
df2['Description'] = df1['Description']

newfile = input("Enter a name for the combined csv file: ")
print('Saving to new csv file...')
df2.to_csv(newfile, index=False)
print('Done.')

target.close()

Another way I tried is:

import pandas as pd
import xlrd

# 'xlsfile' and 'target' again come from earlier in the script (not shown)
dfcsv = pd.read_csv('out.csv')

# read every row of the first sheet of the Excel file into a list of lists
wb = xlrd.open_workbook(xlsfile)
ws = wb.sheet_by_index(0)
xlsdata = []
for rx in range(ws.nrows):
    xlsdata.append(ws.row_values(rx))

print(len(dfcsv))    # 4967
print(len(xlsdata))  # 5009

df1 = pd.DataFrame(data=dfcsv)
df2 = pd.DataFrame(data=xlsdata)

# stick the two frames together side by side
df3 = pd.concat([df2, df1], axis=1)

newfile = input("Enter a name for the combined csv file: ")
print('Saving to new csv file...')
df3.to_csv(newfile, index=False)
print('Done.')

target.close()

But no matter what I try, the CSV file is the actual issue: python is writing it, but not reading it back correctly.

Edit: The weirdest part is that I'm getting absolutely no encoding errors or any other errors when running the code...

Edit2: Tried testing it with the nrows param in the first code example; it works up to 4000 rows. As soon as I specify 5000 rows, it reads only 4967.
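One way to narrow down where rows start vanishing (a debugging sketch building on the nrows test above) is to step nrows up in chunks and watch for the first value where the parsed count falls short:

import pandas as pd

for n in range(4000, 5100, 100):
    df = pd.read_csv('out.csv', nrows=n)
    # once len(df) is smaller than n, the merged/lost rows lie before that point
    print(n, len(df))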

Edit3: I manually saved a csv file with my data instead of using the one written by the program, and it read 5008 rows. Why is python not writing the csv file correctly?
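The writing code isn't shown in the question, so as a hedged sketch only: letting csv.writer (or df.to_csv) handle quoting, and opening the file with newline='', usually keeps embedded commas and quotes from corrupting the row structure. The column names below come from the question; the sample values are made up:

import csv

rows = [
    ['Location', 'Sublocation', 'Zone', 'Subnet Type', 'Description'],
    ['HQ', 'Floor 1', 'A', 'static', 'printer, lobby'],  # embedded comma gets quoted automatically
]

with open('out.csv', 'w', newline='') as f:              # newline='' avoids stray blank rows on Windows
    writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)    # QUOTE_MINIMAL is the default
    writer.writerows(rows)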

I ran into this issue too. I realized that some of my lines had open-ended quotes, which for some reason was interfering with the reader.

So for example, some rows were written as:

GO:0000026  molecular_function  "alpha-1
GO:0000027  biological_process  ribosomal large subunit assembly
GO:0000033  molecular_function  "alpha-1

and this led to rows being read incorrectly. (Unfortunately I don't know enough about how csv.reader works to tell you why; hopefully someone can clarify the quoting behavior!)

I just removed the quotes and it worked out. 我刚刚删除了引号,它就解决了。

Edited: This option works too, if you want to keep the quotes:

quotechar=None
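A small reproduction of the effect described above (sample data adapted from the rows shown, which are tab-delimited): an unterminated quote makes the default reader swallow the following lines into one field, while csv.QUOTE_NONE keeps the rows separate:

import csv
import io

data = 'GO:0000026\tmolecular_function\t"alpha-1\nGO:0000027\tbiological_process\tribosomal large subunit assembly\n'

# default quoting: the open quote runs to the end of the data, so only 1 record comes back
print(len(list(csv.reader(io.StringIO(data), delimiter='\t'))))                           # 1

# QUOTE_NONE treats the quote as an ordinary character, so both rows survive
print(len(list(csv.reader(io.StringIO(data), delimiter='\t', quoting=csv.QUOTE_NONE))))   # 2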

My best guess, without seeing the file, is that you have some lines with too many or not enough commas, maybe due to values like foo,bar.

Please try setting error_bad_lines=True (see the pandas documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) to see if it catches lines with errors in them; my guess is that there will be 41 such lines.

error_bad_lines : boolean, default True. Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these "bad lines" will be dropped from the DataFrame that is returned. (Only valid with C parser)
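For illustration, a hedged sketch (error_bad_lines was current when this answer was written; pandas 1.3+ replaces it with on_bad_lines):

import pandas as pd

# older pandas: error_bad_lines=True (the default) raises on rows with too many fields,
# error_bad_lines=False silently drops them
df = pd.read_csv('out.csv', error_bad_lines=True)

# pandas 1.3+ equivalent:
# df = pd.read_csv('out.csv', on_bad_lines='error')   # or 'warn' / 'skip'
print(len(df))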

The csv.QUOTE_NONE option, when writing, seems to not quote fields and instead escape the delimiter with escapechar + delimiter; but you didn't paste your writing code, and on read it's unclear what this option does. https://docs.python.org/3/library/csv.html#csv.Dialect
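For what it's worth, a short sketch of what QUOTE_NONE means on each side (file name hypothetical): on write nothing is quoted, so an escapechar has to protect any delimiter inside a value; on read, quote characters are kept as literal text:

import csv

with open('quote_none_demo.csv', 'w', newline='') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_NONE, escapechar='\\')
    writer.writerow(['foo,bar', 'baz'])   # written as: foo\,bar,baz

with open('quote_none_demo.csv', newline='') as f:
    reader = csv.reader(f, quoting=csv.QUOTE_NONE, escapechar='\\')
    print(next(reader))                   # ['foo,bar', 'baz']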
