
python csv reader not reading all rows

So I've got about 5008 rows in a CSV file, 5009 in total with the header. I'm creating and writing this file all within the same script. But when I read it at the end, with either pandas' pd.read_csv or python3's csv module, and print the len, it outputs 4967. I checked the file for any weird characters that may be confusing python, but don't see any. All the data is delimited by commas.

I also opened it in Sublime Text and it shows 5009 rows, not 4967.
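A quick way to see where the counts diverge (a diagnostic sketch; 'out.csv' is assumed from the code further down) is to compare the raw line count, the records the csv module parses, and the rows pandas parses:

import csv
import pandas as pd

with open('out.csv', newline='') as f:
    raw_lines = sum(1 for _ in f)                 # physical lines in the file

with open('out.csv', newline='') as f:
    parsed_rows = sum(1 for _ in csv.reader(f))   # records the csv module sees

df = pd.read_csv('out.csv')

print(raw_lines)    # what an editor like Sublime counts
print(parsed_rows)  # drops when quoted fields span several physical lines
print(len(df))      # data rows pandas sees (header excluded)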

I could try other methods from pandas like merge or concat, but if python won't read the CSV correctly, that's no use.

This is one method I tried:

import csv
import pandas as pd

# 'xlsfile' and 'target' come from earlier in the script (not shown)
df1 = pd.read_csv('out.csv', quoting=csv.QUOTE_NONE, error_bad_lines=False)
df2 = pd.read_excel(xlsfile)

print(len(df1))  # 4967
print(len(df2))  # 5008

# copy the columns read from the CSV onto the Excel data
df2['Location'] = df1['Location']
df2['Sublocation'] = df1['Sublocation']
df2['Zone'] = df1['Zone']
df2['Subnet Type'] = df1['Subnet Type']
df2['Description'] = df1['Description']

newfile = input("Enter a name for the combined csv file: ")
print('Saving to new csv file...')
df2.to_csv(newfile, index=False)
print('Done.')

target.close()

Another way I tried is:

import pandas as pd
import xlrd

# 'xlsfile' and 'target' again come from earlier in the script (not shown)
dfcsv = pd.read_csv('out.csv')

# read every row of the first sheet of the Excel file into a list of lists
wb = xlrd.open_workbook(xlsfile)
ws = wb.sheet_by_index(0)
xlsdata = []
for rx in range(ws.nrows):
    xlsdata.append(ws.row_values(rx))

print(len(dfcsv))    # 4967
print(len(xlsdata))  # 5009

df1 = pd.DataFrame(data=dfcsv)
df2 = pd.DataFrame(data=xlsdata)

# stick the two frames together side by side
df3 = pd.concat([df2, df1], axis=1)

newfile = input("Enter a name for the combined csv file: ")
print('Saving to new csv file...')
df3.to_csv(newfile, index=False)
print('Done.')

target.close()

But no matter what I try, the CSV file is the actual issue: python is writing it, but not reading it back correctly.

Edit: The weirdest part is that I'm getting absolutely no encoding errors or any other errors when running the code...

Edit2: Tried testing it with the nrows param in the first code example; it works up to 4000 rows. As soon as I specify 5000 rows, it reads only 4967.
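One way to narrow down where rows start vanishing (a debugging sketch building on the nrows test above) is to step nrows up in chunks and watch for the first value where the parsed count falls short:

import pandas as pd

for n in range(4000, 5100, 100):
    df = pd.read_csv('out.csv', nrows=n)
    # once len(df) is smaller than n, the merged/lost rows lie before that point
    print(n, len(df))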

Edit3: I manually saved a csv file with my data instead of using the one written by the program, and it read 5008 rows. Why is python not writing the csv file correctly?
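The writing code isn't shown in the question, so as a hedged sketch only: letting csv.writer (or df.to_csv) handle quoting, and opening the file with newline='', usually keeps embedded commas and quotes from corrupting the row structure. The column names below come from the question; the sample values are made up:

import csv

rows = [
    ['Location', 'Sublocation', 'Zone', 'Subnet Type', 'Description'],
    ['HQ', 'Floor 1', 'A', 'static', 'printer, lobby'],  # embedded comma gets quoted automatically
]

with open('out.csv', 'w', newline='') as f:              # newline='' avoids stray blank rows on Windows
    writer = csv.writer(f, quoting=csv.QUOTE_MINIMAL)    # QUOTE_MINIMAL is the default
    writer.writerows(rows)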

I ran into this issue too. I realized that some of my lines had open-ended quotes, which for some reason was interfering with the reader.

So for example, some rows were written as:

GO:0000026  molecular_function  "alpha-1
GO:0000027  biological_process  ribosomal large subunit assembly
GO:0000033  molecular_function  "alpha-1

and this led to rows being read incorrectly. (Unfortunately I don't know enough about how csv.reader works to tell you why; hopefully someone can clarify the quoting behavior!)

I just removed the quotes and it worked out. 我刚刚删除了引号,它就解决了。

Edited: This option works too, if you want to keep the quotes:

quotechar=None
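A small reproduction of the effect described above (sample data adapted from the rows shown, which are tab-delimited): an unterminated quote makes the default reader swallow the following lines into one field, while csv.QUOTE_NONE keeps the rows separate:

import csv
import io

data = 'GO:0000026\tmolecular_function\t"alpha-1\nGO:0000027\tbiological_process\tribosomal large subunit assembly\n'

# default quoting: the open quote runs to the end of the data, so only 1 record comes back
print(len(list(csv.reader(io.StringIO(data), delimiter='\t'))))                           # 1

# QUOTE_NONE treats the quote as an ordinary character, so both rows survive
print(len(list(csv.reader(io.StringIO(data), delimiter='\t', quoting=csv.QUOTE_NONE))))   # 2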

My best guess, without seeing the file, is that you have some lines with too many or not enough commas, maybe due to values like foo,bar.

Please try setting error_bad_lines=True (see the pandas documentation: http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html) to see if it catches lines with errors in them; my guess is that there will be 41 such lines.

error_bad_lines : boolean, default True. Lines with too many fields (e.g. a csv line with too many commas) will by default cause an exception to be raised, and no DataFrame will be returned. If False, then these "bad lines" will be dropped from the DataFrame that is returned. (Only valid with C parser)
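For illustration, a hedged sketch (error_bad_lines was current when this answer was written; pandas 1.3+ replaces it with on_bad_lines):

import pandas as pd

# older pandas: error_bad_lines=True (the default) raises on rows with too many fields,
# error_bad_lines=False silently drops them
df = pd.read_csv('out.csv', error_bad_lines=True)

# pandas 1.3+ equivalent:
# df = pd.read_csv('out.csv', on_bad_lines='error')   # or 'warn' / 'skip'
print(len(df))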

The csv.QUOTE_NONE option, when writing, seems to not quote fields and instead escape the delimiter with escapechar + delimiter; but you didn't paste your writing code, and on read it's unclear what this option does. https://docs.python.org/3/library/csv.html#csv.Dialect
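For what it's worth, a short sketch of what QUOTE_NONE means on each side (file name hypothetical): on write nothing is quoted, so an escapechar has to protect any delimiter inside a value; on read, quote characters are kept as literal text:

import csv

with open('quote_none_demo.csv', 'w', newline='') as f:
    writer = csv.writer(f, quoting=csv.QUOTE_NONE, escapechar='\\')
    writer.writerow(['foo,bar', 'baz'])   # written as: foo\,bar,baz

with open('quote_none_demo.csv', newline='') as f:
    reader = csv.reader(f, quoting=csv.QUOTE_NONE, escapechar='\\')
    print(next(reader))                   # ['foo,bar', 'baz']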
