Pandas error tokenizing data when field in csv file contains quotation mark
I'm using pandas.read_csv to read a tab-delimited file and am running into this error:
Error tokenizing data. C error: Expected 364 fields in line 73058, saw 398
After much searching, it seems that the offending entry is:
"– SO ,쳌 \\\\ ?Œ ø ,d -L ,ú ,‚ ZO
Removing the quotation mark seems to solve things. I've got a lot of large files with a lot of strange characters in them, so this will no doubt happen again. Do I need to strip stray quotation marks like this one ahead of time, or is there some way around this?
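A minimal sketch of why a single quote character can trigger this error, assuming a made-up three-column, tab-separated sample (DATA is hypothetical, not the asker's file). With the default quoting, a " at the start of a field opens a quoted field, so the parser swallows every tab and newline until the next " anywhere further down the file. The merged record then has more fields than the row before it, which is what the "Expected N fields ... saw M" message reports.

```python
import io

import pandas as pd

# Hypothetical sample: every line has exactly three tab-separated fields,
# but row "4" contains a stray opening quote and a later row contains another '"'.
DATA = (
    "a\tb\tc\n"
    "1\t2\t3\n"
    '4\t"x\t5\n'
    "6\t7\t8\n"
    'y"\t9\t10\n'
)


def try_default_quoting(text: str) -> str:
    """Parse with read_csv's default quoting; return the parser error message, if any."""
    try:
        pd.read_csv(io.StringIO(text), sep="\t")
    except pd.errors.ParserError as exc:
        return str(exc)
    return "parsed without error"


# The quoted field runs from '"x' to 'y"', absorbing two newlines and three tabs,
# so that record ends up with 4 fields instead of 3 and the C parser raises.
print(try_default_quoting(DATA))
```

The line number inside the message may not match the physical line you expect, because the parser has merged several lines into one record; the field counts are the useful clue.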
There is a quoting argument for read_csv:
quoting : int or csv.QUOTE_* instance, default None
Control field quoting behavior per ``csv.QUOTE_*`` constants. Use one of
QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).
Default (None) results in QUOTE_MINIMAL behavior.
These are described in the csv docs.
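To see what these constants change, here is a small sketch using the standard-library csv module rather than pandas (the sample LINE and the helper split_row are illustrative assumptions). It prints the integer values behind the names and shows how one tab-separated line splits with QUOTE_MINIMAL versus QUOTE_NONE.

```python
import csv
import io

# Illustrative line: a properly quoted middle field that itself contains a tab.
LINE = '1\t"a\tb"\t2\n'


def split_row(line: str, quoting: int) -> list:
    """Split one tab-separated line with the stdlib csv reader and the given quoting mode."""
    reader = csv.reader(io.StringIO(line), delimiter="\t", quoting=quoting)
    return next(reader)


print(csv.QUOTE_MINIMAL, csv.QUOTE_ALL, csv.QUOTE_NONNUMERIC, csv.QUOTE_NONE)  # 0 1 2 3
print(split_row(LINE, csv.QUOTE_MINIMAL))  # ['1', 'a\tb', '2']  (quotes group the tab)
print(split_row(LINE, csv.QUOTE_NONE))     # ['1', '"a', 'b"', '2']  (quotes are plain text)
```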
Try setting quoting=3 (i.e. QUOTE_NONE).