Pandas error tokenizing data when a field in a CSV file contains a quotation mark

Question

I'm using pandas.read_csv to read a tab-delimited file and am running into this error:

Error tokenizing data. C error: Expected 364 fields in line 73058, saw 398

After much searching, it seems that the offending entry is:

"– SO ,쳌 \\\\ ?Œ ø ,d -L ,ú ,‚ ZO

Removing the quotation mark seems to solve things. I have a lot of large files with a lot of strange characters in them, so this will no doubt happen again. Do I need to remove stray (unmatched) quotation marks ahead of time, or is there some way around this?
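A minimal reproduction of this kind of error, assuming a small hypothetical three-column sample in place of the real file (SAMPLE and the try_read helper are illustrative, not from the question). The stray quotation mark at the start of a field opens a quoted field that runs across the line break to the next quotation mark, so two physical lines are tokenized as one row with too many fields:

```python
import io

import pandas as pd

# Hypothetical three-column, tab-delimited sample (not the asker's data).
# The third line's last field starts with a stray '"', and the next line
# happens to contain another '"' that closes the accidental quoted field.
SAMPLE = 'a\tb\tc\n1\t2\t3\n4\t5\t"x\ny"\t8\t9\n'


def try_read(text):
    """Return the parsed DataFrame, or the parser's error message on failure."""
    try:
        return pd.read_csv(io.StringIO(text), sep="\t")
    except pd.errors.ParserError as exc:
        return str(exc)


result = try_read(SAMPLE)
# With default quoting, '"x\ny"' is read as one quoted field, so lines 3 and 4
# merge into a single 5-field row and the C tokenizer rejects it.
print(result)
```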

Answer 1

There is a quoting argument for read_csv:

quoting : int or csv.QUOTE_* instance, default None
    Control field quoting behavior per ``csv.QUOTE_*`` constants. Use one of
    QUOTE_MINIMAL (0), QUOTE_ALL (1), QUOTE_NONNUMERIC (2) or QUOTE_NONE (3).
    Default (None) results in QUOTE_MINIMAL behavior.

These are described in the csv docs.
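The same behaviour can be seen with Python's standard csv module, whose QUOTE_* constants read_csv accepts. This is a sketch on a hypothetical sample (SAMPLE and field_counts are illustrative names), counting the fields csv.reader produces per row under QUOTE_MINIMAL and QUOTE_NONE:

```python
import csv
import io

# Hypothetical tab-delimited sample with a stray '"' starting a field on line 3.
SAMPLE = 'a\tb\tc\n1\t2\t3\n4\t5\t"x\ny"\t8\t9\n'


def field_counts(text, quoting):
    """Parse tab-delimited text and return the number of fields in each row."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=quoting)
    return [len(row) for row in reader]


# Integer values listed in the read_csv docstring.
print(csv.QUOTE_MINIMAL, csv.QUOTE_ALL, csv.QUOTE_NONNUMERIC, csv.QUOTE_NONE)  # 0 1 2 3

# QUOTE_MINIMAL: the stray '"' opens a quoted field that runs across the
# line break until the next '"', so two physical lines become one 5-field row.
print(field_counts(SAMPLE, csv.QUOTE_MINIMAL))  # [3, 3, 5]

# QUOTE_NONE: '"' is ordinary data, so every line keeps its 3 fields.
print(field_counts(SAMPLE, csv.QUOTE_NONE))  # [3, 3, 3, 3]
```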

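Try setting quoting=3 (i.e. QUOTE_NONE). With QUOTE_NONE, read_csv treats quotation marks as ordinary characters, so a stray " no longer opens a quoted field that swallows tabs and line breaks until the next quotation mark.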
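A minimal sketch of the suggested fix, applied to a hypothetical sample (SAMPLE is illustrative, not the asker's data). csv.QUOTE_NONE is the named form of quoting=3:

```python
import csv
import io

import pandas as pd

# Hypothetical sample: a stray '"' starts the last field of the third line.
SAMPLE = 'a\tb\tc\n1\t2\t3\n4\t5\t"x\ny"\t8\t9\n'

# quoting=csv.QUOTE_NONE (== 3) tells the parser to treat '"' as ordinary data,
# so the stray quote is kept verbatim instead of opening a quoted field.
df = pd.read_csv(io.StringIO(SAMPLE), sep="\t", quoting=csv.QUOTE_NONE)
print(df)
```

The tradeoff is that with QUOTE_NONE, fields that are legitimately quoted keep their quotation marks, and a field can no longer contain a tab or line break, so this suits files like the one in the question where quotes are not used for escaping.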

Answer 1: 4 votes, accepted, 2014-02-06 00:59:50

csv文件中的字段包含引号时，Pandas错误标记数据

问题描述

1 个解决方案

解决方案1 4 已采纳 2014-02-06 00:59:50

解决方案1
4 已采纳 2014-02-06 00:59:50