
Reading a csv file with a delimiter inside a field with several double quotes

I have a csv file with , separating the columns that I want to read with pandas, i.e. df = pd.read_csv("myfile.csv",sep=',',dtype=str,encoding='utf-8'). The columns are of various types, but I want to read everything as strings. One row of the following form causes the reader to see more columns in that row than it expected:

# column 1, column 2, column 3, ...
46745,"\\"\\"\\"blabla\\"\\" a, b bli\\"\\"more bla.\\"\\" bl blu \\"\\"bli bla blub\\"\\"\\"","something else",...

Some of the fields, like the second and third columns here, are in double quotes. What sets the second field apart from other double-quoted fields is that it contains several quotes, so when the parser reaches the , it does not realize that it is actually still within a quote. It should have recognized this, because the final quote character should be followed by the delimiter. Interestingly, when you pass engine='python' the parser actually recognizes this, but instead of concluding that the quote simply has not finished yet, it throws the error ',' expected after '"'.

I tried all kinds of combinations of the keyword arguments quoting, quotechar, sep and engine. All to no avail.

Edit: Example as requested

import io
import pandas as pd

# Reproduces the error: the last row is parsed as having more than three fields.
s = 'column1,column2,column3\n3463,hello,"more, stuff"\n46745,"\\"\\"\\"blabla\\"\\" a, b bli\\"\\"more bla.\\"\\" bl blu \\"\\"bli bla blub\\"\\"\\"","something else"'
df = pd.read_csv(io.StringIO(s),sep=',',dtype=str,encoding='utf-8')

If you comment out the last line, it works.

Usually, quotes inside a field are escaped with another quote ( " ), which is what the parser expects by default.
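For comparison, here is a minimal sketch of that doubled-quote convention. The sample data and the names `standard` and `df_standard` are made up for illustration and do not come from the question:

```python
import io
import pandas as pd

# Hypothetical sample: the embedded quotes are escaped by doubling them ("").
standard = 'a,b\n1,"say ""hi"", ok"\n'

# No extra keyword arguments: doubled quotes are handled by default.
df_standard = pd.read_csv(io.StringIO(standard), dtype=str)
print(df_standard.loc[0, "b"])  # say "hi", ok
```

The comma inside the quoted field is kept as part of the value, and each doubled quote collapses to a single one.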

You probably need to use escapechar = '\\' in this case.
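A minimal sketch of this suggestion, applied to the example string from the question. It assumes the data really contains backslash-escaped quotes (\"), as that string suggests. The string is only split across several lines for readability:

```python
import io
import pandas as pd

s = ('column1,column2,column3\n'
     '3463,hello,"more, stuff"\n'
     '46745,"\\"\\"\\"blabla\\"\\" a, b bli\\"\\"more bla.\\"\\" bl blu '
     '\\"\\"bli bla blub\\"\\"\\"","something else"')

# escapechar='\\' (a single backslash) makes the parser treat \" inside a
# quoted field as a literal quote instead of the end of the field.
df = pd.read_csv(io.StringIO(s), sep=',', dtype=str, escapechar='\\')
print(df.shape)
print(df.loc[1, 'column2'])
```

Note that the backslashes are consumed during parsing, so the second column of the last row keeps the quotes but not the escape characters.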
