繁体   English   中英

在pandas read_csv中转义引号

[英]Escaped quotes in pandas read_csv

使用read_csv时,我无法创建已转义引号的数据read_csv
(注意:R的read.csv按预期工作。)

我的代码:

import pandas as pd
pd.read_csv('data.csv')
#error!
CParserError: Error tokenizing data. C error: Expected 2 fields in line 4, saw 3

data.csv

SEARCH_TERM,ACTUAL_URL
"bra tv bord","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
"tv på hjul","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
"SLAGBORD, \"Bergslagen\", IKEA:s 1700-tals serie","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"

如何阅读此csv并避免此错误?

我的猜测是,大熊猫正在使用一些正则表达式,这些表达式无法处理第三行的歧义和行程,或者更具体地说: \\"Bergslagen\\"

它确实有效,但你必须指出嵌入式引号的转义字符:

In [1]: data = '''SEARCH_TERM,ACTUAL_URL
"bra tv bord","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
"tv p\xc3\xa5 hjul","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
"SLAGBORD, \\"Bergslagen\\", IKEA:s 1700-tals serie","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"'''

In [2]: df = read_csv(StringIO(data), escapechar='\\', encoding='utf-8')

In [3]: df
Out[3]: 
                                      SEARCH_TERM                                         ACTUAL_URL
0                                     bra tv bord  http://www.ikea.com/se/sv/catalog/categories/d...
1                                      tv på hjul  http://www.ikea.com/se/sv/catalog/categories/d...
2  SLAGBORD, "Bergslagen", IKEA:s 1700-tals serie  http://www.ikea.com/se/sv/catalog/categories/d...

看到这个要点

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM