Escaped quotes in pandas read_csv
I cannot load data that contains escaped quotes when using pandas' read_csv.
(Note: R's read.csv works as expected.)
import pandas as pd
pd.read_csv('data.csv')
#error!
CParserError: Error tokenizing data. C error: Expected 2 fields in line 4, saw 3
The contents of data.csv:
SEARCH_TERM,ACTUAL_URL
"bra tv bord","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
"tv på hjul","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
"SLAGBORD, \"Bergslagen\", IKEA:s 1700-tals serie","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
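How can I read this CSV and avoid this error?
My guess is that pandas uses some regular expression that cannot handle the ambiguity and trips on the third data row, or more specifically on \"Bergslagen\".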
It does work, but you have to specify the escape character for the embedded quotes:
In [1]: from io import StringIO
In [2]: from pandas import read_csv
In [3]: data = '''SEARCH_TERM,ACTUAL_URL
"bra tv bord","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
"tv på hjul","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"
"SLAGBORD, \\"Bergslagen\\", IKEA:s 1700-tals serie","http://www.ikea.com/se/sv/catalog/categories/departments/living_room/10475/?se%7cps%7cnonbranded%7cvardagsrum%7cgoogle%7ctv_bord"'''
In [4]: df = read_csv(StringIO(data), escapechar='\\')
In [5]: df
Out[5]:
SEARCH_TERM ACTUAL_URL
0 bra tv bord http://www.ikea.com/se/sv/catalog/categories/d...
1 tv på hjul http://www.ikea.com/se/sv/catalog/categories/d...
2 SLAGBORD, "Bergslagen", IKEA:s 1700-tals serie http://www.ikea.com/se/sv/catalog/categories/d...
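For anyone who wants to reproduce this as a script, here is a minimal sketch under a few assumptions: Python 3, a reasonably recent pandas where the exception is exposed as pd.errors.ParserError (older versions reported CParserError, as in the question), and a shortened stand-in for data.csv whose example.com URLs are placeholders rather than the original data. The helper name read_escaped is made up for illustration. It first shows that the default settings fail, then reads the same text with escapechar set.

```python
from io import StringIO

import pandas as pd

# Shortened stand-in for data.csv. The example.com URLs are placeholders,
# not the URLs from the question.
RAW = (
    'SEARCH_TERM,ACTUAL_URL\n'
    '"bra tv bord","http://example.com/a"\n'
    '"SLAGBORD, \\"Bergslagen\\", IKEA:s 1700-tals serie","http://example.com/b"\n'
)


def read_escaped(text):
    """Parse CSV text in which embedded quotes are written as backslash-quote."""
    return pd.read_csv(StringIO(text), escapechar='\\')


# With default settings the backslash is an ordinary character, so the quote
# that follows it closes the quoted field and the row splits into three fields.
try:
    pd.read_csv(StringIO(RAW))
    default_ok = True
except pd.errors.ParserError:
    default_ok = False

df = read_escaped(RAW)
print(default_ok)
print(df.loc[1, 'SEARCH_TERM'])
```

By default read_csv expects an embedded quote to be written as a doubled quote (""), which is why the backslash form needs escapechar to be set explicitly.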
See this gist.