My data has commas inside a column value, and the comma is also the delimiter. How do I read it with csv.reader in Python?
This is how my data looks:
"id","text"
"416752", "i's ** This year’s theme is \"Each for Equal\".** When: Friday March 6th Time: 10:00am-12:00pm Where: HQ 155 Gordon Baker Road, 5th floor, Halifax/Dartmouth Boardoom Please note there are 2 shifts - one to log your attendance for the event and the other to log your donation, if applicable. Structure of the Event 10h00am - 10h15am: Welcome, cupcakes, coffee, social media picture opportunity (#eachforequal) 10h15am – 10h25am: Intro from WIA, IWD, theme introduction, agenda, community initiative: Dress for Success 10h30am – 11h15am: Keynote Speaker (Shasta Townsend) 11h30am – 12h00pm: Panel Stories (Mike Sharun, Dennis Hoffman, Barbara Dahan and Nikki Matarazzo) **"
Below is the logic I have tried:
import csv
import codecs
file_path=r'C:\data.csv'
f = codecs.open(file_path, encoding = "utf8", errors ='replace')
csvreader = csv.reader(f, delimiter=',')
for row in csvreader:
    print(row)
Output:
['416752', 'i\'s ** This year’s theme is \\Each for Equal\\".** When: Friday March 6th Time: 10:00am-12:00pm Where: HQ 155 Gordon Baker Road', '5th floor', 'Halifax/Dartmouth Boardoom Please note there are 2 shifts - one to log your attendance for the event and the other to log your donation', 'if applicable. Structure of the Event 10h00am - 10h15am: Welcome', 'cupcakes', 'coffee', 'social media picture opportunity (#eachforequal) 10h15am – 10h25am: Intro from WIA', 'IWD', 'theme introduction', 'agenda', 'community initiative: Dress for Success 10h30am – 11h15am: Keynote Speaker (Shasta Townsend) 11h30am – 12h00pm: Panel Stories (Mike Sharun', 'Dennis Hoffman', 'Barbara Dahan and Nikki Matarazzo) **"']
Instead of two values, I am getting more, because my data contains many commas.
The space after the comma is the problem. Use skipinitialspace=True to remedy this. I also recommend using the built-in open. With your updated data, you also need doublequote=False and escapechar='\\'.
By default, a double quote inside a quoted field is escaped by doubling the character, e.g. "Say, ""Hi!""", but your example uses a backslash to escape it, e.g. "Say, \"Hi!\"". The two additional options disable the quote doubling and tell the reader to use the escape character instead.
import csv
file_path = 'data.csv'
with open(file_path, encoding='utf8', newline='') as f:
    csvreader = csv.reader(f, skipinitialspace=True, doublequote=False, escapechar='\\')
    for row in csvreader:
        print(row)
Output:
['id', 'text']
['416752', 'i\'s ** This year’s theme is "Each for Equal".** When: Friday March 6th Time: 10:00am-12:00pm Where: HQ 155 Gordon Baker Road, 5th floor, Halifax/Dartmouth Boardoom Please note there are 2 shifts - one to log your attendance for the event and the other to log your donation, if applicable. Structure of the Event 10h00am - 10h15am: Welcome, cupcakes, coffee, social media picture opportunity (#eachforequal) 10h15am – 10h25am: Intro from WIA, IWD, theme introduction, agenda, community initiative: Dress for Success 10h30am – 11h15am: Keynote Speaker (Shasta Townsend) 11h30am – 12h00pm: Panel Stories (Mike Sharun, Dennis Hoffman, Barbara Dahan and Nikki Matarazzo) **']
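The difference between the two escaping conventions can be checked without a file. The following is a minimal sketch that assumes two invented one-line samples (not the asker's data) and reads them from memory with io.StringIO; the variable names are illustrative only.

```python
import csv
import io

# Invented sample rows: the same value written with each escaping convention.
doubled = '"1","Say, ""Hi!"""\n'          # quote escaped by doubling it
backslashed = '"1","Say, \\"Hi!\\""\n'    # quote escaped with a backslash

# Default dialect: a doubled quote inside a quoted field is read as one quote.
row_doubled = next(csv.reader(io.StringIO(doubled)))

# Backslash convention: turn off quote doubling and name the escape character.
row_backslashed = next(
    csv.reader(io.StringIO(backslashed), doublequote=False, escapechar='\\')
)

print(row_doubled)
print(row_backslashed)
```

Both readers should yield the same two fields, with the embedded comma and quotes kept inside the second field.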
As mentioned in the question "Why is the Python CSV reader ignoring double-quoted fields?", you'll need to add the skipinitialspace parameter so that csv.reader will understand the quotes.
import csv
import codecs
file_path=r'C:\data.csv'
f = codecs.open(file_path, encoding = "utf8", errors ='replace')
csvreader = csv.reader(f, delimiter=',', skipinitialspace=True)
for row in csvreader:
    print(row)
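To see what skipinitialspace changes, here is a small sketch that parses the same in-memory text twice. The sample rows are invented for illustration and contain no escaped quotes, so only the effect of the space after the delimiter is shown.

```python
import csv
import io

# Invented sample: note the space after the comma on the second line.
sample = '"id","text"\n"1", "a, b, c"\n'

# Without skipinitialspace the second field starts with a space, so its
# opening quote is not treated as a quote and every comma splits the field.
naive_rows = list(csv.reader(io.StringIO(sample)))

# With skipinitialspace=True the space is dropped and the quotes are honored.
fixed_rows = list(csv.reader(io.StringIO(sample), skipinitialspace=True))

print(naive_rows[1])
print(fixed_rows[1])
```

If the data also contains backslash-escaped quotes, as in the question, this option alone is probably not enough and the doublequote and escapechar settings discussed in the other answer would still be needed.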