
My data has commas in the column values, and the comma is also the delimiter. How do I read it with csv.reader in Python?

This is what my data looks like:

"id","text"
"416752", "i's ** This year’s theme is \"Each for Equal\".**  When: Friday March 6th Time: 10:00am-12:00pm  Where:  HQ 155 Gordon Baker Road, 5th floor, Halifax/Dartmouth Boardoom   Please note there are 2 shifts - one to log your attendance for the event and the other to log your donation, if applicable.    Structure of the Event  10h00am - 10h15am:   Welcome, cupcakes, coffee, social media picture opportunity (#eachforequal)  10h15am – 10h25am:   Intro from WIA, IWD, theme introduction, agenda, community initiative: Dress for Success  10h30am – 11h15am:   Keynote Speaker (Shasta Townsend)  11h30am – 12h00pm:   Panel Stories (Mike Sharun, Dennis Hoffman, Barbara Dahan and Nikki Matarazzo)     **"

Here is the logic I tried:

import csv
import codecs
file_path=r'C:\data.csv'
f = codecs.open(file_path, encoding = "utf8", errors ='replace')
csvreader = csv.reader(f, delimiter=',')
for row in csvreader:
    print(row)

Output:

['416752', 'i\'s ** This year’s theme is \\Each for Equal\\".**  When: Friday March 6th Time: 10:00am-12:00pm  Where:  HQ 155 Gordon Baker Road', '5th floor', 'Halifax/Dartmouth Boardoom   Please note there are 2 shifts - one to log your attendance for the event and the other to log your donation', 'if applicable.     Structure of the Event    10h00am - 10h15am:   Welcome', 'cupcakes', 'coffee', 'social media picture opportunity (#eachforequal)  10h15am – 10h25am:   Intro from WIA', 'IWD', 'theme introduction', 'agenda', 'community initiative: Dress for Success  10h30am – 11h15am:   Keynote Speaker (Shasta Townsend)  11h30am – 12h00pm:   Panel Stories (Mike Sharun', 'Dennis Hoffman', 'Barbara Dahan and Nikki Matarazzo)     **"']

Instead of two values per row I get many more, because my data contains a lot of commas.

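The space after the comma is the problem. Because the field starts with a space rather than a quote character, csv.reader does not treat it as a quoted field, so every comma inside it is taken as a delimiter. Use skipinitialspace=True to fix this. I would also suggest using the built-in open instead of codecs.open. With your updated data, you additionally need doublequote=False and escapechar='\\'.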

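By default, a double quote inside a quoted string is escaped by doubling the character, e.g. "Say, ""Hi!""", but your sample escapes it with a backslash, e.g. "Say, \"Hi!\"". The two additional options turn off quote doubling and use the escape character instead.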

import csv

file_path = 'data.csv'
with open(file_path, encoding='utf8', newline='') as f:
    csvreader = csv.reader(f, skipinitialspace=True, doublequote=False, escapechar='\\')
    for row in csvreader:
        print(row)

Output:

['id', 'text']
['416752', 'i\'s ** This year’s theme is "Each for Equal".**  When: Friday March 6th Time: 10:00am-12:00pm  Where:  HQ 155 Gordon Baker Road, 5th floor, Halifax/Dartmouth Boardoom   Please note there are 2 shifts - one to log your attendance for the event and the other to log your donation, if applicable.    Structure of the Event  10h00am - 10h15am:   Welcome, cupcakes, coffee, social media picture opportunity (#eachforequal)  10h15am – 10h25am:   Intro from WIA, IWD, theme introduction, agenda, community initiative: Dress for Success  10h30am – 11h15am:   Keynote Speaker (Shasta Townsend)  11h30am – 12h00pm:   Panel Stories (Mike Sharun, Dennis Hoffman, Barbara Dahan and Nikki Matarazzo)     **']
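To see the two escaping conventions side by side without touching a file, here is a minimal sketch that feeds csv.reader from in-memory strings. The names doubled and backslashed and the sample text are made up for illustration; only csv.reader, doublequote and escapechar come from the answer above.

```python
import csv
import io

# Illustrative one-line inputs (not the asker's data).
doubled = '"Say, ""Hi!"""\n'         # quote escaped by doubling (csv default)
backslashed = '"Say, \\"Hi!\\""\n'   # quote escaped with a backslash

# Default dialect: doublequote=True, no escape character.
rows_doubled = list(csv.reader(io.StringIO(doubled)))

# Backslash-escaped input: turn off doubling and set the escape character.
rows_backslashed = list(
    csv.reader(io.StringIO(backslashed), doublequote=False, escapechar='\\')
)

print(rows_doubled)      # [['Say, "Hi!"']]
print(rows_backslashed)  # [['Say, "Hi!"']]
```

Both readers produce the same single field; which options you need depends only on how the file escapes its quotes.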

As mentioned in the question "Why is the Python CSV reader ignoring double-quoted fields?", you need to add the skipinitialspace parameter so that csv.reader recognizes the quotes.

import csv
import codecs
file_path=r'C:\data.csv'
f = codecs.open(file_path, encoding = "utf8", errors ='replace')
csvreader = csv.reader(f, delimiter=',', skipinitialspace=True)
for row in csvreader:
    print(row)
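A small self-contained check of what skipinitialspace changes, as a sketch on a made-up one-line sample (the sample string and variable names are assumptions, not the asker's file):

```python
import csv
import io

# Illustrative row: note the space between the comma and the opening quote.
sample = '"1", "a, b"\n'

# Default: the second field starts with a space, so it is not treated as
# quoted and the comma inside it splits the row.
default_rows = list(csv.reader(io.StringIO(sample)))

# skipinitialspace=True drops the space after the delimiter, so the quote
# is seen at the start of the field and the embedded comma is kept.
fixed_rows = list(csv.reader(io.StringIO(sample), skipinitialspace=True))

print(default_rows)  # [['1', ' "a', ' b"']]
print(fixed_rows)    # [['1', 'a, b']]
```

The default result has three fields with stray quote characters, which is the same symptom as in the question; with the option set, the row comes back as two fields.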
