Python csv module: read a CSV split on commas, but ignore commas inside double or single quotes
I have a .csv file in which some column values contain commas. Here is an example:
Header: ID Value Content Date
1 34 "market, business" 12/20/2013
2 15 "market, business", yesterday, metric 11/21/2014
3 18 "market," business and yesterday 10/20/2014
4 19 yesterday, today, 11/22/2014
That is the layout of the .csv file. If I open it in Sublime Text, it is displayed like this:
1, 34, "market, business", 12/20/2013
2, 15, "market, business", "yesterday, metric, 11/21/2014
3, 18, "market," business and yesterday, 10/20/2014
4, 19, yesterday, today, 11/22/2014
But what I want after running it through the Python csv reader is:
[1, 34, "market, business", 12/20/2013]
[2, 15, "market, business" "yesterday metric, 11/21/2014]
[3, 18, "market," business and yesterday, 10/20/2014]
[4, 19, yesterday today, 11/22/2014]
This is just sample data. The "Content" column is the headache here, because the csv module uses "," as the delimiter. I used
reader = csv.reader(f, skipinitialspace=True)
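This works for the first row, where the whole string sits inside one pair of double quotes. But it does not work for the second and third rows, where there are commas outside the (single or double) quotes.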
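To see what the reader actually does with rows like these, here is a small check, offered as a sketch rather than a fix. The `sample_lines` list and the `parse_line` helper are names made up for illustration; each line is parsed on its own so that the unbalanced quote in the second row cannot run on into the following line.

```python
import csv

# The four sample rows as they appear in the text editor.
sample_lines = [
    '1, 34, "market, business", 12/20/2013',
    '2, 15, "market, business", "yesterday, metric, 11/21/2014',
    '3, 18, "market," business and yesterday, 10/20/2014',
    '4, 19, yesterday, today, 11/22/2014',
]


def parse_line(line):
    # Hypothetical helper: parse a single line in isolation with the
    # same option used in the question.
    return next(csv.reader([line], skipinitialspace=True))


rows = [parse_line(line) for line in sample_lines]
for row in rows:
    # Print the field count next to the parsed fields.
    print(len(row), row)
```

The first row comes back as four fields, and the fourth row as five, because its unquoted commas are all treated as delimiters. Note also that the reader consumes the quote characters themselves, so they do not survive in the parsed values.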
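How can I solve this? Right now I am only using the standard csv module in Python. Is pandas able to handle this?
Thanks.
Update: I think what I really want is a way to specify which commas, at which positions, count as delimiters... What I pasted here may not look reasonable, because I could not find a way within the csv module to tell the delimiter "," apart from " ". Even Excel cannot...
Any ideas?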
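If we can assume that only the Content column contains commas, so that the first two fields and the last field never do (which is what the code below relies on),
then your data can be parsed like this: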
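data = list()
with open('data') as f:
    for line in f:
        # Split off Id and Value; everything else stays in parts[2].
        parts = line.split(',', 2)
        # Split the Date off the right-hand end, leaving Content intact.
        parts[2:4] = parts[2].rsplit(',', 1)
        parts[:2] = map(int, parts[:2])
        # Trim surrounding whitespace (and the newline) from Content and Date.
        parts[2:] = map(str.strip, parts[2:])
        data.append(parts)

for row in data:
    print(row)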
yields
[1, 34, '"market, business"', '12/20/2013']
[2, 15, '"market, business", "yesterday, metric', '11/21/2014']
[3, 18, '"market," business and yesterday', '10/20/2014']
[4, 19, 'yesterday, today', '11/22/2014']
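The same split-from-both-ends idea can be wrapped in a small function, which makes it easier to try on individual lines. This is only a sketch: the name `split_row` and the `ValueError` check are additions for illustration, and it still assumes that only the third field may contain commas.

```python
def split_row(line):
    """Split 'id, value, content, date' where only content may hold commas.

    Hypothetical helper, not part of the csv module.
    """
    # Peel the first two fields off the left-hand side.
    head = line.strip().split(",", 2)
    if len(head) < 3 or "," not in head[2]:
        raise ValueError(f"expected at least 4 comma-separated fields: {line!r}")
    # Peel the date off the right-hand side; the rest is the content.
    content, date = head[2].rsplit(",", 1)
    return [int(head[0]), int(head[1]), content.strip(), date.strip()]


print(split_row('3, 18, "market," business and yesterday, 10/20/2014'))
print(split_row('4, 19, yesterday, today, 11/22/2014'))
```

Splitting from both ends sidesteps quoting rules entirely, so unbalanced or stray quotes in the middle field are left untouched instead of being interpreted.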
Then you can build a DataFrame like this:
import pandas as pd
df = pd.DataFrame(data, columns=['Id','Value','Content','Date'])
print(df)
yields
   Id  Value                                 Content        Date
0   1     34                      "market, business"  12/20/2013
1   2     15  "market, business", "yesterday, metric  11/21/2014
2   3     18        "market," business and yesterday  10/20/2014
3   4     19                        yesterday, today  11/22/2014