Python csv module: read a CSV split by commas but ignore commas inside double or single quotes
I have a .csv file whose column values contain commas. Below are some examples:
Header: ID Value Content Date
1 34 "market, business" 12/20/2013
2 15 "market, business", yesterday, metric 11/21/2014
3 18 "market," business and yesterday 10/20/2014
4 19 yesterday, today, 11/22/2014
This is the layout of the .csv file. If I open it in Sublime Text, it looks like this:
1, 34, "market, business", 12/20/2013
2, 15, "market, business", "yesterday, metric, 11/21/2014
3, 18, "market," business and yesterday, 10/20/2014
4, 19, yesterday, today, 11/22/2014
But what I want from the Python csv reader is:
[1, 34, "market, business", 12/20/2013]
[2, 15, "market, business" "yesterday metric, 11/21/2014]
[3, 18, "market," business and yesterday, 10/20/2014]
[4, 19, yesterday today, 11/22/2014]
This is just sample data. The "Content" column is the headache here, because the csv module uses "," as the separator. I used
reader = csv.reader(f, skipinitialspace=True)
It works for the first row, where the whole string is inside one pair of double quotes. But it doesn't work for the second and third rows, where there are commas outside the quotes (single or double).
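To make the behavior concrete, here is a small sketch that feeds two of the sample rows to csv.reader. The in-memory `sample` string is an assumption standing in for the real file. The fully quoted Content field in row 1 survives as one field, while the unquoted commas in row 4 produce an extra field:

```python
import csv
import io

# Two of the sample rows from the question, held in memory for illustration.
sample = (
    '1, 34, "market, business", 12/20/2013\n'
    '4, 19, yesterday, today, 11/22/2014\n'
)

# skipinitialspace=True lets a quote that follows ", " open a quoted field.
rows = list(csv.reader(io.StringIO(sample), skipinitialspace=True))

for row in rows:
    # Print the field count next to the parsed fields.
    print(len(row), row)
```

Row 1 comes back with four fields and row 4 with five, because nothing in row 4 tells the reader that "yesterday, today" belongs together.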
How can I solve this problem? I'm just using the standard csv module in Python for now. Can pandas solve it?
Thanks.
Update: I think what I want is a way to treat commas differently depending on where they appear. Now that I've pasted the data here, that seems unreasonable, because I can't find any way in the csv module to distinguish the separator "," from a "," inside a field. Even Excel can't...
Any ideas?
If we can assume that only the Content column contains commas (the first two columns and the Date column never do), then your data could be parsed this way:
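data = list()
with open('data') as f:
    for line in f:
        # Split off the first two columns; the rest stays in parts[2].
        parts = line.split(',', 2)
        # Split the last column off from the right, leaving Content whole.
        parts[2:4] = parts[2].rsplit(',', 1)
        # Convert ID and Value to integers.
        parts[:2] = map(int, parts[:2])
        # Strip surrounding whitespace (and the newline) from the strings.
        parts[2:] = map(str.strip, parts[2:])
        data.append(parts)

for row in data:
    print(row)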
yields
[1, 34, '"market, business"', '12/20/2013']
[2, 15, '"market, business", "yesterday, metric', '11/21/2014']
[3, 18, '"market," business and yesterday', '10/20/2014']
[4, 19, 'yesterday, today', '11/22/2014']
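The same idea can be wrapped in a small helper for files with a different number of comma-free columns on either side. This is only a sketch: `split_outer`, `n_left`, and `n_right` are hypothetical names, not part of the csv module, and the helper returns every field as a string rather than converting the first two to int.

```python
def split_outer(line, n_left=2, n_right=1):
    """Split on commas, keeping everything between the first n_left
    and the last n_right fields together as a single field.

    Assumes the outer fields never contain commas themselves.
    """
    # Peel off the leading fields; the remainder is the last element.
    *left, rest = line.rstrip('\n').split(',', n_left)
    # Peel off the trailing fields from the right.
    middle, *right = rest.rsplit(',', n_right)
    return [field.strip() for field in left + [middle] + right]


lines = [
    '2, 15, "market, business", "yesterday, metric, 11/21/2014\n',
    '4, 19, yesterday, today, 11/22/2014\n',
]
parsed = [split_outer(line) for line in lines]
for row in parsed:
    print(row)
```

Because it never looks at quote characters, the helper handles the unbalanced quote in row 2 the same way as any other text.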
You could then make a DataFrame like this:
import pandas as pd
df = pd.DataFrame(data, columns=['Id','Value','Content','Date'])
print(df)
yields
   Id  Value                                 Content        Date
0   1     34                      "market, business"  12/20/2013
1   2     15  "market, business", "yesterday, metric  11/21/2014
2   3     18        "market," business and yesterday  10/20/2014
3   4     19                        yesterday, today  11/22/2014
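Once the rows are parsed, it may be worth saving them again as properly quoted CSV, so that the standard csv reader can tell field commas from separators next time. The sketch below assumes rows shaped like the parsed output above and uses an in-memory buffer in place of a real file:

```python
import csv
import io

# Rows in the shape produced by the parsing step (two shown for brevity).
rows = [
    [1, 34, '"market, business"', '12/20/2013'],
    [4, 19, 'yesterday, today', '11/22/2014'],
]

# csv.writer quotes any field containing a comma or a quote character,
# and doubles embedded double quotes.
buf = io.StringIO(newline='')
csv.writer(buf).writerows(rows)
print(buf.getvalue())

# Reading the text back yields the same fields (as strings).
buf.seek(0)
round_trip = list(csv.reader(buf))
print(round_trip)
```

The round trip returns every value as a string, so ID and Value would need converting back to int after reading.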