How do I parse a CSV with pandas that has a comma delimiter and space?
I currently have the following data.csv, which has a comma delimiter:
name,day
Chicken Sandwich,Wednesday
Pesto Pasta,Thursday
Lettuce, Tomato & Onion Sandwich,Friday
Lettuce, Tomato & Onion Pita,Friday
Soup,Saturday
The parser script is:
import pandas as pd
df = pd.read_csv('data.csv', delimiter=',', error_bad_lines=False, index_col=False)
print(df.head(5))
The output is:
Skipping line 4: expected 2 fields, saw 3
Skipping line 5: expected 2 fields, saw 3
name day
0 Chicken Sandwich Wednesday
1 Pesto Pasta Thursday
2 Soup Saturday
How do I handle a case like "Lettuce, Tomato & Onion Sandwich"? Items are separated by "," but an item may itself contain a comma followed by a space. The desired output is:
name day
0 Chicken Sandwich Wednesday
1 Pesto Pasta Thursday
2 Lettuce, Tomato & Onion Sandwich Friday
3 Lettuce, Tomato & Onion Pita Friday
4 Soup Saturday
This might help.
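import pandas as pd

p = "PATH_TO.csv"
# Split on a comma only when it is followed by a non-space character, or on a colon.
# The group is non-capturing so the delimiters do not show up as extra columns,
# and a regex separator needs the Python parsing engine.
df = pd.read_csv(p, sep=r'(?:,(?=\S)|:)', engine='python')
# print(df.head(5))
print("-----")
print(df["name"])
print("-----")
print(df["day"])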
Output:
-----
0 Chicken Sandwich
1 Pesto Pasta
2 Lettuce, Tomato & Onion Sandwich
3 Lettuce, Tomato & Onion Pita
4 Soup
Name: name, dtype: object
-----
0 Wednesday
1 Thursday
2 Friday
3 Friday
4 Saturday
Name: day, dtype: object
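To see what the `(?=\S)` lookahead in that separator does, it can help to try the same idea with `re.split` outside of pandas. This is only a minimal sketch on two rows copied from the sample data; the names `pattern`, `rows` and `parsed` are made up for the illustration.

```python
import re

# Split on a comma only when the next character is not whitespace,
# so the ", " inside an item name is left alone.
pattern = re.compile(r',(?=\S)')

rows = [
    "Chicken Sandwich,Wednesday",
    "Lettuce, Tomato & Onion Sandwich,Friday",
]

parsed = [pattern.split(row) for row in rows]
for fields in parsed:
    print(fields)
```

Both rows come back as exactly two fields. The approach assumes that a real delimiter is never followed by a space; a file written as "name, day" would not split at all.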
An alternative that works in other situations too. OK, it's ugly.
import pandas as pd
from io import StringIO

for_pd = StringIO()
with open('theirry.csv') as infile:
    for line in infile:
        # Protect ", " inside items, turn the remaining commas into "|", then restore ", ".
        line = line.rstrip().replace(', ', '|||').replace(',', '```').replace('|||', ', ').replace('```', '|')
        print(line, file=for_pd)
for_pd.seek(0)
df = pd.read_csv(for_pd, sep='|')
print(df)
Result:
name day
0 Chicken Sandwich Wednesday
1 Pesto Pasta Thursday
2 Lettuce, Tomato & Onion Sandwich Friday
3 Lettuce, Tomato & Onion Pita Friday
4 Soup Saturday
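The chain of replace calls above could probably be collapsed into a single regular-expression substitution. The following is a sketch under two assumptions: that "|" never occurs in the data, and that a comma inside an item is always followed by a space. The sample rows are inlined as a string (the hypothetical `raw` variable) so the snippet runs without a file.

```python
import re
from io import StringIO

import pandas as pd

raw = """name,day
Chicken Sandwich,Wednesday
Pesto Pasta,Thursday
Lettuce, Tomato & Onion Sandwich,Friday
Lettuce, Tomato & Onion Pita,Friday
Soup,Saturday
"""

# Replace every comma that is NOT followed by a space with "|",
# leaving the ", " inside item names untouched.
cleaned = re.sub(r',(?! )', '|', raw)

# "|" is now the only delimiter, so the default C parser can be used.
df = pd.read_csv(StringIO(cleaned), sep='|')
print(df)
```

Because the separator is a single character again, this does not need the Python parsing engine, which may matter for large files.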