How do I parse a csv with pandas that has a comma delimiter and space?

I currently have the following data.csv, which uses a comma delimiter:

name,day
Chicken Sandwich,Wednesday
Pesto Pasta,Thursday
Lettuce, Tomato & Onion Sandwich,Friday
Lettuce, Tomato & Onion Pita,Friday
Soup,Saturday

The parser script is:

import pandas as pd


df = pd.read_csv('data.csv', delimiter=',', error_bad_lines=False, index_col=False)
print(df.head(5))

The output is:

Skipping line 4: expected 2 fields, saw 3
Skipping line 5: expected 2 fields, saw 3

               name        day
0  Chicken Sandwich  Wednesday
1       Pesto Pasta   Thursday
2              Soup   Saturday
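
To see why lines 4 and 5 are skipped, it can help to count the fields a plain comma split produces. The following is a minimal sketch using the standard csv module on an inline sample (the `sample` string and `field_counts` name are illustrative, not part of the original script):

```python
import csv
import io

# Inline copy of three lines from data.csv (illustrative sample).
sample = (
    "name,day\n"
    "Chicken Sandwich,Wednesday\n"
    "Lettuce, Tomato & Onion Sandwich,Friday\n"
)

# Count how many fields a plain comma split yields for each line.
field_counts = [len(row) for row in csv.reader(io.StringIO(sample))]
print(field_counts)  # [2, 2, 3]
```

The header fixes the expected width at 2 fields, so any row whose name contains a comma is read as 3 fields and rejected.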

How do I handle a case like Lettuce, Tomato & Onion Sandwich? Fields should be separated by a comma, but a field may itself contain a comma followed by a space. The desired output is:

                               name        day
0                  Chicken Sandwich  Wednesday
1                       Pesto Pasta   Thursday
2  Lettuce, Tomato & Onion Sandwich     Friday
3      Lettuce, Tomato & Onion Pita     Friday
4                              Soup   Saturday

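This might help: pass a regular expression as the delimiter, so that a comma counts as a separator only when the next character is not whitespace.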

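import pandas as pd

p = "PATH_TO.csv"
# Split on a comma only when it is immediately followed by a non-space
# character (or on a colon). A regex separator needs the Python engine.
# The group is non-capturing so the delimiter text itself is not
# returned as a field.
df = pd.read_csv(p, delimiter=r'(?:,(?=\S)|:)', engine='python')
# print(df.head(5))
print("-----")
print(df["name"])
print("-----")
print(df["day"])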

Output:

-----
0                    Chicken Sandwich
1                         Pesto Pasta
2    Lettuce, Tomato & Onion Sandwich
3        Lettuce, Tomato & Onion Pita
4                                Soup
Name: name, dtype: object
-----
0    Wednesday
1     Thursday
2       Friday
3       Friday
4     Saturday
Name: day, dtype: object
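
For a version that runs without a file on disk, here is a minimal sketch of the same lookahead idea applied to an inline copy of the data. It assumes a comma inside a name is always followed by a space and a real delimiter never is. The `csv_text` name is illustrative, and the colon alternative is dropped because this sample has no colons:

```python
import io
import pandas as pd

# Inline copy of data.csv from the question (illustrative).
csv_text = """name,day
Chicken Sandwich,Wednesday
Pesto Pasta,Thursday
Lettuce, Tomato & Onion Sandwich,Friday
Lettuce, Tomato & Onion Pita,Friday
Soup,Saturday
"""

# ",(?=\S)" matches a comma only if the next character is not whitespace.
# Regex separators are handled by the Python parsing engine.
df = pd.read_csv(io.StringIO(csv_text), sep=r",(?=\S)", engine="python")
print(df)
```

Note that this relies on the spacing convention holding for every row. A file whose fields are properly quoted would not need the regex at all.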

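An alternative that also works in other situations. It's ugly, though: temporarily protect each ", " sequence, turn the remaining commas into "|", restore the protected sequences, and then read the result with sep='|'.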

import pandas as pd
from io import StringIO

for_pd = StringIO()
with open('theirry.csv') as infile:
    for line in infile:
        # Protect ", " inside a field, mark the remaining (real) commas,
        # restore the protected ", ", then turn the marks into "|".
        line = line.rstrip().replace(', ', '|||').replace(',', '```').replace('|||', ', ').replace('```', '|')
        print(line, file=for_pd)
for_pd.seek(0)

df = pd.read_csv(for_pd, sep='|')

print(df)

Result:

                               name        day
0                  Chicken Sandwich  Wednesday
1                       Pesto Pasta   Thursday
2  Lettuce, Tomato & Onion Sandwich     Friday
3      Lettuce, Tomato & Onion Pita     Friday
4                              Soup   Saturday
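
If only the first column can contain commas, a different technique (not one proposed in the answers above) is to split each line once from the right. This is a minimal sketch under that assumption, with `csv_text`, `header`, `rows` and `df_rsplit` as illustrative names:

```python
import io
import pandas as pd

# Inline copy of data.csv from the question (illustrative).
csv_text = """name,day
Chicken Sandwich,Wednesday
Pesto Pasta,Thursday
Lettuce, Tomato & Onion Sandwich,Friday
Lettuce, Tomato & Onion Pita,Friday
Soup,Saturday
"""

lines = io.StringIO(csv_text)
header = next(lines).strip().split(",")

# rsplit(",", 1) splits on the last comma only, so earlier commas
# stay inside the name field. Blank lines are skipped.
rows = [line.strip().rsplit(",", 1) for line in lines if line.strip()]

df_rsplit = pd.DataFrame(rows, columns=header)
print(df_rsplit)
```

This avoids any dependence on spacing, but it only works when the last column never contains a comma.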
