Pandas: Read multi-char delimiter csv file?
I have the following CSV file that I want to read using pandas.read_csv, but it is not being parsed correctly.
Mat Pur Mat Mat Proc ABC TimePrice Crncy Supplier
Plant Material Number Material Description Grp Grp Status Type Type Class daysper each Key Consignment
-----------------------------------------------------------------------------------------------------------------------------------------
0009 076/JJJJJJJ331 DUMMY UNIT/Dummy Unit 265x225x15 ZEEJJMA9 P5 JERI F 99 99.9900 SEK 0
0009 1/JJJJJJJJJ/1R3 EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9 P8 JERI F 99 9,999.9900 SEK 0
0009 1/JJJJJJJJJ/4 EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9 P5 JERI F 99 999.9900 SEK 0
0009 1/JJJJJJJJJ/1 BASIC EQUIP.MAGAZINE/Remote IRU Enclosur305 MA9 P5 JERI F 99 9,999.9900 SEK 0
0009 1/JJJJJJ04 EQUIPPED CABINET/BYB 504 Multi-Pack Kit ZEEJJMA9 P5 JERI F 99 99,999.9900 SEK 0
0009 1/JJJJJJJJ/6 CABLE BUSHING/O-Ring id 21, th 2 for M25ZEEJJMA9 P5 JCOM F 99 9.9900 SEK 0
0009 1/JJJJJJJJJ PACKAGE/Pallet 800*114*600 ZEEJJMA9 P5 JVER F 99 999.9900 SEK 0
0009 1/JJJJJJJJJ PACKING MATERIAL/Pallet 1200*800*160 ZEEJJMA9 P5 JCOM F 999 999.9900 SEK 0
0009 1/JJJJJJJJ/06 BAG/PåSE/MINIGRIP/300*250 MM ZEEJJMA9 P5 JCOM F 9 9.9900 SEK 0
0009 1/JJJJJJJJ BAG/Antistatic zip lock bag 75x100 ZEEJJMA9 P5 JCOM F 9 9.9900 SEK 0
I have tried the following code, but the issue is that there is no space between Material Description and Mat Grp on lines 2, 3, etc.
import pandas as pd
df = pd.read_csv(file_path, delim_whitespace=True, skiprows=4, header=None, error_bad_lines=False, engine="python")
I believe you are looking for the pandas read_fwf function, which reads fixed-width formatted lines. Unfortunately, because some adjacent columns in this file are not separated by whitespace, you have to specify the column boundaries by hand. Here is an example for the first few columns:
s = '''
0009 076/JJJJJJJ331 DUMMY UNIT/Dummy Unit 265x225x15 ZEEJJMA9 P5 JERI F 99 99.9900 SEK 0
0009 1/JJJJJJJJJ/1R3 EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9 P8 JERI F 99 9,999.9900 SEK 0
0009 1/JJJJJJJJJ/4 EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9 P5 JERI F 99 999.9900 SEK 0
0009 1/JJJJJJJJJ/1 BASIC EQUIP.MAGAZINE/Remote IRU Enclosur305 MA9 P5 JERI F 99 9,999.9900 SEK 0
0009 1/JJJJJJ04 EQUIPPED CABINET/BYB 504 Multi-Pack Kit ZEEJJMA9 P5 JERI F 99 99,999.9900 SEK 0
0009 1/JJJJJJJJ/6 CABLE BUSHING/O-Ring id 21, th 2 for M25ZEEJJMA9 P5 JCOM F 99 9.9900 SEK 0
0009 1/JJJJJJJJJ PACKAGE/Pallet 800*114*600 ZEEJJMA9 P5 JVER F 99 999.9900 SEK 0
0009 1/JJJJJJJJJ PACKING MATERIAL/Pallet 1200*800*160 ZEEJJMA9 P5 JCOM F 999 999.9900 SEK 0
0009 1/JJJJJJJJ/06 BAG/PåSE/MINIGRIP/300*250 MM ZEEJJMA9 P5 JCOM F 9 9.9900 SEK 0
0009 1/JJJJJJJJ BAG/Antistatic zip lock bag 75x100 ZEEJJMA9 P5 JCOM F 9 9.9900 SEK 0
'''
from io import StringIO
import pandas as pd
df = pd.read_fwf(StringIO(s), colspecs=[(0,5), (6,20), (24,64), (64,72)])
Here's the output dataframe:
Unnamed: 0 Unnamed: 1 Unnamed: 2 \
0 9 076/JJJJJJJ331 DUMMY UNIT/Dummy Unit 265x225x15
1 9 1/JJJJJJJJJ/1R EQUIPPED MAGAZINE/SUP 6601; Equipped mag
2 9 1/JJJJJJJJJ/4 EQUIPPED MAGAZINE/SUP 6601; Equipped mag
3 9 1/JJJJJJJJJ/1 BASIC EQUIP.MAGAZINE/Remote IRU Enclosur
4 9 1/JJJJJJ04 EQUIPPED CABINET/BYB 504 Multi-Pack Kit
5 9 1/JJJJJJJJ/6 CABLE BUSHING/O-Ring id 21, th 2 for M25
6 9 1/JJJJJJJJJ PACKAGE/Pallet 800*114*600
7 9 1/JJJJJJJJJ PACKING MATERIAL/Pallet 1200*800*160
8 9 1/JJJJJJJJ/06 BAG/PåSE/MINIGRIP/300*250 MM
9 9 1/JJJJJJJJ BAG/Antistatic zip lock bag 75x100
Unnamed: 3
0 ZEEJJMA9
1 ZEEJJMA9
2 ZEEJJMA9
3 305 MA9
4 ZEEJJMA9
5 ZEEJJMA9
6 ZEEJJMA9
7 ZEEJJMA9
8 ZEEJJMA9
9 ZEEJJMA9
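Note that the Plant column above lost its leading zeros (0009 became 9), and that the colspecs offsets only line up against the exact spacing of the original file. As a minimal, self-contained sketch of the same idea, the example below builds a small fixed-width sample from a hypothetical layout (the column names and widths are assumptions for illustration, not taken from the real file), then reads it back with read_fwf using widths=, names= and dtype=str. The price conversion at the end is one possible way to handle values such as 9,999.9900.

```python
from io import StringIO
import pandas as pd

# Hypothetical layout: (column name, width in characters).
layout = [("plant", 5), ("material", 15), ("description", 40),
          ("group", 8), ("price", 12)]

rows = [
    ("0009", "076/JJJJJJJ331", "DUMMY UNIT/Dummy Unit 265x225x15",
     "ZEEJJMA9", "99.9900"),
    ("0009", "1/JJJJJJJJJ/1R3", "EQUIPPED MAGAZINE/SUP 6601; Equipped mag",
     "ZEEJJMA9", "9,999.9900"),
    ("0009", "1/JJJJJJJJJ/1", "BASIC EQUIP.MAGAZINE/Remote IRU Enclosur",
     "305 MA9", "9,999.9900"),
]

# Pad each field to its width; fields that fill their column touch the next one,
# which is exactly the case that whitespace splitting cannot handle.
text = "\n".join(
    "".join(value.ljust(width) for value, (_, width) in zip(row, layout))
    for row in rows
)

df = pd.read_fwf(
    StringIO(text),
    widths=[width for _, width in layout],  # contiguous field widths
    names=[name for name, _ in layout],
    header=None,                            # the sample has no header row
    dtype=str,                              # keep leading zeros such as "0009"
)

# Strip the thousands separator before converting the price to a number.
df["price"] = pd.to_numeric(df["price"].str.replace(",", "", regex=False))

print(df)
```

For the real file, the widths would have to be counted from the original lines (the header and dashed rows can be skipped with skiprows).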