Pandas: Read multi-char delimiter csv file?

I have the following CSV file that I want to read with pandas.read_csv, but it is not parsed correctly.

                                                                Mat  Pur Mat    Mat  Proc ABC   TimePrice            Crncy Supplier      
Plant Material Number   Material Description                    Grp  Grp Status Type Type Class daysper each         Key   Consignment   
-----------------------------------------------------------------------------------------------------------------------------------------
0009  076/JJJJJJJ331    DUMMY UNIT/Dummy Unit 265x225x15        ZEEJJMA9   P5   JERI   F         99          99.9900 SEK               0
0009  1/JJJJJJJJJ/1R3   EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9   P8   JERI   F         99       9,999.9900 SEK               0
0009  1/JJJJJJJJJ/4     EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9   P5   JERI   F         99         999.9900 SEK               0
0009  1/JJJJJJJJJ/1     BASIC EQUIP.MAGAZINE/Remote IRU Enclosur305  MA9   P5   JERI   F         99       9,999.9900 SEK               0
0009  1/JJJJJJ04        EQUIPPED CABINET/BYB 504 Multi-Pack Kit ZEEJJMA9   P5   JERI   F         99      99,999.9900 SEK               0
0009  1/JJJJJJJJ/6      CABLE BUSHING/O-Ring id 21, th 2 for M25ZEEJJMA9   P5   JCOM   F         99           9.9900 SEK               0
0009  1/JJJJJJJJJ       PACKAGE/Pallet 800*114*600              ZEEJJMA9   P5   JVER   F         99         999.9900 SEK               0
0009  1/JJJJJJJJJ       PACKING MATERIAL/Pallet 1200*800*160    ZEEJJMA9   P5   JCOM   F        999         999.9900 SEK               0
0009  1/JJJJJJJJ/06     BAG/PåSE/MINIGRIP/300*250 MM            ZEEJJMA9   P5   JCOM   F          9           9.9900 SEK               0
0009  1/JJJJJJJJ        BAG/Antistatic zip lock bag 75x100      ZEEJJMA9   P5   JCOM   F          9           9.9900 SEK               0

I have tried the following code, but I ran into these issues:

  • the white space inside the material descriptions
  • the headers are hard to read
  • there is no space between Material Description and Mat Grp on lines 2, 3, etc.

import pandas as pd

df = pd.read_csv(file_path, delim_whitespace=True, skiprows=4, header=None, error_bad_lines=False, engine="python")

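I believe you are looking for the pandas read_fwf function, which reads files with fixed-width columns. Unfortunately, because some fields run together with no whitespace between them (for example Material Description and Mat Grp), the column boundaries cannot be found by splitting on spaces, so you have to specify the column positions by hand. Here is an example for the first few columns: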

s = '''
0009  076/JJJJJJJ331    DUMMY UNIT/Dummy Unit 265x225x15        ZEEJJMA9   P5   JERI   F         99          99.9900 SEK               0
0009  1/JJJJJJJJJ/1R3   EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9   P8   JERI   F         99       9,999.9900 SEK               0
0009  1/JJJJJJJJJ/4     EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9   P5   JERI   F         99         999.9900 SEK               0
0009  1/JJJJJJJJJ/1     BASIC EQUIP.MAGAZINE/Remote IRU Enclosur305  MA9   P5   JERI   F         99       9,999.9900 SEK               0
0009  1/JJJJJJ04        EQUIPPED CABINET/BYB 504 Multi-Pack Kit ZEEJJMA9   P5   JERI   F         99      99,999.9900 SEK               0
0009  1/JJJJJJJJ/6      CABLE BUSHING/O-Ring id 21, th 2 for M25ZEEJJMA9   P5   JCOM   F         99           9.9900 SEK               0
0009  1/JJJJJJJJJ       PACKAGE/Pallet 800*114*600              ZEEJJMA9   P5   JVER   F         99         999.9900 SEK               0
0009  1/JJJJJJJJJ       PACKING MATERIAL/Pallet 1200*800*160    ZEEJJMA9   P5   JCOM   F        999         999.9900 SEK               0
0009  1/JJJJJJJJ/06     BAG/PåSE/MINIGRIP/300*250 MM            ZEEJJMA9   P5   JCOM   F          9           9.9900 SEK               0
0009  1/JJJJJJJJ        BAG/Antistatic zip lock bag 75x100      ZEEJJMA9   P5   JCOM   F          9           9.9900 SEK               0
'''

from io import StringIO
import pandas as pd

# colspecs are half-open [start, end) character offsets into each line.
# (6, 21) is wide enough for the longest material number, 1/JJJJJJJJJ/1R3.
# header=None: the data has no header row; lstrip drops the leading blank line of s.
df = pd.read_fwf(StringIO(s.lstrip("\n")), colspecs=[(0, 5), (6, 21), (24, 64), (64, 72)], header=None)

Here is the output dataframe:

   0                1                                         2         3
0  9   076/JJJJJJJ331          DUMMY UNIT/Dummy Unit 265x225x15  ZEEJJMA9
1  9  1/JJJJJJJJJ/1R3  EQUIPPED MAGAZINE/SUP 6601; Equipped mag  ZEEJJMA9
2  9    1/JJJJJJJJJ/4  EQUIPPED MAGAZINE/SUP 6601; Equipped mag  ZEEJJMA9
3  9    1/JJJJJJJJJ/1  BASIC EQUIP.MAGAZINE/Remote IRU Enclosur  305  MA9
4  9       1/JJJJJJ04   EQUIPPED CABINET/BYB 504 Multi-Pack Kit  ZEEJJMA9
5  9     1/JJJJJJJJ/6  CABLE BUSHING/O-Ring id 21, th 2 for M25  ZEEJJMA9
6  9      1/JJJJJJJJJ                PACKAGE/Pallet 800*114*600  ZEEJJMA9
7  9      1/JJJJJJJJJ      PACKING MATERIAL/Pallet 1200*800*160  ZEEJJMA9
8  9    1/JJJJJJJJ/06              BAG/PåSE/MINIGRIP/300*250 MM  ZEEJJMA9
9  9       1/JJJJJJJJ        BAG/Antistatic zip lock bag 75x100  ZEEJJMA9
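
The example above reads the plant code as the integer 9 and leaves Mat Grp and Pur Grp merged in the last column (see the "305  MA9" row). A minimal sketch of how this could be tightened follows, assuming the character offsets counted from the sample rows hold for the whole file. The column names are illustrative guesses based on the two-line header, not something the file defines, and the sample rows are cut off after the Pur Grp field to keep the example short.

```python
from io import StringIO

import pandas as pd

# Three rows from the question, truncated after the Pur Grp field (72 characters).
sample = (
    "0009  1/JJJJJJJJJ/1R3   EQUIPPED MAGAZINE/SUP 6601; Equipped magZEEJJMA9\n"
    "0009  1/JJJJJJJJJ/1     BASIC EQUIP.MAGAZINE/Remote IRU Enclosur305  MA9\n"
    "0009  1/JJJJJJ04        EQUIPPED CABINET/BYB 504 Multi-Pack Kit ZEEJJMA9\n"
)

# Half-open [start, end) character offsets, counted by hand from the rows above.
colspecs = [(0, 4), (6, 21), (24, 64), (64, 69), (69, 72)]

# Hypothetical column names, guessed from the two-line header in the question.
names = ["plant", "material_number", "material_description", "mat_grp", "pur_grp"]

df = pd.read_fwf(
    StringIO(sample),
    colspecs=colspecs,
    names=names,
    header=None,  # the data block carries no header row of its own
    dtype=str,    # keep "0009" as text instead of converting it to the integer 9
)

# The same layout expressed as contiguous field widths; the blank gaps are
# folded into the preceding field and stripped by read_fwf.
df_widths = pd.read_fwf(
    StringIO(sample),
    widths=[6, 18, 40, 5, 3],
    names=names,
    header=None,
    dtype=str,
)

print(df)
```

Both calls should produce the same frame. Using widths is shorter when the fields are contiguous, while colspecs lets you skip stretches of a line you do not need.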
