
Split the single column to 4 different separate columns in Dataframe

I need to split a single column of a dataframe into 4 different columns. I tried a few approaches, but none of them worked.

DATA1:

 Dump               
12525 2 153 89-8 Winch
24798 1 147 65-4 Gear
65116 4          Screw 
46456 1          Rowing
46563 5          Nut       

Expected1:

 Item  Qty  Part_no    Description             
12525  2    153 89-8   Winch
24798  1    147 65-4   Gear
65116  4               Screw 
46456  1               Rowing
46563  5               Nut       

DATA2:

 Dump               
12525 2 153 89-8 Winch Gear
24798 1 147 65-4 Gear nuts
65116 X          Screw bolts
46456 1          Rowing rings
46563 X          Nut       

Expected2:

 Item  Qty  Part_no    Description             
12525  2    153 89-8   Winch Gear
24798  1    147 65-4   Gear nuts
65116  X               Screw bolts
46456  1               Rowing rings
46563  X               Nut       

I tried the code below:

data_df[['Item','Qty','Part_no','Description']] = data_df["Dump"].str.split(" ", 3, expand=True)

and got output like this, where the description ends up in the Part_no column for rows without a part number:

 Item  Qty  Part_no  Description             
12525  2    153 89-8   Winch
24798  1    147 65-4   Gear
65116  4    Screw 
46456  1    Rowing
46563  5    Nut       

I also tried this code, but it did not give the expected output either:

data_df[['Item','Qty','Part_no','Description']] = data_df['Dump'].str.extract(r'(\d+)\s+(\S+)\s+(\d*)\s*(.+)$')

Any suggestions on how I can fix this?

Similar to this question: Split the single column to 4 different columns in Dataframe

You could match the data format of the Part_no column inside a capture group and make the contents of that group optional, so that every row still yields 4 columns.

(\d+)\s+(\S+)\s+((?:\d+\s+\d+-\d+)?\s*)(.+)$
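To see what the optional group does, here is a minimal sketch that runs the pattern above through Python's `re` module on two lines taken from DATA2. The variable names are only for illustration. Note that the third group also swallows the whitespace that follows the part number, so it may need stripping afterwards.

```python
import re

# Pattern from the answer: the part-number portion of group 3 is optional,
# so a row without a part number still produces four groups.
pattern = re.compile(r'(\d+)\s+(\S+)\s+((?:\d+\s+\d+-\d+)?\s*)(.+)$')

with_part = pattern.match("12525 2 153 89-8 Winch Gear").groups()
without_part = pattern.match("65116 X          Screw bolts").groups()

print(with_part)     # third group holds the part number plus a trailing space
print(without_part)  # third group is an empty string
```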


Example with named capture groups and str.extractall:

import pandas as pd

pattern = r'(?m)(?P<Item>\d+)\s+(?P<Qty>\S+)\s+(?P<Part_no>(?:\d+\s+\d+-\d+)?\s*)(?P<Description>.+)$'
items = [("12525 2 153 89-8 Winch Gear\n"
          "24798 1 147 65-4 Gear nuts\n"
          "65116 X          Screw bolts\n"
          "46456 1          Rowing rings\n"
          "46563 X          Nut  ")]

data_df = pd.DataFrame(items, columns=["Dump"])
res = data_df['Dump']\
    .str\
    .extractall(pattern)\
    .fillna('')

print(res)

Output

          Item Qty    Part_no   Description
  match                                    
0 0      12525   2  153 89-8     Winch Gear
  1      24798   1  147 65-4      Gear nuts
  2      65116   X              Screw bolts
  3      46456   1             Rowing rings
  4      46563   X                    Nut  
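The example above puts all records into one multi-line string and uses `str.extractall`. If each record is already its own row in the `Dump` column, as the question suggests, a variation with `str.extract` may be closer to what is needed. This is a sketch under that assumption. It also moves the `\s*` outside the `Part_no` group (a small change from the pattern above) and strips the results, so the new columns carry no stray whitespace.

```python
import pandas as pd

# Assumed layout: one record per row, as in the question's DATA2.
data_df = pd.DataFrame({"Dump": [
    "12525 2 153 89-8 Winch Gear",
    "24798 1 147 65-4 Gear nuts",
    "65116 X          Screw bolts",
    "46456 1          Rowing rings",
    "46563 X          Nut  ",
]})

# Same idea as the answer's pattern; the trailing \s* sits outside Part_no here.
pattern = (r'(?P<Item>\d+)\s+(?P<Qty>\S+)\s+'
           r'(?P<Part_no>(?:\d+\s+\d+-\d+)?)\s*(?P<Description>.+)$')

# Named groups become column names; then trim leftover whitespace.
parts = data_df["Dump"].str.extract(pattern)
parts = parts.apply(lambda col: col.str.strip())

data_df = data_df.join(parts)
print(data_df[["Item", "Qty", "Part_no", "Description"]])
```

Rows without a part number get an empty string in `Part_no` rather than `NaN`, because the group itself always takes part in the match and only its contents are optional.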
