
How to split a pandas DataFrame string column into separate rows by fixed character ranges

Could you please help me with the code below? I will try to be as straightforward and simple as I can.

  1. This is an extract of my DataFrame:

[Image: extract of the DataFrame]

  2. I built the subset below because I noticed that the SB strings always come in blocks of 7 characters (lengths 7, 14, 21 and 28):

df_split = df_excelsb_melt[df_excelsb_melt['SB'].str.len() > 7]
df_split['SB'].str.len().unique()

Output:

array([14, 21, 28], dtype=int64)
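Filtering on `> 7` only shows the lengths of the rows that need splitting. To confirm that the 7-character assumption holds for every row before relying on it, a check along these lines could be used; the small frame here is a hypothetical stand-in for `df_excelsb_melt`:

```python
import pandas as pd

# Hypothetical stand-in for df_excelsb_melt; only the SB column matters here
df_excelsb_melt = pd.DataFrame(
    {'SB': ['38-1011', '38-101138-3015', '49-300949-301249-3013']}
)

lengths = df_excelsb_melt['SB'].str.len()

# True only if every SB value is made of whole 7-character blocks
all_multiples_of_7 = bool((lengths % 7 == 0).all())
print(all_multiples_of_7)

# Rows that actually need splitting (more than one block)
df_split = df_excelsb_melt[lengths > 7]
split_lengths = sorted(df_split['SB'].str.len().unique().tolist())
print(split_lengths)
```

If the first check ever prints `False`, some SB values contain a partial block, and the choice of regex for the split (keep or drop a short tail) starts to matter.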

  3. What I've tried:

explode(df_split.assign(SB=df_split.SB.str.split(range(0,df_split.SB.str.len(),7)),'SB')

Output: SyntaxError: unexpected EOF while parsing
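The `SyntaxError` comes from unbalanced parentheses: the line opens one more `(` than it closes, so Python reaches the end of the input while still inside a call. Even with balanced parentheses, `str.split` cuts on a separator pattern rather than at fixed positions, and `explode` is a DataFrame method (`df.explode('SB')`), not a standalone function. A minimal sketch of the same assign-then-explode idea, assuming a small sample frame shaped like `df_split`, swaps `str.split` for `str.findall` with a regex that grabs up to 7 characters at a time:

```python
import pandas as pd

# Hypothetical sample shaped like df_split from the question
df_split = pd.DataFrame({
    'MOD': [42334, 43765],
    'SB': ['38-101138-3015', '49-300949-3012'],
})

# .str.findall returns, per row, a list of consecutive chunks of up to 7 characters;
# .explode('SB') then gives each list element its own row, repeating the other columns
result = (
    df_split
    .assign(SB=df_split['SB'].str.findall('.{1,7}'))
    .explode('SB')
)
print(result)
```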

  4. What the code should have done:

[Image: expected output]

In other words, the code should split the SB column into 7-character chunks, with each chunk on its own row.
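The idea behind the `range(0, ..., 7)` attempt does work once the range is computed per string and used for slicing, instead of being passed to `str.split`. A minimal sketch; `chunk_string` is a hypothetical helper name, and the sample values mirror the data used in the answer:

```python
import pandas as pd

def chunk_string(s, size=7):
    """Return consecutive slices of s, each `size` characters long (the last may be shorter)."""
    # range(0, len(s), size) yields the start offsets 0, 7, 14, ...
    return [s[i:i + size] for i in range(0, len(s), size)]

df = pd.DataFrame({'MOD': [42334, 43765],
                   'SB': ['38-101138-3015', '49-300949-3012']})

# Build a list of chunks per row, then give every chunk its own row
df['SB'] = df['SB'].apply(chunk_string)
df = df.explode('SB')
print(df)
```

Plain slicing avoids regex entirely, which can be easier to read when the block size is the only rule.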

Thanks in advance.

EDIT

A simple solution using regex:

import re
import pandas as pd

data = [{'MOD': 42334,
  'SB': '38-101138-3015',
  'AC': 'AAA',
  'COMPLIANCE': 'NOT INCORPORATED'},
 {'MOD': 43765,
  'SB': '49-300949-3012',
  'AC': 'AAA',
  'COMPLIANCE': 'NOT INCORPORATED'}]

df = pd.DataFrame(data)

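# Split each SB string into consecutive chunks of up to 7 characters
df['SB'] = df['SB'].apply(lambda x: re.findall('.{1,7}', x))
# Give each chunk its own row; the other columns are repeated
df = df.explode('SB')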

Output

|   MOD | SB      | AC   | COMPLIANCE       |
|------:|:--------|:-----|:-----------------|
| 42334 | 38-1011 | AAA  | NOT INCORPORATED |
| 42334 | 38-3015 | AAA  | NOT INCORPORATED |
| 43765 | 49-3009 | AAA  | NOT INCORPORATED |
| 43765 | 49-3012 | AAA  | NOT INCORPORATED |
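Two details of this answer are worth spelling out. The pattern `.{1,7}` takes up to 7 characters at a time, so a value whose length is not a multiple of 7 keeps its shorter tail as a final chunk, whereas `.{7}` would silently drop it. Also, `explode` keeps the original index labels (here 0, 0, 1, 1); passing `ignore_index=True` (available since pandas 1.1) renumbers the rows. A small sketch, using a hypothetical 12-character value:

```python
import re
import pandas as pd

# Hypothetical value whose length (12) is not a multiple of 7
s = '38-101138-30'

# '.{1,7}' grabs up to 7 characters at a time, so the short tail survives
print(re.findall('.{1,7}', s))  # ['38-1011', '38-30']
# '.{7}' requires exactly 7, so a short tail is silently dropped
print(re.findall('.{7}', s))    # ['38-1011']

df = pd.DataFrame({'MOD': [42334, 43765],
                   'SB': ['38-101138-3015', '49-300949-3012']})
df['SB'] = df['SB'].apply(lambda x: re.findall('.{1,7}', x))

# ignore_index=True replaces the repeated labels with 0..n-1
out = df.explode('SB', ignore_index=True)
print(out.index.tolist())  # [0, 1, 2, 3]
```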

Original solution

With a combination of df.to_dict('records') and regex:

output = []

# Loop through the records (one dict per row)
for record in df.to_dict('records'):
    # Split SB into chunks of up to 7 characters
    for x in re.findall('.{1,7}', record['SB']):
        temp = record.copy()
        temp['SB'] = x
        # Append one row per chunk to the output list
        output.append(temp)

new_df = pd.DataFrame(output)
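The nested loop can also be written as a single list comprehension, where `{**record, 'SB': chunk}` copies the record and overrides SB in one step. A sketch, rebuilding the sample `df` from the answer so it runs on its own, with a quick check that it yields the same rows as the `explode` version:

```python
import re
import pandas as pd

df = pd.DataFrame({
    'MOD': [42334, 43765],
    'SB': ['38-101138-3015', '49-300949-3012'],
    'AC': ['AAA', 'AAA'],
    'COMPLIANCE': ['NOT INCORPORATED', 'NOT INCORPORATED'],
})

# Same logic as the loop: one output record per chunk,
# with the other columns copied from the source record
new_df = pd.DataFrame(
    [{**record, 'SB': chunk}
     for record in df.to_dict('records')
     for chunk in re.findall('.{1,7}', record['SB'])]
)

# Cross-check against the explode-based approach, comparing row contents
exploded = (
    df.assign(SB=df['SB'].str.findall('.{1,7}'))
      .explode('SB', ignore_index=True)
)
same_rows = new_df.to_dict('records') == exploded.to_dict('records')
print(same_rows)
```

For large frames the `explode` route is usually the more idiomatic pandas choice; the loop or comprehension is handy when each row needs extra per-record logic.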

