How to split by range of characters pandas dataframe string into separate rows

Question

Could you please help me with the code below? I will try to be as straightforward and simple as I can.

This is an extract of my df

I built it with the code below, because I saw that the SB string always repeats in blocks of 7 characters (lengths of 7, 14, 21 and 28).

df_split = df_excelsb_melt[df_excelsb_melt['SB'].str.len() > 7]
df_split['SB'].str.len().unique()

The output was:

array([14, 21, 28], dtype=int64)

What I've tried to do:

explode(df_split.assign(SB=df_split.SB.str.split(range(0,df_split.SB.str.len(),7)),'SB')

The output was: SyntaxError: unexpected EOF while parsing

What the code should have done:

The code should have split the SB column into 7-character pieces, with each piece on its own row.

Thanks in advance.

Answer 1

EDIT

A simple solution using regex:

import re
import pandas as pd

data = [{'MOD': 42334,
  'SB': '38-101138-3015',
  'AC': 'AAA',
  'COMPLIANCE': 'NOT INCORPORATED'},
 {'MOD': 43765,
  'SB': '49-300949-3012',
  'AC': 'AAA',
  'COMPLIANCE': 'NOT INCORPORATED'}]

df = pd.DataFrame(data)

df['SB'] = df['SB'].apply(lambda x : re.findall('.{1,7}', x))
df = df.explode('SB')

Output

|   MOD | SB      | AC   | COMPLIANCE       |
|------:|:--------|:-----|:-----------------|
| 42334 | 38-1011 | AAA  | NOT INCORPORATED |
| 42334 | 38-3015 | AAA  | NOT INCORPORATED |
| 43765 | 49-3009 | AAA  | NOT INCORPORATED |
| 43765 | 49-3012 | AAA  | NOT INCORPORATED |
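The same result can probably be reached without `apply`, using pandas' own string accessor. The following is a minimal sketch, not part of the original answer; it assumes the same sample values (trimmed to two columns for brevity) and that renumbering the index afterwards is acceptable.

```python
import pandas as pd

# Sample data, trimmed to two columns for brevity (an assumption of this sketch)
df = pd.DataFrame({
    'MOD': [42334, 43765],
    'SB': ['38-101138-3015', '49-300949-3012'],
})

# Series.str.findall returns, for each row, the list of consecutive
# matches of up to 7 characters
chunks = df['SB'].str.findall('.{1,7}')

# explode gives every list element its own row and repeats the other columns;
# reset_index replaces the duplicated index labels with 0..n-1
result = df.assign(SB=chunks).explode('SB').reset_index(drop=True)
print(result)
```

Because `explode` repeats the original index label for every piece, the `reset_index(drop=True)` call is only needed if a unique index matters downstream.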

Original solution

With a combination of a loop over the records (df.to_dict('records')) and regex:

output = []

#Loop through the records
for record in df.to_dict('records'):
    #Find the SB codes with some regex logic
    for x in re.findall('.{1,7}', record['SB']):
        temp = record.copy()
        temp['SB'] = x
        #Append to the output list
        output.append(temp)
        
new_df = pd.DataFrame(output)
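On the attempt in the question: the parentheses in that line are unbalanced, which is what raises the SyntaxError, and `str.split` expects a string or regex separator rather than a `range`. The idea of stepping through the string in increments of 7 can still be expressed with plain slicing. This is a sketch under assumptions: the helper name `split_every` is hypothetical, and the sample data is trimmed to two columns.

```python
import pandas as pd

def split_every(text, width=7):
    """Cut text into consecutive pieces of `width` characters.

    Hypothetical helper for illustration; the last piece may be shorter.
    """
    return [text[i:i + width] for i in range(0, len(text), width)]

df = pd.DataFrame({
    'MOD': [42334, 43765],
    'SB': ['38-101138-3015', '49-300949-3012'],
})

# Replace each SB string by its list of 7-character slices, then give
# every slice its own row
sliced = df.assign(SB=df['SB'].map(split_every)).explode('SB').reset_index(drop=True)
print(sliced)
```

Slicing avoids the regex entirely, which may be easier to read when the piece width is fixed and known in advance.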

solution1
1 ACCEPTED 2020-09-21 15:14:06
