Split pandas dataframe string into separate rows

Question

I have a dataframe of text strings which essentially represents one or many journeys per row. I'm trying to split the legs of the journey so I can see them individually. The example input dataframe looks as follows:

UPDATED:

df_input = pd.DataFrame([{'var1':'A/A1', 'var2':'x/y/z', 'var3':'abc1'}, 
                         {'var1':'B', 'var2':'xx/yy', 'var3':'abc2'}, 
                         {'var1':'c', 'var2':'zz', 'var3':'abcd'}])

   var1   var2  var3
0  A/A1  x/y/z  abc1
1     B  xx/yy  abc2
2     c     zz  abcd

The output I'm trying to get should look as follows. For the first row, the journey legs are A to A1, then A1 to x, then x to y, and then y to z. If there is also a way to add a column indicating the journey leg number (1, 2, 3, etc.), that would be very helpful. var3 has no importance here; I've included it only to show that there are other columns which get repeated when the rows are split.

df_output = pd.DataFrame([{'var1': 'A', 'var2': 'A1', 'var3':'abc1'}, 
                          {'var1': 'A1', 'var2': 'x', 'var3':'abc1'},
                          {'var1': 'x', 'var2': 'y', 'var3':'abc1'},
                          {'var1': 'y', 'var2': 'z', 'var3':'abc1'},
                          {'var1': 'B', 'var2': 'xx', 'var3':'abc2'},
                          {'var1': 'xx', 'var2': 'yy', 'var3':'abc2'},
                          {'var1': 'c', 'var2': 'zz', 'var3':'abcd'}])

  var1 var2  var3
0    A   A1  abc1
1   A1    x  abc1
2    x    y  abc1
3    y    z  abc1
4    B   xx  abc2
5   xx   yy  abc2
6    c   zz  abcd

Can someone please help?

Thanks

Answer 1

Solution

Try this. Here df is the dataframe defined under Dummy Data below.

EDIT: Made a change based on a suggestion from @Ben.T.

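# Keep the original strings as 'var2old' and add one row per leg end from the split var2.
df = pd.concat([df.rename(columns={'var2': 'var2old'}), 
                df.var2.str.split('/').explode()], 
               axis=1, join='outer')
## CREDIT: @Ben.T
# Keep var1 where it differs from the previous row; otherwise use the previous row's var2.
# Note: this assumes adjacent journeys never share the same var1 value.
df['var1'] = df['var1'].where(df['var1'].ne(df['var1'].shift()), df['var2'].shift())
print(df)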

Output:

  var1 var2old  var3 var2
0    A   x/y/z  abc1    x
0    x   x/y/z  abc1    y
0    y   x/y/z  abc1    z
1    B   xx/yy  abc2   xx
1   xx   xx/yy  abc2   yy
2    c      zz  abcd   zz

Dummy Data

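The data originally posted by the OP (the original poster of the question), before the question was updated. In this version var1 in the first row is 'A' rather than 'A/A1'.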

import pandas as pd

df = pd.DataFrame([{'var1':'A', 'var2':'x/y/z', 'var3':'abc1'}, 
                   {'var1':'B', 'var2':'xx/yy', 'var3':'abc2'}, 
                   {'var1':'c', 'var2':'zz', 'var3':'abcd'}])
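Note that the updated question has var1 = 'A/A1' in the first row, which the data above does not cover. Below is a minimal sketch for that case, not part of the original answer. It assumes var1 and var2 together form one '/'-separated list of stops per row: join the two columns into one path, explode it, and take the previous stop within each original row as the start of the leg. The names path and legs are my own.

```python
import pandas as pd

# Updated input from the question (var1 can itself hold several stops).
df_input = pd.DataFrame([{'var1': 'A/A1', 'var2': 'x/y/z', 'var3': 'abc1'},
                         {'var1': 'B', 'var2': 'xx/yy', 'var3': 'abc2'},
                         {'var1': 'c', 'var2': 'zz', 'var3': 'abcd'}])

# Assumption: var1 and var2 together form one '/'-separated path per row.
path = (df_input['var1'] + '/' + df_input['var2']).str.split('/')

# One row per stop; the original row label is repeated in the index.
legs = df_input.assign(var2=path).explode('var2')

# The start of each leg is the previous stop within the same original row.
legs['var1'] = legs.groupby(level=0)['var2'].shift()

# The first stop of each journey has no predecessor, so it does not end a leg.
legs = legs.dropna(subset=['var1']).reset_index(drop=True)
print(legs)
```

For the question's data this should give the seven rows of df_output, with var3 repeated for every leg of the same journey.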

Answer 2

Try with explode

df=df_input.assign(var2=df_input.var2.str.split('/')).explode('var2')
  var1 var2  var3
0    A    x  abc1
0    A    y  abc1
0    A    z  abc1
1    B   xx  abc2
1    B   yy  abc2
2    c   zz  abcd

Then groupby + shift

df.var1=df.groupby(level=0).var2.shift().fillna(df.var1)
df
  var1 var2  var3
0    A    x  abc1
0    x    y  abc1
0    y    z  abc1
1    B   xx  abc2
1   xx   yy  abc2
2    c   zz  abcd
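The question also asks for a leg number. One way to get it, as a sketch on top of the two steps above: since explode repeats the original row label in the index, the rows can be numbered within each label using cumcount. The column name leg is my choice, and the data is the pre-update version of the input (var1 = 'A' in the first row), matching the output shown in this answer.

```python
import pandas as pd

# Pre-update input, as used in the output shown above.
df_input = pd.DataFrame([{'var1': 'A', 'var2': 'x/y/z', 'var3': 'abc1'},
                         {'var1': 'B', 'var2': 'xx/yy', 'var3': 'abc2'},
                         {'var1': 'c', 'var2': 'zz', 'var3': 'abcd'}])

# The two steps from the answer: explode, then shift within each original row.
df = df_input.assign(var2=df_input['var2'].str.split('/')).explode('var2')
df['var1'] = df.groupby(level=0)['var2'].shift().fillna(df['var1'])

# cumcount numbers the rows of each group from 0, so add 1 for a 1-based leg number.
df['leg'] = df.groupby(level=0).cumcount() + 1

# Optional: replace the repeated index labels with a fresh 0..n-1 index.
df = df.reset_index(drop=True)
print(df)
```

The leg number has to be computed before reset_index, because afterwards the index no longer identifies the original row.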

solution1
4 2020-07-10 00:37:27

solution2
4 ACCEPTED 2020-07-10 00:50:21
