简体   繁体   中英

Split pandas dataframe string into separate rows

I have a dataframe of text strings which essentially represents one or many journeys per row. I'm trying to split the legs of the journey so I can see them individually. The example input dataframe looks as follows:

UPDATED:

df_input = pd.DataFrame([{'var1':'A/A1', 'var2':'x/y/z', 'var3':'abc1'}, 
                         {'var1':'B', 'var2':'xx/yy', 'var3':'abc2'}, 
                         {'var1':'c', 'var2':'zz', 'var3':'abcd'}])

   var1 var2    var3
0  A/A1 x/y/z   abc1
1   B   xx/yy   abc2
2   c   zz      abcd

The output I'm trying to get should look as follows. So for the first example, the journey legs are A to A1 then A1 to x then x to y and then y to z . If there is also a way to add an additional column indicating the journey leg number (1,2,3 etc.) that'll be very helpful. var3 has no importance here, but I've just included it to show that there are other columns which get repeated when the rows are split.

df_output = pd.DataFrame([{'var1': 'A', 'var2': 'A1', 'var3':'abc1'}, 
                          {'var1': 'A1', 'var2': 'x', 'var3':'abc1'},
                          {'var1': 'x', 'var2': 'y', 'var3':'abc1'},
                          {'var1': 'y', 'var2': 'z', 'var3':'abc1'},
                          {'var1': 'B', 'var2': 'xx', 'var3':'abc2'},
                          {'var1': 'xx', 'var2': 'yy', 'var3':'abc2'},
                          {'var1': 'c', 'var2': 'zz', 'var3':'abcd'}])

  var1 var2 var3
0   A   A1  abc1
1   A1  x   abc1
2   x   y   abc1
3   y   z   abc1
4   B   xx  abc2
5   xx  yy  abc2
6   c   zz  abcd

Can someone please help?

Thanks

Solution

Try this.

EDIT : Made a change based on the suggestion from @Ben.T .

df = pd.concat([df.rename(columns={'var2': 'var2old'}), 
                df.var2.str.split('/').explode()], 
               axis=1, join='outer')
## CREDIT: @Ben.T
df['var1'] = df['var1'].where(df['var1'].ne(df['var1'].shift()), df['var2'].shift())
print(df)

Output :

  var1 var2old  var3 var2
0    A   x/y/z  abc1    x
0    x   x/y/z  abc1    y
0    y   x/y/z  abc1    z
1    B   xx/yy  abc2   xx
1   xx   xx/yy  abc2   yy
2    c      zz  abcd   zz

Dummy Data

The data originally posted by the OP ( Original Poster of the question).

import pandas as pd

df = pd.DataFrame([{'var1':'A', 'var2':'x/y/z', 'var3':'abc1'}, 
                   {'var1':'B', 'var2':'xx/yy', 'var3':'abc2'}, 
                   {'var1':'c', 'var2':'zz', 'var3':'abcd'}])

Try with explode

df=df_input.assign(var2=df_input.var2.str.split('/')).explode('var2')
  var1 var2  var3
0    A    x  abc1
0    A    y  abc1
0    A    z  abc1
1    B   xx  abc2
1    B   yy  abc2
2    c   zz  abcd

Then groupby + shift

df.var1=df.groupby(level=0).var2.shift().fillna(df.var1)
df
  var1 var2  var3
0    A    x  abc1
0    x    y  abc1
0    y    z  abc1
1    B   xx  abc2
1   xx   yy  abc2
2    c   zz  abcd

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM