I have a dataframe of text strings which essentially represents one or many journeys per row. I'm trying to split the legs of the journey so I can see them individually. The example input dataframe looks as follows:
UPDATED:
import pandas as pd

df_input = pd.DataFrame([{'var1':'A/A1', 'var2':'x/y/z', 'var3':'abc1'},
{'var1':'B', 'var2':'xx/yy', 'var3':'abc2'},
{'var1':'c', 'var2':'zz', 'var3':'abcd'}])
var1 var2 var3
0 A/A1 x/y/z abc1
1 B xx/yy abc2
2 c zz abcd
The output I'm trying to get should look as follows. For the first example, the journey legs are A to A1, then A1 to x, then x to y, and then y to z. If there is also a way to add a column indicating the journey leg number (1, 2, 3, etc.), that would be very helpful. var3 has no importance here; I've included it only to show that there are other columns which get repeated when the rows are split.
df_output = pd.DataFrame([{'var1': 'A', 'var2': 'A1', 'var3':'abc1'},
{'var1': 'A1', 'var2': 'x', 'var3':'abc1'},
{'var1': 'x', 'var2': 'y', 'var3':'abc1'},
{'var1': 'y', 'var2': 'z', 'var3':'abc1'},
{'var1': 'B', 'var2': 'xx', 'var3':'abc2'},
{'var1': 'xx', 'var2': 'yy', 'var3':'abc2'},
{'var1': 'c', 'var2': 'zz', 'var3':'abcd'}])
var1 var2 var3
0 A A1 abc1
1 A1 x abc1
2 x y abc1
3 y z abc1
4 B xx abc2
5 xx yy abc2
6 c zz abcd
Can someone please help?
Thanks
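A minimal sketch for the updated input, assuming every stop in var1 and var2 is separated by '/' and using a column named leg (a name chosen here for illustration) for the leg number: join the two columns into one path such as 'A/A1/x/y/z', split it into stops, pair each stop with the next one, and number the pairs within each original row.

```python
import pandas as pd

df_input = pd.DataFrame([{'var1': 'A/A1', 'var2': 'x/y/z', 'var3': 'abc1'},
                         {'var1': 'B', 'var2': 'xx/yy', 'var3': 'abc2'},
                         {'var1': 'c', 'var2': 'zz', 'var3': 'abcd'}])

# One full path per row, e.g. 'A/A1/x/y/z', split into a list of stops
stops = (df_input['var1'] + '/' + df_input['var2']).str.split('/')

# Pair each stop with the next one, e.g. [('A', 'A1'), ('A1', 'x'), ...],
# then give every pair its own row (the original index is repeated)
pairs = stops.apply(lambda s: list(zip(s[:-1], s[1:]))).explode()

# Build the leg columns; the repeated index lets join copy var3 (and any other columns)
legs = pd.DataFrame(pairs.tolist(), index=pairs.index, columns=['var1', 'var2'])
out = legs.join(df_input.drop(columns=['var1', 'var2']))

# Number the legs within each original row (hypothetical column name 'leg')
out['leg'] = out.groupby(level=0).cumcount() + 1
out = out.reset_index(drop=True)
print(out)
```

Because the pairing works on the combined path, a row whose var1 holds a single stop ('B', 'c') is handled the same way as 'A/A1'.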
Try this (written against the data originally posted by the OP, reproduced at the end of this answer).
EDIT: Made a change based on the suggestion from @Ben.T.
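# Keep the original var2 as var2old and attach one row per '/'-separated stop
df = pd.concat([df.rename(columns={'var2': 'var2old'}),
                df.var2.str.split('/').explode()],
               axis=1, join='outer')
## CREDIT: @Ben.T
# A row whose var1 repeats the previous row's var1 is a later leg of the same journey,
# so its origin is the previous leg's destination
df['var1'] = df['var1'].where(df['var1'].ne(df['var1'].shift()), df['var2'].shift())
print(df)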
Output:
var1 var2old var3 var2
0 A x/y/z abc1 x
0 x x/y/z abc1 y
0 y x/y/z abc1 z
1 B xx/yy abc2 xx
1 xx xx/yy abc2 yy
2 c zz abcd zz
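The where/shift step above treats a row as the start of a new journey only when its var1 differs from the previous row's var1. A sketch with hypothetical data, in which two consecutive journeys both start at 'A', shows where that value-based test breaks and how testing the repeated index instead (Index.duplicated) avoids it. DataFrame.join is used here in place of pd.concat to attach the exploded stops.

```python
import pandas as pd

# Hypothetical data: two journeys that both start at stop 'A'
df = pd.DataFrame([{'var1': 'A', 'var2': 'x/y', 'var3': 'abc1'},
                   {'var1': 'A', 'var2': 'z', 'var3': 'abc2'}])

# One row per stop; the original index is repeated for each stop of a row
exploded = df.rename(columns={'var2': 'var2old'}).join(df['var2'].str.split('/').explode())

# Value-based test: the second journey's first leg looks like a continuation
by_value = exploded['var1'].where(exploded['var1'].ne(exploded['var1'].shift()),
                                  exploded['var2'].shift())

# Index-based test: the first row of each original index keeps its own var1
first_leg = ~exploded.index.duplicated()
by_index = exploded['var1'].where(first_leg, exploded['var2'].shift())

print(by_value.tolist())
print(by_index.tolist())
```

Here by_value gives the second journey the origin 'y' (the end of the first journey), while by_index keeps 'A'.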
The data originally posted by the OP (Original Poster of the question):
import pandas as pd
df = pd.DataFrame([{'var1':'A', 'var2':'x/y/z', 'var3':'abc1'},
{'var1':'B', 'var2':'xx/yy', 'var3':'abc2'},
{'var1':'c', 'var2':'zz', 'var3':'abcd'}])
Try with explode (the output shown is for the originally posted data, where var1 is 'A' rather than 'A/A1'):
df = df_input.assign(var2=df_input.var2.str.split('/')).explode('var2')
var1 var2 var3
0 A x abc1
0 A y abc1
0 A z abc1
1 B xx abc2
1 B yy abc2
2 c zz abcd
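Then groupby + shift:
# Within each original row, var1 becomes the previous stop; the first leg keeps its own var1
df['var1'] = df.groupby(level=0)['var2'].shift().fillna(df['var1'])
df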
var1 var2 var3
0 A x abc1
0 x y abc1
0 y z abc1
1 B xx abc2
1 xx yy abc2
2 c zz abcd
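The question also asks for a leg number. A minimal sketch on top of the explode + groupby/shift approach, assuming the originally posted data and a new column named leg (a name chosen here for illustration), numbers the rows within each original index with groupby(level=0).cumcount().

```python
import pandas as pd

df = pd.DataFrame([{'var1': 'A', 'var2': 'x/y/z', 'var3': 'abc1'},
                   {'var1': 'B', 'var2': 'xx/yy', 'var3': 'abc2'},
                   {'var1': 'c', 'var2': 'zz', 'var3': 'abcd'}])

# One row per stop, as in the answer above
df = df.assign(var2=df['var2'].str.split('/')).explode('var2')
df['var1'] = df.groupby(level=0)['var2'].shift().fillna(df['var1'])

# Number the legs within each original row: 1, 2, 3, ...
df['leg'] = df.groupby(level=0).cumcount() + 1
df = df.reset_index(drop=True)
print(df)
```

cumcount counts from 0 within each group, hence the + 1; reset_index(drop=True) replaces the repeated index with 0..n-1 as in the desired output.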