I have the following dataframe, in which the columns ID_x and ID_y contain values separated by a single space:
df = pd.DataFrame({
'fruit':['apple','orange','banana'],
'ID_x' : ['1 2 3','4','5'],
'ID_y' : ['A B', 'C D','E']
}, index=['0','1','2'])
I want to split each value in the columns (ID_x and ID_y) and create new rows such that each row represents a one-to-one correspondence of the split values.
Something like this:
Any idea how to tackle this problem?
What I have tried so far is splitting the values in the columns:
col_x = 'ID_x'
col_y = 'ID_y'
# split both columns in a single assign call, so the second split does not discard the first
df = df.assign(**{col_x: df[col_x].str.split(' '), col_y: df[col_y].str.split(' ')})
Try it this way:
import pandas as pd
df = pd.DataFrame({
'fruit':['apple','orange','banana'],
'ID_x' : ['1 2 3','4','5'],
'ID_y' : ['A B', 'C D','E']
}, index=['0','1','2'])
# split each cell into a list, expand the list into columns, then stack into one long Series
id_x = df['ID_x'].str.split(' ').apply(pd.Series).stack()
id_y = df['ID_y'].str.split(' ').apply(pd.Series).stack()
id_x.index = id_x.index.droplevel(-1)
id_y.index = id_y.index.droplevel(-1)
id_x.name = 'ID_x'
id_y.name = 'ID_y'
del df['ID_x']
del df['ID_y']
df = df.join(id_x)
df = df.join(id_y)
df = df.reset_index(drop=True)
print(df)
Output:
fruit ID_x ID_y
0 apple 1 A
1 apple 1 B
2 apple 2 A
3 apple 2 B
4 apple 3 A
5 apple 3 B
6 orange 4 C
7 orange 4 D
8 banana 5 E
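Another option is to build the combinations with itertools.product:
import itertools
# For each row, take every combination of the split ID_x and ID_y values together with the fruit, then rebuild the DataFrame.
cols = ['ID_x', 'ID_y', 'fruit']  # select the columns explicitly so the positions e[0], e[1], e[2] do not depend on column order
pd.DataFrame([t for e in df[cols].values for t in itertools.product(e[0].split(), e[1].split(), [e[2]])], columns=cols)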
Out[483]:
ID_x ID_y fruit
0 1 A apple
1 1 B apple
2 2 A apple
3 2 B apple
4 3 A apple
5 3 B apple
6 4 C orange
7 4 D orange
8 5 E banana
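On newer pandas (0.25 or later, which added DataFrame.explode), the same result as in the answers above can probably be reached more directly. This is only a sketch, not something taken from either answer: it splits both columns into lists and then explodes them one after the other. The name result is just an illustrative choice, and the index is reset between the two explode calls as a precaution against the duplicate index labels that the first explode leaves behind.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apple', 'orange', 'banana'],
    'ID_x': ['1 2 3', '4', '5'],
    'ID_y': ['A B', 'C D', 'E'],
}, index=['0', '1', '2'])

result = (
    # turn each space-separated string into a list of pieces
    df.assign(ID_x=df['ID_x'].str.split(' '), ID_y=df['ID_y'].str.split(' '))
      # one row per ID_x piece; the ID_y list is repeated on each of those rows
      .explode('ID_x')
      .reset_index(drop=True)
      # one row per ID_y piece, giving every ID_x / ID_y combination per fruit
      .explode('ID_y')
      .reset_index(drop=True)
)
print(result)
```

Because the second explode runs on rows that already carry one ID_x value each, every ID_x value ends up paired with every ID_y value from the same original row, which is the same cross product that the join and itertools.product approaches give.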