I have a dataframe with some columns delimited with '|', and I need to flatten this dataframe. Example:
name type
a l
b m
c|d|e n
For this df, I want to flatten it to:
name type
a l
b m
c n
d n
e n
To do this, I used this command:
df = df.assign(name=df.name.str.split('|')).explode(column).drop_duplicates()
Now, I want do one more thing besides above flatten operation:
name type co_occur
a l
b m
c n d
c n e
d n e
That is, not only split the 'c|d|e' into two rows, but also create a new column which contains a 'co_occur' relationship, in which 'c' and 'd' and 'e' co-occur with each other.
I don't see an easy way to do this by modifying:
df = df.assign(name=df.name.str.split('|')).explode(column).drop_duplicates()
I think this is what you want. Use combinations and piece everything together
from itertools import combinations
import io
data = '''name type
a l
b m
c|d|e n
j|k o
f|g|h|i p
'''
df = pd.read_csv(io.StringIO(data), sep=' \s+', engine='python')
# hold the new dataframes as you iterate via apply()
df_hold = []
def explode_combos(x):
combos = list(combinations(x['name'].split('|'),2))
# print(combos)
# print(x['type'])
df_hold.append(pd.DataFrame([{'name':c[0], 'type':x['type'], 'co_cur': c[1]} for c in combos]))
return
# only apply() to those rows that need to be exploded
dft = df[df['name'].str.contains('\|')].apply(explode_combos, axis=1)
# concatenate the result
dfn = pd.concat(df_hold)
# add back to rows that weren't operated on (see the ~)
df_final = pd.concat([df[~df['name'].str.contains('\|')], dfn]).fillna('')
name type co_cur
0 a l
1 b m
0 c n d
1 c n e
2 d n e
0 j o k
0 f p g
1 f p h
2 f p i
3 g p h
4 g p i
5 h p i
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.