
From multiple values per row of a pandas dataframe: get two columns with every relation of the values (to analyse the network with NetworkX)

I have a dataframe with names of persons in it. The persons listed in a row work together on the same item.

item   names
a      moriz, jon, cate 
b      jon, lenard 
c      cate, martin, leo, jil 
  • I'd like to prepare the names for a network visualisation. I need to split the name cells into two columns so that every relation (every pair of persons per item) is shown, like this:
item    person 1    person 2
a       moriz       jon
a       moriz       cate
a       jon         cate
b       jon         lenard
c       cate        martin
c       cate        leo
c       cate        jil
c       jil         martin
c       jil         leo
c       martin      leo
  • I know how to split the name cell into multiple name cells for each item, but I don't know how to list them in pairs so that every relation per item appears.
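The pairing itself is exactly what `itertools.combinations(names, 2)` computes: every unordered pair exactly once, so an item with n names yields n(n-1)/2 rows (3 for a, 1 for b, 6 for c). A minimal plain-Python sketch, assuming the data is available as a dict mapping each item to its comma-separated name string; `data` and `pair_rows` are hypothetical names used only for illustration:

```python
from itertools import combinations

# Hypothetical sample data mirroring the question's table: item -> comma-separated names
data = {
    "a": "moriz, jon, cate",
    "b": "jon, lenard",
    "c": "cate, martin, leo, jil",
}


def pair_rows(items):
    """Yield (item, person 1, person 2) for every unordered pair of names per item."""
    for item, names in items.items():
        people = [n.strip() for n in names.split(",")]
        # combinations(people, 2) yields each unordered pair once, in input order;
        # n names give n * (n - 1) / 2 pairs
        for p1, p2 in combinations(people, 2):
            yield item, p1, p2


rows = list(pair_rows(data))
for row in rows:
    print(*row)
```

The order of the pairs follows the order of the names in each cell, so (martin, jil) comes out instead of (jil, martin); for an undirected network that makes no difference.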

You could do something like this (with df being your dataframe):

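from itertools import combinations

# Split each "names" string into a list, then build every 2-person pair (as tuples)
df["names"] = df["names"].str.split(", ").map(lambda l: list(combinations(l, 2)))
# One row per pair; the item (and original index) is repeated for each pair
df = df.explode("names")
# Unpack each (person 1, person 2) tuple into its own column
df["person 1"] = df["names"].str[0]
df["person 2"] = df["names"].str[1]
df = df.drop(columns="names")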

Result for the sample:

  item person 1 person 2
0    a    moriz      jon
0    a    moriz     cate
0    a      jon     cate
1    b      jon   lenard
2    c     cate   martin
2    c     cate      leo
2    c     cate      jil
2    c   martin      leo
2    c   martin      jil
2    c      leo      jil
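One pitfall: `explode` turns an empty list into NaN, so an item with a single person (and therefore no pairs) leaves a row with missing values in both person columns. The self-contained sketch below rebuilds the sample frame, adds a hypothetical item `d` with only one name to show the case, drops such rows with `dropna`, and otherwise follows the same split / combinations / explode approach; using `assign` here is just one way to add the two columns:

```python
from itertools import combinations

import pandas as pd

# Sample frame from the question, plus a hypothetical item "d" with only one person
df = pd.DataFrame({
    "item": ["a", "b", "c", "d"],
    "names": ["moriz, jon, cate", "jon, lenard", "cate, martin, leo, jil", "leo"],
})

# List of 2-person tuples per item; a single name gives an empty list
df["names"] = df["names"].str.split(", ").map(lambda l: list(combinations(l, 2)))

# explode turns an empty list into NaN, so drop those rows (they carry no edge)
edges = df.explode("names").dropna(subset=["names"])

# Unpack each tuple into two columns, then discard the tuple column
edges = edges.assign(**{
    "person 1": edges["names"].str[0],
    "person 2": edges["names"].str[1],
})
edges = edges.drop(columns="names").reset_index(drop=True)
print(edges)
```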

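Since the goal is a network analysis with NetworkX, an edge list in this shape can be passed to `networkx.from_pandas_edgelist`. A minimal sketch, assuming NetworkX 2.x or later; the frame is re-typed from the result above so the block runs on its own, and storing `"item"` as an edge attribute is an illustrative choice:

```python
import networkx as nx
import pandas as pd

# Edge list in the shape produced above (columns "item", "person 1", "person 2")
edges = pd.DataFrame({
    "item": ["a", "a", "a", "b", "c", "c", "c", "c", "c", "c"],
    "person 1": ["moriz", "moriz", "jon", "jon", "cate", "cate", "cate", "martin", "martin", "leo"],
    "person 2": ["jon", "cate", "cate", "lenard", "martin", "leo", "jil", "leo", "jil", "jil"],
})

# Each row becomes an undirected edge; the "item" column is kept as an edge attribute
G = nx.from_pandas_edgelist(edges, source="person 1", target="person 2", edge_attr="item")

print(G.number_of_nodes(), G.number_of_edges())
print(sorted(G.neighbors("jon")))
print(G.edges["jon", "lenard"]["item"])
```

A plain `nx.Graph` keeps one edge per pair of people, so if two people share several items only the last item survives as the attribute; passing `create_using=nx.MultiGraph()` keeps one edge per shared item instead.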