简体   繁体   中英

How to create a dataframe containing lthe cartesian product between two lists as elements?

I have a quite big dataframe containing 3 columns, each of which contains a list. Each list can be arbitrarily long (and is, sometimes, quite long).

 a          Yes                 No  
[1, 2, 3]  ["a", "b"]        ["A", "B", "C"]
[7, 11, 6] ["a", "d", "f"]   ["C", "H", "L", "Z"]

I want to create a new dataframe which looks like this:

C1    C2    value 
 1     a      1 
 1     b      1
 2     a      1
 2     b      1
...
 3     b      1
 1     A      0
 1     B      0
 1     C      0
...
 3     C      0
...
 7     a      1
 7     d      1
...
 6     Z      0      

The order of the rows does not matter. I'm looking for an efficient way of doing that.

What I'm doing at the moment is the following (I might be complicating things a bit):

new_df = pd.DataFrame(columns=["C1", "C2", "value"])
def compute_pairs(df1, new_df):
  perm = list(itertools.product(df1.a, df1.Yes))
  for p in perm:
    new_df.loc[len(new_df)] = list(p) + [1]
  perm = list(itertools.product(df1.a, df1.No))
  for p in perm:
    df.loc[len(new_df)] = list(p) + [0]

df1.apply(lambda x: compute_pairs(x, new_df), axis=1)

However, the for look is quite slow. I tried to use map but I failed.

Any improvements?

Let's try melt with explode :

(df.melt('a',var_name='value',value_name='C')
   .explode('C').explode('a')
)

Output:

    a value  C
0   1   Yes  a
0   2   Yes  a
0   3   Yes  a
0   1   Yes  b
0   2   Yes  b
0   3   Yes  b
1   7   Yes  a
1  11   Yes  a
1   6   Yes  a
1   7   Yes  d
1  11   Yes  d
1   6   Yes  d
1   7   Yes  f
1  11   Yes  f
1   6   Yes  f
2   1    No  A
2   2    No  A
2   3    No  A
2   1    No  B
2   2    No  B
2   3    No  B
2   1    No  C
2   2    No  C
2   3    No  C
3   7    No  C
3  11    No  C
3   6    No  C
3   7    No  H
3  11    No  H
3   6    No  H
3   7    No  L
3  11    No  L
3   6    No  L
3   7    No  Z
3  11    No  Z
3   6    No  Z

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM