
Itertools combinations of multiple columns

I have this data

product color size
p1      Red   XXL
p2      Blue  XL
p3            L
              S

I want to make combinations from the columns as follows:

p1, Red, XXL
p1, Red, XL
.
.
p3, Blue, S

I tried putting all the columns into one list and then using itertools.combinations, but the result contains unwanted data like:

p1, p2, p3
OR
Red, Blue, XXL
OR
XXL, XL, S ....
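A minimal sketch of why this happens, using plain lists with the values from the table above (the variable names are only illustrative): once everything is in one flat list, combinations() no longer knows which column a value came from, whereas product() draws exactly one value from each input.

```python
from itertools import combinations, product

# Illustrative lists holding the non-empty values of each column.
products = ["p1", "p2", "p3"]
colors = ["Red", "Blue"]
sizes = ["XXL", "XL", "L", "S"]

# Flattening loses the column boundaries, so combinations() is free
# to pick all three values from the same column.
flat = products + colors + sizes
mixed = list(combinations(flat, 3))

# product() takes one value from each input, in the order given.
crossed = list(product(products, colors, sizes))

print(len(mixed), len(crossed))
```

With nine values in the flat list, combinations() yields every 3-element subset, including rows such as ('p1', 'p2', 'p3'), while product() yields only the product/color/size triples.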

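My code is:

from itertools import combinations
from pandas import DataFrame, concat, read_csv

df = read_csv('./GenerateProducts.csv', delimiter=',')
df_columns = df.columns.tolist()
# Stack the non-null values of every column on top of each other
list_data = DataFrame()
for i in df_columns:
    list_data = concat([list_data, df[i].dropna(axis=0)])
# combinations() picks values regardless of the column they came from
generated_products = DataFrame(combinations(list_data[0], len(df_columns)))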

I am also trying to make it dynamic. I tried putting the columns into a dict and using the keys as pointers to the data, but I did not know how to implement this logic; my experience with dicts is too shallow.

data = dict()
for i in df_columns:
    data[i] = df[i].dropna(axis=0)

I read about itertools.product, and that is why I made the dict, so that I could use a for loop over the dict keys to make the same changes.

I think my attempt with the dict only confused me. Any guidance?

EDIT:

I got it to work

from itertools import product

# Collect each column's non-null values, in column order
temp = [data[i] for i in df_columns]

final_df = DataFrame(product(*temp), columns=df_columns)
final_df

I am wondering whether there is a more efficient way to accomplish the same result.

Thank you

Yes, there is a more efficient way, using itertools.product():

import itertools

prod1 = ['p1', 'p2', 'p3']
color1 = ['Red', 'Blue']
size1 = ['XXL', 'XL', 'L', 'S']
t1 = itertools.product(prod1, color1, size1)
for t in t1:
    print(t)

Output

('p1', 'Red', 'XXL')
('p1', 'Red', 'XL')
('p1', 'Red', 'L')
('p1', 'Red', 'S')
('p1', 'Blue', 'XXL')
('p1', 'Blue', 'XL')
('p1', 'Blue', 'L')
('p1', 'Blue', 'S')
('p2', 'Red', 'XXL')
('p2', 'Red', 'XL')
('p2', 'Red', 'L')
('p2', 'Red', 'S')
('p2', 'Blue', 'XXL')
('p2', 'Blue', 'XL')
('p2', 'Blue', 'L')
('p2', 'Blue', 'S')
('p3', 'Red', 'XXL')
('p3', 'Red', 'XL')
('p3', 'Red', 'L')
('p3', 'Red', 'S')
('p3', 'Blue', 'XXL')
('p3', 'Blue', 'XL')
('p3', 'Blue', 'L')
('p3', 'Blue', 'S')
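If the result should end up in a dataframe without hard-coding the number of columns, one possible sketch is to keep the lists in a dict and unpack its values into product(). The dict name `columns` and the variable `result` are illustrative, not part of the question's code.

```python
from itertools import product

import pandas as pd

# Hypothetical mapping of column name -> values, mirroring the lists above.
columns = {
    "product": ["p1", "p2", "p3"],
    "color": ["Red", "Blue"],
    "size": ["XXL", "XL", "L", "S"],
}

# dicts keep insertion order, so the values and the keys line up.
rows = list(product(*columns.values()))
result = pd.DataFrame(rows, columns=list(columns))

print(result.shape)
```

Adding another key to the dict adds another column to the result, with no other change to the code.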

I assume your data is stored as follows in your dataframe (4 rows x 3 columns).

  product color size
0      p1   Red  XXL
1      p2  Blue   XL
2      p3          L
3                  S
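One caveat about that assumption: if the frame is loaded with read_csv, as in the question, blank cells normally arrive as NaN rather than empty strings. A small sketch to check this, with the CSV content inlined as a guess at what the question's file looks like:

```python
import io

import pandas as pd

# Assumed file content; the real GenerateProducts.csv may differ.
csv_text = "product,color,size\np1,Red,XXL\np2,Blue,XL\np3,,L\n,,S\n"

df = pd.read_csv(io.StringIO(csv_text))

# Count the missing cells in each column.
missing_per_column = df.isna().sum().to_dict()
print(missing_per_column)
```

In that case a check like `i != ''` would not filter anything out, and dropna() (as used in the question) is the matching way to skip the blanks.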

Using List Comprehension

You want to create a dataframe containing every combination of these values. You can do this using a list comprehension and then creating a dataframe from the results.

Here's how to do it.

import pandas as pd
df = pd.DataFrame({'product':['p1','p2','p3',''],
                   'color':['Red','Blue','',''],
                   'size':['XXL','XL','L','S']})

outlist = [(i,j,k)
           for i in df['product'] if i != ''
           for j in df['color'] if j != ''
           for k in df['size']]

newdf = pd.DataFrame(data=outlist,columns=['product','color','size'])
print (newdf)

The new dataframe will be:

   product color size
0       p1   Red  XXL
1       p1   Red   XL
2       p1   Red    L
3       p1   Red    S
4       p1  Blue  XXL
5       p1  Blue   XL
6       p1  Blue    L
7       p1  Blue    S
8       p2   Red  XXL
9       p2   Red   XL
10      p2   Red    L
11      p2   Red    S
12      p2  Blue  XXL
13      p2  Blue   XL
14      p2  Blue    L
15      p2  Blue    S
16      p3   Red  XXL
17      p3   Red   XL
18      p3   Red    L
19      p3   Red    S
20      p3  Blue  XXL
21      p3  Blue   XL
22      p3  Blue    L
23      p3  Blue    S

Using product from itertools

An alternative approach is to use product from itertools:

import pandas as pd
from itertools import product
df = pd.DataFrame({'product':['p1','p2','p3',''],
                   'color':['Red','Blue','',''],
                   'size':['XXL','XL','L','S']})

print (df)

new_df = pd.DataFrame(data=list(product(df['product'],
                                        df['color'],
                                        df['size'])),
                      columns=['product','color','size'])
new_df.drop(new_df[(new_df['product'] == '') | (new_df['color'] == '')].index, inplace = True)
new_df = new_df.reset_index(drop=True)
print (new_df)

Note that I have to remove the rows where product == '' or color == '', because the dataframe contains these empty values and we want to ignore them.

The result of this will be:

   product color size
0       p1   Red  XXL
1       p1   Red   XL
2       p1   Red    L
3       p1   Red    S
4       p1  Blue  XXL
5       p1  Blue   XL
6       p1  Blue    L
7       p1  Blue    S
8       p2   Red  XXL
9       p2   Red   XL
10      p2   Red    L
11      p2   Red    S
12      p2  Blue  XXL
13      p2  Blue   XL
14      p2  Blue    L
15      p2  Blue    S
16      p3   Red  XXL
17      p3   Red   XL
18      p3   Red    L
19      p3   Red    S
20      p3  Blue  XXL
21      p3  Blue   XL
22      p3  Blue    L
23      p3  Blue    S
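A possible variation, sketched under the same assumption about the data: filter each column before calling product(), so the unwanted rows are never built. With the sample frame, the version above creates 4 * 4 * 4 = 64 rows and then drops 40 of them. The helper list `values` is an illustrative name.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({'product': ['p1', 'p2', 'p3', ''],
                   'color': ['Red', 'Blue', '', ''],
                   'size': ['XXL', 'XL', 'L', 'S']})

# Keep only real values per column: skip NaN/None and empty strings.
values = [[v for v in df[col] if pd.notna(v) and v != '']
          for col in df.columns]

# Only the 3 * 2 * 4 = 24 wanted rows are built.
new_df = pd.DataFrame(list(product(*values)), columns=list(df.columns))
print(new_df)
```

Because the filter handles both NaN and '', the same code should work whether the blanks come from read_csv or from a hand-built frame, and it does not depend on the number of columns.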

