Itertools combinations of multiple columns

Question

I have this data

product color size
p1      Red   XXL
p2      Blue  XL
p3            L
              S

I want to make combinations from the columns as follows:

p1, Red, XXL
p1, Red, XL
.
.
p3, Blue, S

I tried putting all the columns into one list and then using itertools.combinations, but the result contains unwanted data such as:

p1, p2, p3
OR
Red, Blue, XXL
OR
XXL, XL, S ....

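My code is:

from itertools import combinations
from pandas import DataFrame, concat, read_csv

df = read_csv('./GenerateProducts.csv', delimiter=',')
df_columns = df.columns.tolist()

# Stack the non-null values of every column into one long column
list_data = DataFrame()
for i in df_columns:
    list_data = concat([list_data,df[i].dropna(axis=0)])

# combinations() draws from the single merged column, which is what mixes values across columns
generated_products = DataFrame( combinations( list_data[0] ,len(df_columns) ) )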

I am also trying to make it dynamic. I tried putting the columns into a dict and using the keys as pointers to the data, but I did not know how to implement this logic; my experience with dicts is limited.

data = dict()
for i in df_columns:
    data[i] = df[i].dropna(axis=0)

I read about itertools.product, which is why I made the dict: so that I could use a for loop over the dict keys to make the same changes.

I think my attempt with the dict only confused me. Any guidance would be appreciated.

EDIT:

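I got it to work:

from itertools import product

# Collect the non-null values of each column, in column order
temp = []
for i in df_columns:
    temp += [data[i]]

# One row per combination that takes one value from each column
final_df = DataFrame(product(*temp), columns=df_columns)
final_df

I am wondering whether there is a more efficient way to accomplish the same result.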

Thank you

Answer 1

Yes, there is a more efficient way: pass the value lists directly to itertools.product():

import itertools

prod1 = ['p1', 'p2', 'p3']
color1 = ['Red', 'Blue']
size1 = ['XXL', 'XL', 'L', 'S']
t1 = itertools.product(prod1, color1, size1)
for t in t1:
    print(t)

Output

('p1', 'Red', 'XXL')
('p1', 'Red', 'XL')
('p1', 'Red', 'L')
('p1', 'Red', 'S')
('p1', 'Blue', 'XXL')
('p1', 'Blue', 'XL')
('p1', 'Blue', 'L')
('p1', 'Blue', 'S')
('p2', 'Red', 'XXL')
('p2', 'Red', 'XL')
('p2', 'Red', 'L')
('p2', 'Red', 'S')
('p2', 'Blue', 'XXL')
('p2', 'Blue', 'XL')
('p2', 'Blue', 'L')
('p2', 'Blue', 'S')
('p3', 'Red', 'XXL')
('p3', 'Red', 'XL')
('p3', 'Red', 'L')
('p3', 'Red', 'S')
('p3', 'Blue', 'XXL')
('p3', 'Blue', 'XL')
('p3', 'Blue', 'L')
('p3', 'Blue', 'S')
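
The lists above are hard-coded. As a rough sketch of the same idea when the number of columns is not known in advance, the values could be kept in a dict and unpacked into itertools.product(). The names columns and cartesian_rows are made up for this illustration, and empty cells are assumed to be None or '':

```python
from itertools import product

# Hypothetical input: one list per column; None or '' marks an empty cell.
columns = {
    "product": ["p1", "p2", "p3", None],
    "color": ["Red", "Blue", None, None],
    "size": ["XXL", "XL", "L", "S"],
}

def cartesian_rows(columns):
    """Return every combination that takes exactly one value from each column."""
    names = list(columns)
    # Drop empty cells before combining so they never appear in the output.
    values = [[v for v in columns[name] if v not in (None, "")] for name in names]
    return [dict(zip(names, combo)) for combo in product(*values)]

rows = cartesian_rows(columns)
print(len(rows))  # 3 * 2 * 4 = 24
print(rows[0])    # {'product': 'p1', 'color': 'Red', 'size': 'XXL'}
```

Because product() takes one element from each argument, it cannot return tuples such as ('p1', 'p2', 'p3'), which is what combinations() produces when all values are merged into a single list.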

Answer 2

I assume your data is stored as follows in your dataframe (4 rows x 3 columns).

  product color size
0      p1   Red  XXL
1      p2  Blue   XL
2      p3          L
3                  S

Using List Comprehension

You want to create a dataframe containing every combination of these values. You can do this using a list comprehension and then creating a dataframe from the results.

Here's how to do it.

import pandas as pd
df = pd.DataFrame({'product':['p1','p2','p3',''],
                   'color':['Red','Blue','',''],
                   'size':['XXL','XL','L','S']})

outlist = [(i,j,k)
           for i in df['product'] if i != ''
           for j in df['color'] if j != ''
           for k in df['size']]

newdf = pd.DataFrame(data=outlist,columns=['product','color','size'])
print (newdf)

The new dataframe will be:

   product color size
0       p1   Red  XXL
1       p1   Red   XL
2       p1   Red    L
3       p1   Red    S
4       p1  Blue  XXL
5       p1  Blue   XL
6       p1  Blue    L
7       p1  Blue    S
8       p2   Red  XXL
9       p2   Red   XL
10      p2   Red    L
11      p2   Red    S
12      p2  Blue  XXL
13      p2  Blue   XL
14      p2  Blue    L
15      p2  Blue    S
16      p3   Red  XXL
17      p3   Red   XL
18      p3   Red    L
19      p3   Red    S
20      p3  Blue  XXL
21      p3  Blue   XL
22      p3  Blue    L
23      p3  Blue    S

Using product from itertools

An alternative approach is to use product from itertools:

import pandas as pd
from itertools import product
df = pd.DataFrame({'product':['p1','p2','p3',''],
                   'color':['Red','Blue','',''],
                   'size':['XXL','XL','L','S']})

print (df)

new_df = pd.DataFrame(data=list(product(df['product'],
                                        df['color'],
                                        df['size'])),
                      columns=['product','color','size'])
new_df.drop(new_df[(new_df['product'] == '') | (new_df['color'] == '')].index, inplace = True)
new_df = new_df.reset_index(drop=True)
print (new_df)

Note that I have to remove the rows where product == '' or color == '', because the dataframe contains these blank values and we want to ignore them.

The result of this will be:

   product color size
0       p1   Red  XXL
1       p1   Red   XL
2       p1   Red    L
3       p1   Red    S
4       p1  Blue  XXL
5       p1  Blue   XL
6       p1  Blue    L
7       p1  Blue    S
8       p2   Red  XXL
9       p2   Red   XL
10      p2   Red    L
11      p2   Red    S
12      p2  Blue  XXL
13      p2  Blue   XL
14      p2  Blue    L
15      p2  Blue    S
16      p3   Red  XXL
17      p3   Red   XL
18      p3   Red    L
19      p3   Red    S
20      p3  Blue  XXL
21      p3  Blue   XL
22      p3  Blue    L
23      p3  Blue    S
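
A possible variation, sketched here rather than taken from the answer above, is to drop the blanks before calling product so that no rows have to be removed afterwards. It also avoids naming the columns one by one. It assumes blank cells are either '' or NaN, and values and filtered_df are names chosen for this example:

```python
import pandas as pd
from itertools import product

df = pd.DataFrame({'product': ['p1', 'p2', 'p3', ''],
                   'color': ['Red', 'Blue', '', ''],
                   'size': ['XXL', 'XL', 'L', 'S']})

# Keep only the non-blank values of each column, in column order.
# Assumption: a blank cell is either an empty string or NaN.
values = [df.loc[df[col].notna() & (df[col] != ''), col] for col in df.columns]

# product(*values) takes one value from each column, so nothing needs dropping later.
filtered_df = pd.DataFrame(list(product(*values)), columns=df.columns)
print(filtered_df.shape)  # (24, 3)
```

Filtering first also keeps the intermediate result smaller: the unfiltered product here has 4 * 4 * 4 = 64 rows, of which only 24 are kept.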

2 answers

solution1
1 ACCEPTED 2021-02-16 01:47:41

solution2
1 2021-02-16 02:39:43
