I have this data:
product  color  size
p1       Red    XXL
p2       Blue   XL
p3              L
                S
I want to make combinations from the columns as follows:
p1, Red, XXL
p1, Red, XL
.
.
p3, Blue, S
I tried putting all the columns into one list and then using itertools.combinations, but the result contains unwanted data such as:
p1, p2, p3 OR Red, Blue, XXL OR XXL, XL, S ...
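My code is:
from itertools import combinations
from pandas import DataFrame, concat, read_csv

df = read_csv('./GenerateProducts.csv', delimiter=',')
df_columns = df.columns.tolist()
list_data = DataFrame()
for i in df_columns:
    # stack the non-null values of every column into one long column
    list_data = concat([list_data, df[i].dropna(axis=0)])
generated_products = DataFrame(combinations(list_data[0], len(df_columns)))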
I am also trying to make it dynamic. I tried putting the columns into a dict and using the keys as pointers to the data, but I did not know how to implement this logic; my experience with dicts is limited.
data = dict()
for i in df_columns:
    data[i] = df[i].dropna(axis=0)
I read about itertools.product, which is why I made the dict, so that a for loop could make the same changes using the dict keys.
I think my attempt with the dict just confused me. Any guidance?
EDIT:
I got it to work:
from itertools import product

temp = []
for i in df_columns:
    temp += [data[i]]
final_df = DataFrame(product(*temp), columns=df_columns)
final_df
I am wondering whether there is a more efficient way to accomplish the same result.
Thank you
Yes, there is a more efficient way, using itertools.product():
import itertools
prod1 = ['p1', 'p2', 'p3']
color1 = ['Red', 'Blue']
size1 = ['XXL', 'XL', 'L', 'S']
t1 = itertools.product(prod1, color1, size1)
for t in t1:
    print(t)
Output:
('p1', 'Red', 'XXL')
('p1', 'Red', 'XL')
('p1', 'Red', 'L')
('p1', 'Red', 'S')
('p1', 'Blue', 'XXL')
('p1', 'Blue', 'XL')
('p1', 'Blue', 'L')
('p1', 'Blue', 'S')
('p2', 'Red', 'XXL')
('p2', 'Red', 'XL')
('p2', 'Red', 'L')
('p2', 'Red', 'S')
('p2', 'Blue', 'XXL')
('p2', 'Blue', 'XL')
('p2', 'Blue', 'L')
('p2', 'Blue', 'S')
('p3', 'Red', 'XXL')
('p3', 'Red', 'XL')
('p3', 'Red', 'L')
('p3', 'Red', 'S')
('p3', 'Blue', 'XXL')
('p3', 'Blue', 'XL')
('p3', 'Blue', 'L')
('p3', 'Blue', 'S')
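To see why itertools.combinations produced the unwanted rows in the question, it may help to compare the two functions side by side. The sketch below is only an illustration: the list names are hypothetical and simply mirror the sample data. combinations() draws from one pooled list and has no notion of columns, while product() takes exactly one value from each input.

```python
from itertools import combinations, product

# Hypothetical lists mirroring the sample data.
products = ['p1', 'p2', 'p3']
colors = ['Red', 'Blue']
sizes = ['XXL', 'XL', 'L', 'S']

# combinations() sees a single pooled list, so it can pick three values
# that all come from the same column, e.g. ('p1', 'p2', 'p3').
pooled = products + colors + sizes
mixed = list(combinations(pooled, 3))

# product() takes one value from each input, in the order given.
wanted = list(product(products, colors, sizes))

print(len(mixed))                    # 84, i.e. 9 choose 3
print(len(wanted))                   # 24, i.e. 3 * 2 * 4
print(('p1', 'p2', 'p3') in mixed)   # True
print(('p1', 'p2', 'p3') in wanted)  # False
```

The 24 tuples from product() are exactly the rows listed in the output above.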
I assume your data is stored as follows in your dataframe (4 rows x 3 columns).
   product  color  size
0  p1       Red    XXL
1  p2       Blue   XL
2  p3              L
3                  S
You want to create a dataframe with a combination of each of these. You can do this using a list comprehension and then creating a dataframe from the results.
Here's how to do it.
import pandas as pd
df = pd.DataFrame({'product': ['p1', 'p2', 'p3', ''],
                   'color': ['Red', 'Blue', '', ''],
                   'size': ['XXL', 'XL', 'L', 'S']})
outlist = [(i, j, k)
           for i in df['product'] if i != ''
           for j in df['color'] if j != ''
           for k in df['size']]
newdf = pd.DataFrame(data=outlist, columns=['product', 'color', 'size'])
print(newdf)
The new dataframe will be:
product color size
0 p1 Red XXL
1 p1 Red XL
2 p1 Red L
3 p1 Red S
4 p1 Blue XXL
5 p1 Blue XL
6 p1 Blue L
7 p1 Blue S
8 p2 Red XXL
9 p2 Red XL
10 p2 Red L
11 p2 Red S
12 p2 Blue XXL
13 p2 Blue XL
14 p2 Blue L
15 p2 Blue S
16 p3 Red XXL
17 p3 Red XL
18 p3 Red L
19 p3 Red S
20 p3 Blue XXL
21 p3 Blue XL
22 p3 Blue L
23 p3 Blue S
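The list comprehension above hard-codes three nested loops, so it only fits a frame with exactly these three columns. Since the question also asks for something dynamic, here is a minimal sketch that works for any number of columns. It assumes, as in the sample frame above, that a missing cell is stored as an empty string; the name `values` is introduced only for this illustration.

```python
import pandas as pd
from itertools import product

# Same sample frame as above; '' marks a missing cell (an assumption
# carried over from this answer, not from the original CSV).
df = pd.DataFrame({'product': ['p1', 'p2', 'p3', ''],
                   'color': ['Red', 'Blue', '', ''],
                   'size': ['XXL', 'XL', 'L', 'S']})

# Keep the non-empty values of each column, whatever the column count.
values = [[v for v in df[col] if v != ''] for col in df.columns]

# product(*values) takes one value per column, in column order.
newdf = pd.DataFrame(list(product(*values)), columns=df.columns)
print(newdf.shape)
```

If the frame comes from read_csv instead, empty cells are normally parsed as NaN, and `df[col].dropna()` would take the place of the `if v != ''` filter, as in the question's own code.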
An alternative approach is to use product from itertools. You can do this instead:
import pandas as pd
from itertools import product
df = pd.DataFrame({'product': ['p1', 'p2', 'p3', ''],
                   'color': ['Red', 'Blue', '', ''],
                   'size': ['XXL', 'XL', 'L', 'S']})
print(df)
new_df = pd.DataFrame(data=list(product(df['product'],
                                        df['color'],
                                        df['size'])),
                      columns=['product', 'color', 'size'])
new_df.drop(new_df[(new_df['product'] == '') | (new_df['color'] == '')].index, inplace=True)
new_df = new_df.reset_index(drop=True)
print(new_df)
Note that I have to remove the rows where product = '' or color = '', because the dataframe contains these empty values and we want to ignore them.
The result of this will be:
product color size
0 p1 Red XXL
1 p1 Red XL
2 p1 Red L
3 p1 Red S
4 p1 Blue XXL
5 p1 Blue XL
6 p1 Blue L
7 p1 Blue S
8 p2 Red XXL
9 p2 Red XL
10 p2 Red L
11 p2 Red S
12 p2 Blue XXL
13 p2 Blue XL
14 p2 Blue L
15 p2 Blue S
16 p3 Red XXL
17 p3 Red XL
18 p3 Red L
19 p3 Red S
20 p3 Blue XXL
21 p3 Blue XL
22 p3 Blue L
23 p3 Blue S
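Building all 64 rows (4 x 4 x 4) and then dropping 40 of them does extra work. One possible variation, sketched here under the same assumption that '' marks a missing cell, is to filter each column first and let pandas build the Cartesian product with MultiIndex.from_product. The name `levels` is hypothetical and used only for this example.

```python
import pandas as pd

df = pd.DataFrame({'product': ['p1', 'p2', 'p3', ''],
                   'color': ['Red', 'Blue', '', ''],
                   'size': ['XXL', 'XL', 'L', 'S']})

# Drop the '' placeholders per column first, so no throwaway rows are built.
levels = [df.loc[df[col] != '', col] for col in df.columns]

# MultiIndex.from_product builds the Cartesian product of the inputs;
# to_frame(index=False) turns the index levels into ordinary columns.
new_df = pd.MultiIndex.from_product(levels, names=list(df.columns)).to_frame(index=False)
print(new_df)
```

This stays inside pandas and needs no reset_index step, since the resulting frame gets a fresh default index.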