
Itertools combinations of multiple columns

I have this data:

product color size
p1      Red   XXL
p2      Blue  XL
p3            L
              S

I want to make combinations from the columns as follows:

p1, Red, XXL
p1, Red, XL
.
.
p3, Blue, S

I tried putting all the columns into one list and then using itertools.combinations, but the result contains some unwanted data, like:

p1, p2, p3
OR
Red, Blue, XXL
OR
XXL, XL, S ....
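To make the difference concrete, here is a small sketch (using a hypothetical subset of the data above) comparing combinations() over one flattened list with product() over one list per column:

```python
from itertools import combinations, product

# Hypothetical subset of the question's data
products = ["p1", "p2"]
colors = ["Red", "Blue"]
sizes = ["XXL", "S"]

# Flattening loses track of which column each value came from,
# so combinations() can pick two products (or two sizes) for one row.
flat = products + colors + sizes
mixed = list(combinations(flat, 3))

# product() takes one iterable per column and picks exactly one value from each.
per_column = list(product(products, colors, sizes))

print(len(mixed))       # C(6, 3) = 20
print(len(per_column))  # 2 * 2 * 2 = 8
print(("p1", "p2", "Red") in mixed)  # True: two products in one row
print(per_column[0])    # ('p1', 'Red', 'XXL')
```

combinations() answers "which k items can I pick from one pool?", while product() answers "which rows can I build by taking one item from each pool?", which is what the question needs.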

My code is:

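from itertools import combinations
from pandas import DataFrame, concat, read_csv

df = read_csv('./GenerateProducts.csv', delimiter=',')
df_columns = df.columns.tolist()
list_data = DataFrame()
# Stack every column's non-empty values into one single column (column 0)
for i in df_columns:
    list_data = concat([list_data, df[i].dropna(axis=0)])
# combinations() draws from that one pooled column, ignoring which column each value came from
generated_products = DataFrame(combinations(list_data[0], len(df_columns)))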

I am also trying to make it dynamic. I tried turning the columns into a dict and then using the keys as pointers to the data, but I did not know how to implement this logic; my experience with dicts is too shallow.

data = dict()
for i in df_columns:
    data[i] = df[i].dropna(axis=0)

I read about itertools.product, and that is why I made the dict: so I could use a for loop over the dict keys to apply the same steps to each column.
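The dict-based idea described above can be sketched as follows. This is a minimal sketch that assumes a hypothetical frame built in code, with missing cells stored as None/NaN (as read_csv would produce), rather than the actual CSV file:

```python
from itertools import product

import pandas as pd

# Hypothetical frame mirroring the question; missing cells are None (treated as NaN)
df = pd.DataFrame({
    "product": ["p1", "p2", "p3", None],
    "color": ["Red", "Blue", None, None],
    "size": ["XXL", "XL", "L", "S"],
})

# key = column name, value = that column's non-missing values
data = {col: df[col].dropna().tolist() for col in df.columns}

# Unpacking the dict values gives product() one iterable per column,
# in insertion (column) order, so the keys can be reused as column names.
rows = list(product(*data.values()))
result = pd.DataFrame(rows, columns=list(data.keys()))
print(result)
```

Because the dict keeps the column order, the code does not need to know the column names in advance, which is what makes it "dynamic".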

I think my use of the dict confused me. Any guidance?

EDIT:

I got it to work:

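from itertools import product
from pandas import DataFrame

temp = []
for i in df_columns:
    temp.append(data[i])  # one Series of non-empty values per column

final_df = DataFrame(product(*temp), columns=df_columns)
final_df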

I am wondering whether there is a more efficient way to accomplish the same result.

Thank you

Yes, itertools.product() does this directly when you pass it one list per column:

import itertools

prod1 = ['p1', 'p2', 'p3']
color1 = ['Red', 'Blue']
size1 = ['XXL', 'XL', 'L', 'S']
t1 = itertools.product(prod1, color1, size1)
for t in t1:
    print(t)

Output:

('p1', 'Red', 'XXL')
('p1', 'Red', 'XL')
('p1', 'Red', 'L')
('p1', 'Red', 'S')
('p1', 'Blue', 'XXL')
('p1', 'Blue', 'XL')
('p1', 'Blue', 'L')
('p1', 'Blue', 'S')
('p2', 'Red', 'XXL')
('p2', 'Red', 'XL')
('p2', 'Red', 'L')
('p2', 'Red', 'S')
('p2', 'Blue', 'XXL')
('p2', 'Blue', 'XL')
('p2', 'Blue', 'L')
('p2', 'Blue', 'S')
('p3', 'Red', 'XXL')
('p3', 'Red', 'XL')
('p3', 'Red', 'L')
('p3', 'Red', 'S')
('p3', 'Blue', 'XXL')
('p3', 'Blue', 'XL')
('p3', 'Blue', 'L')
('p3', 'Blue', 'S')
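If the goal is a DataFrame rather than printed tuples, pandas can build the same Cartesian product itself with MultiIndex.from_product. This is a sketch of a swapped-in alternative, not part of the original answer; whether it is actually faster than itertools.product depends on the data size and has not been measured here:

```python
import pandas as pd

prod1 = ['p1', 'p2', 'p3']
color1 = ['Red', 'Blue']
size1 = ['XXL', 'XL', 'L', 'S']

# from_product builds every (product, color, size) combination, last level varying fastest,
# which matches the order itertools.product produces.
idx = pd.MultiIndex.from_product([prod1, color1, size1],
                                 names=['product', 'color', 'size'])

# to_frame(index=False) turns the index levels into ordinary columns.
result = idx.to_frame(index=False)
print(result)
```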

I assume your data is stored in your dataframe as follows (4 rows x 3 columns):

  product color size
0      p1   Red  XXL
1      p2  Blue   XL
2      p3          L
3                  S
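The answers below assume the empty cells hold ''. If the frame comes from read_csv, as in the question, empty cells are read as NaN instead, so dropna() is the natural filter. A minimal sketch, assuming hypothetical CSV contents that match the table above:

```python
import io
from itertools import product

import pandas as pd

# Stand-in for ./GenerateProducts.csv (hypothetical contents matching the question)
csv_text = """product,color,size
p1,Red,XXL
p2,Blue,XL
p3,,L
,,S
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.isna().sum())  # empty cells are read as NaN, not ''

# Drop the NaNs column by column, then take the Cartesian product.
columns = [df[col].dropna() for col in df.columns]
result = pd.DataFrame(list(product(*columns)), columns=df.columns)
print(result.shape)
```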

Using List Comprehension

You want to create a dataframe with every combination of these values. You can do this with a list comprehension and then create a dataframe from the results.

Here's how to do it:

import pandas as pd
df = pd.DataFrame({'product':['p1','p2','p3',''],
                   'color':['Red','Blue','',''],
                   'size':['XXL','XL','L','S']})

outlist = [(i,j,k)
           for i in df['product'] if i != ''
           for j in df['color'] if j != ''
           for k in df['size']]

newdf = pd.DataFrame(data=outlist,columns=['product','color','size'])
print (newdf)

The new dataframe will be:

   product color size
0       p1   Red  XXL
1       p1   Red   XL
2       p1   Red    L
3       p1   Red    S
4       p1  Blue  XXL
5       p1  Blue   XL
6       p1  Blue    L
7       p1  Blue    S
8       p2   Red  XXL
9       p2   Red   XL
10      p2   Red    L
11      p2   Red    S
12      p2  Blue  XXL
13      p2  Blue   XL
14      p2  Blue    L
15      p2  Blue    S
16      p3   Red  XXL
17      p3   Red   XL
18      p3   Red    L
19      p3   Red    S
20      p3  Blue  XXL
21      p3  Blue   XL
22      p3  Blue    L
23      p3  Blue    S

Using product from itertools

An alternative approach is to use product from itertools.

You can do this instead:

import pandas as pd
from itertools import product
df = pd.DataFrame({'product':['p1','p2','p3',''],
                   'color':['Red','Blue','',''],
                   'size':['XXL','XL','L','S']})

print (df)

new_df = pd.DataFrame(data=list(product(df['product'],
                                        df['color'],
                                        df['size'])),
                      columns=['product','color','size'])
new_df.drop(new_df[(new_df['product'] == '') | (new_df['color'] == '')].index, inplace = True)
new_df = new_df.reset_index(drop=True)
print (new_df)

Note that I have to remove the rows where product == '' or color == '', because the dataframe contains these empty values and we want to ignore them.

The result of this will be:

   product color size
0       p1   Red  XXL
1       p1   Red   XL
2       p1   Red    L
3       p1   Red    S
4       p1  Blue  XXL
5       p1  Blue   XL
6       p1  Blue    L
7       p1  Blue    S
8       p2   Red  XXL
9       p2   Red   XL
10      p2   Red    L
11      p2   Red    S
12      p2  Blue  XXL
13      p2  Blue   XL
14      p2  Blue    L
15      p2  Blue    S
16      p3   Red  XXL
17      p3   Red   XL
18      p3   Red    L
19      p3   Red    S
20      p3  Blue  XXL
21      p3  Blue   XL
22      p3  Blue    L
23      p3  Blue    S
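Instead of generating rows that contain '' and dropping them afterwards, one could filter each column before taking the product, so the unwanted rows are never created. A minimal sketch using the same example frame:

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({'product': ['p1', 'p2', 'p3', ''],
                   'color': ['Red', 'Blue', '', ''],
                   'size': ['XXL', 'XL', 'L', 'S']})

cols = ['product', 'color', 'size']
# Keep only the non-empty values of each column before building the product,
# so no rows need to be dropped and no index reset is needed afterwards.
values = [df.loc[df[c] != '', c] for c in cols]
new_df = pd.DataFrame(list(product(*values)), columns=cols)
print(new_df)
```

This yields the same 24 rows as above, in the same order.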
