pandas - make a df from all possible combinations of columns

I have this pandas dataframe:

import pandas as pd

df = pd.DataFrame([['cat1', 1], ['cat2', 1], ['cat3', 2],
                   ['cat1', 3]], columns=['category', 'number'])

df
Out[32]: 
  category  number
0     cat1       1
1     cat2       1
2     cat3       2
3     cat1       3

The first column is the category of the product a customer purchased, and the second is the purchase number for that same customer. So this customer made 3 purchases. I want to reshape the table so that it lists every combination of the categories this customer bought in the first purchase, then the second, then the third, plus a new column that counts how many times each combination occurs:

      1     2     3  count
0  cat1  cat3   NaN      1
1  cat2  cat3   NaN      1
2  cat1  cat3  cat1      1
3  cat2  cat3  cat1      1 

I tried to pivot it like this:

df.pivot(columns='number', values='category')

but it did not work, because pivot does not generate the combinations. Is there a way to do this?
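Why `pivot` falls short becomes visible once the categories are collected per purchase number. The sketch below is an illustration, not code from the original post, and assumes the `df` shown above: purchase 1 holds two categories, so each output row has to pick one category per purchase. That is a Cartesian product, not a reshape that `pivot` can express.

```python
import pandas as pd

df = pd.DataFrame([['cat1', 1], ['cat2', 1], ['cat3', 2], ['cat1', 3]],
                  columns=['category', 'number'])

# Collect the categories bought in each purchase into one list per purchase number.
per_purchase = df.groupby('number')['category'].agg(list)
print(per_purchase)
# Purchase 1 holds two categories, so a one-value-per-cell pivot cannot
# represent "cat1 then cat3" and "cat2 then cat3" as separate rows.
```

Feeding these per-purchase lists to `itertools.product` yields exactly the rows of the desired table.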

The goal is to know what a customer buys the first time, then the second time, and how many customers bought the same categories in purchase 1 and then in purchase 2 (for example).
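For the multi-customer goal, one possible sketch assumes a hypothetical `customer` column that the example data above does not have. It pairs each customer's purchase-1 categories with that same customer's purchase-2 categories through a self-merge, then counts distinct customers per pair. Customers `B` and `C` are invented purely for illustration; customer `A` reproduces the question's rows.

```python
import pandas as pd

# Hypothetical data: the 'customer' column and customers 'B' and 'C' are invented;
# customer 'A' reproduces the rows from the question.
purchases = pd.DataFrame(
    [['A', 'cat1', 1], ['A', 'cat2', 1], ['A', 'cat3', 2], ['A', 'cat1', 3],
     ['B', 'cat1', 1], ['B', 'cat3', 2],
     ['C', 'cat2', 1], ['C', 'cat1', 2]],
    columns=['customer', 'category', 'number'],
)

# Categories bought in purchase 1 and in purchase 2, keyed by customer.
first = purchases.loc[purchases['number'] == 1, ['customer', 'category']]
second = purchases.loc[purchases['number'] == 2, ['customer', 'category']]

# Pair each purchase-1 category with each purchase-2 category of the same customer.
pairs = first.merge(second, on='customer', suffixes=('_1', '_2'))

# How many distinct customers made each (purchase 1, purchase 2) combination.
counts = (pairs.groupby(['category_1', 'category_2'])['customer']
               .nunique()
               .reset_index(name='customers'))
print(counts)
```

With this made-up data, `cat1` then `cat3` is shared by customers `A` and `B`, so that pair gets a count of 2. Using `nunique` rather than `size` keeps a customer from being counted twice for the same pair.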

EDIT: here is an example of the result:

[Image: result example]

Answer:

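import numpy as np
import pandas as pd
from itertools import product

df = pd.DataFrame([['cat1', 1], ['cat2', 1], ['cat3', 2],
                   ['cat1', 3]], columns=['category', 'number'])

result_items = []
# purchase numbers in ascending order; only histories of length >= 2 form combinations
product_numbers = df.number.sort_values().unique()
product_numbers = product_numbers[product_numbers >= 2]

# get all the combinations of results for all the product numbers
for number in product_numbers:

    # one list of categories per purchase 1..number
    purchase_history = []
    for hist in range(1, number + 1):
        purchase_history.append(df.category[df.number == hist].tolist())

    # Cartesian product: pick one category from each purchase
    for item in product(*purchase_history):

        # map purchase number -> category, e.g. {1: 'cat1', 2: 'cat3'}
        item_store = {}
        for i in range(1, number + 1):
            item_store[i] = item[i - 1]

        result_items.append(item_store)

# put them all into a dataframe (shorter histories get NaN in the later columns)
results = pd.DataFrame(result_items)
# groupby drops NaN keys by default, so temporarily replace NaN with 0
results.fillna(0, inplace=True)
# get the counts of all history
results = results.groupby(results.columns.tolist()).size().reset_index(name='count')
# fix the NaN values: turn the 0 placeholders back into NaN
results.where(results != 0, np.nan, inplace=True)
print(results)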

Results are:

      1     2     3  count
0  cat1  cat3   NaN      1
1  cat1  cat3  cat1      1
2  cat2  cat3   NaN      1
3  cat2  cat3  cat1      1
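The `fillna(0)` / `where(...)` round trip in the code above exists because `groupby` drops rows whose keys contain NaN by default. A small sketch rebuilds by hand the four dicts that the loop produces for this data. It shows that passing `dropna=False` (available since pandas 1.1) keeps those rows directly, so the placeholder step could be skipped.

```python
import pandas as pd

# The four dicts the loop above builds for the question's data; the two
# two-purchase histories have no key 3, so that column is NaN for them.
results = pd.DataFrame([{1: 'cat1', 2: 'cat3'}, {1: 'cat2', 2: 'cat3'},
                        {1: 'cat1', 2: 'cat3', 3: 'cat1'},
                        {1: 'cat2', 2: 'cat3', 3: 'cat1'}])

# Default behaviour: rows whose key contains NaN are silently dropped.
dropped = results.groupby([1, 2, 3]).size()

# dropna=False keeps NaN as a group of its own, so no 0 placeholder is needed.
kept = results.groupby([1, 2, 3], dropna=False).size().reset_index(name='count')
print(kept)
```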

This itertools solution isn't particularly elegant. I'd love to see whether someone can do this without that messy for loop!
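One way to drop the explicit nested loops is offered here as a sketch, not a tested drop-in replacement. It builds a single-column frame per purchase number and chains cross joins with `itertools.accumulate`, so each step is the Cartesian product of purchases 1..k. It then counts each prefix of length 2 or more. It assumes pandas 1.2 or later (for `merge(how='cross')`) and, like the code above, a single customer.

```python
from itertools import accumulate

import pandas as pd

df = pd.DataFrame([['cat1', 1], ['cat2', 1], ['cat3', 2], ['cat1', 3]],
                  columns=['category', 'number'])

# One single-column frame per purchase number, with the column named after that number.
per_purchase = [group[['category']].rename(columns={'category': number})
                for number, group in df.groupby('number')]

# accumulate() chains cross joins: prefixes[k] is the Cartesian product of
# the first k + 1 purchases (prefixes[0] is just purchase 1).
prefixes = list(accumulate(per_purchase,
                           lambda left, right: left.merge(right, how='cross')))

# Count every combination within each prefix of length >= 2, then stack them;
# shorter prefixes get NaN in the columns they lack.
counts = [p.groupby(list(p.columns)).size().reset_index(name='count')
          for p in prefixes[1:]]
results = pd.concat(counts, ignore_index=True)
# Put 'count' last; concat would otherwise place the later column 3 after it.
results = results[list(prefixes[-1].columns) + ['count']]
print(results)
```

Each prefix is counted on its own, so no NaN ever reaches `groupby`. The NaN in column 3 only appears when the shorter prefixes are stacked with `pd.concat`. For this data, the rows come out in the same order as the table in the question.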

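Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM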