
Pandas pivot or groupby for dynamically generated columns

I have a DataFrame with sales information from a supermarket. Each row represents an item, with several characteristics as columns. The original DataFrame looks like this:

In [1]: import pandas as pd
        my_data = [{'ticket_number' : '001', 'item' : 'tomato', 'ticket_price' : '21'},
                   {'ticket_number' : '001', 'item' : 'candy', 'ticket_price' : '21'},
                   {'ticket_number' : '001', 'item' : 'soup', 'ticket_price' : '21'},
                   {'ticket_number' : '002', 'item' : 'soup', 'ticket_price' : '12'},
                   {'ticket_number' : '002', 'item' : 'cola', 'ticket_price' : '12'},
                   {'ticket_number' : '003', 'item' : 'beef', 'ticket_price' : '56'},
                   {'ticket_number' : '003', 'item' : 'tomato', 'ticket_price' : '56'},
                   {'ticket_number' : '003', 'item' : 'pork', 'ticket_price' : '56'}]
        df = pd.DataFrame(my_data)

In [2]: df
Out [2]:    
            ticket_number   ticket_price        item
         0        001            21           tomato
         1        001            21           candy
         2        001            21           soup
         3        002            12           soup
         4        002            12           cola
         5        003            56           beef
         6        003            56           tomato
         7        003            56           pork

I need a DataFrame where each row represents a ticket, with the ticket price and all the items bought as columns. In this example:

            ticket_number   ticket_price      item1  item2  item3
         0        001            21           tomato candy  soup
         1        002            12           soup   cola
         2        003            56           beef   tomato pork

I tried using df.groupby('ticket_number').item.value_counts(), but that does not create new columns. I have never used pivot_table; maybe it would be useful here.

Any help would be greatly appreciated.

Thanks!

One possible way is to use groupby to collect the items of each ticket into a list, which can then be expanded into columns:

In [24]: res = df.groupby(['ticket_number', 'ticket_price'])['item'].apply(list).apply(pd.Series)

In [25]: res
Out[25]:
                                 0       1     2
ticket_number ticket_price
001           21            tomato   candy  soup
002           12              soup    cola   NaN
003           56              beef  tomato  pork

Then, after cleaning up this result a bit:

In [27]: res.columns = ['item' + str(i + 1) for i in res.columns]

In [29]: res.reset_index()
Out[29]:
  ticket_number ticket_price   item1   item2 item3
0           001           21  tomato   candy  soup
1           002           12    soup    cola   NaN
2           003           56    beef  tomato  pork
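The groupby-and-list steps above can be collected into one self-contained script. This is only a sketch: the helper name items_to_columns is hypothetical, and it swaps .apply(pd.Series) for the plain pd.DataFrame constructor, which is an assumption on my part that it is the cheaper choice on larger inputs, not something the answer itself uses.

```python
import pandas as pd

# Same sample data as in the question (prices kept as strings, as in the original).
df = pd.DataFrame({
    'ticket_number': ['001', '001', '001', '002', '002', '003', '003', '003'],
    'ticket_price': ['21', '21', '21', '12', '12', '56', '56', '56'],
    'item': ['tomato', 'candy', 'soup', 'soup', 'cola', 'beef', 'tomato', 'pork'],
})


def items_to_columns(frame):
    """Hypothetical helper: one row per ticket, items spread over item1..itemN."""
    # Collect the items of each ticket into a Python list.
    lists = frame.groupby(['ticket_number', 'ticket_price'])['item'].apply(list)
    # Expand the lists into columns; shorter lists are padded with missing values.
    # (Swapped in for .apply(pd.Series); this is an assumption, not the answer's code.)
    wide = pd.DataFrame(lists.tolist(), index=lists.index)
    # Rename 0, 1, 2, ... to item1, item2, item3, ...
    wide.columns = ['item' + str(i + 1) for i in wide.columns]
    return wide.reset_index()


result = items_to_columns(df)
print(result)
```

The number of item columns is driven by the longest ticket in the data, so nothing has to be known in advance about how many items a ticket can hold.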

Another possible way is to create a new column that numbers the items within each group with groupby.cumcount:

In [38]: df['item_number'] = df.groupby('ticket_number').cumcount()

In [39]: df
Out[39]:
     item ticket_number ticket_price  item_number
0  tomato           001           21            0
1   candy           001           21            1
2    soup           001           21            2
3    soup           002           12            0
4    cola           002           12            1
5    beef           003           56            0
6  tomato           003           56            1
7    pork           003           56            2

And then do some reshaping:

In [40]: df.set_index(['ticket_number', 'ticket_price', 'item_number']).unstack(-1)
Out[40]:
                              item
item_number                      0       1     2
ticket_number ticket_price
001           21            tomato   candy  soup
002           12              soup    cola   NaN
003           56              beef  tomato  pork

From here, with some cleaning of the column names, you can achieve the same result as above.
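One way the column cleanup could look for the cumcount approach is sketched below. It assumes that selecting the 'item' column before unstacking is acceptable, which avoids the extra 'item' level in the column index; the variable names numbered, wide and result are illustrative only.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'ticket_number': ['001', '001', '001', '002', '002', '003', '003', '003'],
    'ticket_price': ['21', '21', '21', '12', '12', '56', '56', '56'],
    'item': ['tomato', 'candy', 'soup', 'soup', 'cola', 'beef', 'tomato', 'pork'],
})

# Number the items within each ticket: 0, 1, 2, ...
numbered = df.assign(item_number=df.groupby('ticket_number').cumcount())

# Move item_number from the index into the columns. Selecting the 'item'
# Series first means the result has a flat column index (0, 1, 2).
wide = (
    numbered
    .set_index(['ticket_number', 'ticket_price', 'item_number'])['item']
    .unstack(-1)
)

# Rename 0, 1, 2 to item1, item2, item3 and restore the key columns.
wide.columns = ['item' + str(i + 1) for i in wide.columns]
result = wide.reset_index()
print(result)
```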

The reshaping step with set_index and unstack could also be done with pivot_table:

df.pivot_table(columns=['item_number'], index=['ticket_number', 'ticket_price'], values='item', aggfunc='first')
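For completeness, here is a hedged end-to-end sketch of the pivot_table variant. It assumes item_number has been created with cumcount as described earlier, and it passes columns and values as plain strings so that the resulting column index stays flat.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    'ticket_number': ['001', '001', '001', '002', '002', '003', '003', '003'],
    'ticket_price': ['21', '21', '21', '12', '12', '56', '56', '56'],
    'item': ['tomato', 'candy', 'soup', 'soup', 'cola', 'beef', 'tomato', 'pork'],
})

# pivot_table needs a column to spread over, so number the items first.
numbered = df.assign(item_number=df.groupby('ticket_number').cumcount())

# aggfunc='first' is used because the values are strings; the default
# aggregation (mean) only makes sense for numeric data.
wide = numbered.pivot_table(
    index=['ticket_number', 'ticket_price'],
    columns='item_number',
    values='item',
    aggfunc='first',
)

# Rename 0, 1, 2 to item1, item2, item3 and restore the key columns.
wide.columns = ['item' + str(i + 1) for i in wide.columns]
result = wide.reset_index()
print(result)
```

Because cumcount gives every item on a ticket its own number, each cell receives at most one value here. If the index/column combinations were not unique, aggfunc='first' would silently keep only the first item per cell.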
