
Pandas Pivot table, how to put a series of columns in the values attribute

First of all, I apologize! It's my first time using Stack Overflow, so I hope I'm doing it right. I searched but couldn't find what I'm looking for. I'm also quite new to pandas and Python :) I'll use an example and try to be clear.

I have a dataframe with 30 columns that contains information about a shopping cart. One of the columns (Order) has 2 values, either completed or in progress. About 20 of the columns are items, let's say apple, orange, banana... I need to know how many times there is an apple in a completed order and how many times in an in-progress order. I decided to use a pivot table with the aggregate function count. This is a small example of the dataframe:

Order      | apple | orange | banana | pear | pineapple | ...  |
-----------|-------|--------|--------|------|-----------|------|
completed  |   2   |    4   |   10   |   5  |    1      |      |
completed  |   5   |    4   |   5    |   8  |    3      |      |
iProgress  |   3   |    7   |   6    |   5  |    2      |      |
completed  |   6   |    3   |   1    |   7  |    1      |      |
iProgress  |   10  |    2   |   2    |   2  |    2      |      |
completed  |   2   |    1   |   4    |   8  |    1      |      |

I have the output I want but what I'm looking for is a more elegant way of selecting lots of columns without having to type them manually.

df.pivot_table(index=['Order'],
               values=['apple', 'banana', 'orange', 'pear', 'strawberry', 'mango'],
               aggfunc='count')

But I want to select around 15 columns, so instead of typing them one by one, I'm sure there is an easy way of doing it by using column numbers or something similar. Let's say I want to select columns 6 to 15.

I have tried things like values=[df.columns[6:15]], and I have also tried using df.iloc, but as I said, I'm pretty new, so I'm probably using things wrong or making silly mistakes!

Is there also a way to keep the columns in their original order? In my output they seem to have been sorted alphabetically, and I want to keep the order of the dataframe's columns. So it should be apple, orange, banana...

Order        Completed    In progress  
apple          92             221
banana         102            144
mango          70             55

I'm just looking for a way of improving my code, and I hope I haven't made too much of a mess. Thank you!

I think you can use:

# to select only a few columns by position, slice df.columns, e.g. df.columns[1:3]
result = df.pivot_table(columns=['Order'], values=df.columns[1:3], aggfunc='count')
print(result)
Order   completed  iProgress
apple           4          2
orange          4          2
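
As a minimal sketch of the positional selection above, here is a self-contained version built from the sample table in the question. The slice 1:4 and the variable names cols and counts are only illustrative assumptions. The attempt values=[df.columns[6:15]] from the question wraps the Index in another list, so pandas no longer receives a flat list of column labels; converting the slice with tolist() avoids that.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "Order": ["completed", "completed", "iProgress",
              "completed", "iProgress", "completed"],
    "apple": [2, 5, 3, 6, 10, 2],
    "orange": [4, 4, 7, 3, 2, 1],
    "banana": [10, 5, 6, 1, 2, 4],
    "pear": [5, 8, 5, 7, 2, 8],
    "pineapple": [1, 3, 2, 1, 2, 1],
})

# Slice the column labels by position and turn them into a flat list.
cols = df.columns[1:4].tolist()  # ['apple', 'orange', 'banana']

# Count non-null entries of the selected columns per Order value.
counts = df.pivot_table(columns="Order", values=cols, aggfunc="count")
print(counts)
```

For the real dataframe the slice would be something like df.columns[6:15], adjusted to wherever the item columns actually sit.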

# to use all columns, the values parameter can be omitted
result = df.pivot_table(columns=['Order'], aggfunc='count')
print(result)

Order      completed  iProgress
apple              4          2
banana             4          2
orange             4          2
pear               4          2
pineapple          4          2

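See also: What is the difference between size and count in pandas? In short, count ignores NaN values, while len (like size) counts every row, so the two differ only when values are missing.

result = df.pivot_table(columns=['Order'], aggfunc=len)
print(result)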
Order      completed  iProgress
apple              4          2
banana             4          2
orange             4          2
pear               4          2
pineapple          4          2
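
A small sketch of that difference, using groupby rather than pivot_table to keep it short. The two-column dataframe with one missing value is made up for illustration and is not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one apple value is missing in a completed order.
df = pd.DataFrame({
    "Order": ["completed", "completed", "iProgress"],
    "apple": [2, np.nan, 3],
})

grouped = df.groupby("Order")["apple"]

non_null = grouped.count()  # NaN is skipped
all_rows = grouped.size()   # every row in the group is counted

print(non_null)
print(all_rows)
```

If the item columns never contain NaN, both approaches give the same numbers, as in the outputs above.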

# alternative with groupby and transpose (keeps the original column order)
result = df.groupby('Order').count().T
print(result)
Order      completed  iProgress
apple              4          2
orange             4          2
banana             4          2
pear               4          2
pineapple          4          2
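
On the second part of the question (keeping the dataframe's column order): the groupby output above is in the original order, while the pivot_table outputs come back sorted alphabetically. A possible sketch of two ways to get the original order for a positional selection follows; the slice 1:4 and the names cols, ordered and via_groupby are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Order": ["completed", "completed", "iProgress",
              "completed", "iProgress", "completed"],
    "apple": [2, 5, 3, 6, 10, 2],
    "orange": [4, 4, 7, 3, 2, 1],
    "banana": [10, 5, 6, 1, 2, 4],
    "pear": [5, 8, 5, 7, 2, 8],
})

cols = df.columns[1:4].tolist()  # ['apple', 'orange', 'banana']

# Option 1: pivot_table, then put the rows back in the original order.
ordered = df.pivot_table(columns="Order", values=cols,
                         aggfunc="count").reindex(cols)

# Option 2: groupby on the selected columns, then transpose.
via_groupby = df.groupby("Order")[cols].count().T

print(ordered)
print(via_groupby)
```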

Your example doesn't show a case where an item is not in the cart. I'm assuming it shows up as None or 0. If that is correct, then I would fill the NA values with 0 and count how many entries are greater than 0:

df.set_index('Order').fillna(0).gt(0).groupby(level='Order').sum().T
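
To make the effect of that one-liner concrete, here is a sketch on a tiny made-up dataframe in which some items are absent (0 or NaN). The data and the name present are hypothetical and only follow the assumption stated above.

```python
import numpy as np
import pandas as pd

# Hypothetical carts: 0 or NaN means the item is not in the cart.
df = pd.DataFrame({
    "Order": ["completed", "completed", "iProgress", "iProgress"],
    "apple": [2, 0, 3, np.nan],
    "orange": [0, 4, 1, 2],
})

present = (
    df.set_index("Order")
      .fillna(0)               # treat missing values as "not in cart"
      .gt(0)                   # True where the item is in the cart
      .groupby(level="Order")
      .sum()                   # number of orders containing each item
      .T
)
print(present)
```

Unlike count, this counts the orders that actually contain each item rather than every non-null cell.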

