python pandas pivot_table count frequency in one column

I am still new to pandas' pivot_table and would like to know how to count the frequencies of values in one column, grouped by an ID in another column. The DataFrame looks like this:

import pandas as pd
df = pd.DataFrame({'Account_number':[1,1,2,2,2,3,3],
                   'Product':['A', 'A', 'A', 'B', 'B','A', 'B']
                  })

For the output, I'd like to get something like the following:

                Product
                A      B
Account_number           
      1         2      0
      2         1      2
      3         1      1

So far, I tried this code:

df.pivot_table(rows = 'Account_number', cols= 'Product', aggfunc='count')

This code gives me the same table twice. What is wrong with the code above? Part of the reason I am asking is that this DataFrame is just an example; the real data I am working with has tens of thousands of account numbers.

You need to specify the aggfunc as len:

In [11]: df.pivot_table(index='Account_number', columns='Product', 
                        aggfunc=len, fill_value=0)
Out[11]:
Product         A  B
Account_number
1               2  0
2               1  2
3               1  1

It looks like count is counting the instances in each column (Account_number and Product); it's not clear to me whether this is a bug...

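Solution: Use aggfunc='size'

By default, pandas applies the aggfunc to every column not named in the index or columns parameters, so len or count produces one block of counts per remaining column, whereas 'size' returns a single count per group. Consider a DataFrame with extra columns: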

df = pd.DataFrame({'Account_number':[1, 1, 2 ,2 ,2 ,3 ,3], 
                   'Product':['A', 'A', 'A', 'B', 'B','A', 'B'], 
                   'Price': [10] * 7,
                   'Quantity': [100] * 7})
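
As a minimal sketch of the difference, assuming a reasonably recent pandas, the following compares aggfunc='size' with aggfunc='count' on the four-column frame above. The names size_table and count_table are illustrative only and do not come from the original answer.

```python
import pandas as pd

df = pd.DataFrame({'Account_number': [1, 1, 2, 2, 2, 3, 3],
                   'Product': ['A', 'A', 'A', 'B', 'B', 'A', 'B'],
                   'Price': [10] * 7,
                   'Quantity': [100] * 7})

# 'size' counts rows per (Account_number, Product) group and
# ignores the other columns, so one table of counts comes back.
size_table = df.pivot_table(index='Account_number', columns='Product',
                            aggfunc='size', fill_value=0)

# 'count' is applied to each remaining column (Price, Quantity),
# so the columns gain an extra level: value column, then Product.
count_table = df.pivot_table(index='Account_number', columns='Product',
                             aggfunc='count')

print(size_table)
print(count_table.columns.nlevels)
```

The size version should keep the same shape no matter how many extra columns the real data carries, which is why it seems the safer choice for wide frames.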

In newer versions of pandas, a slight modification is required: the rows and cols parameters have been replaced by index and columns. It took me some time to figure this out, so I am adding it here for anyone who wants to use it directly.

df.pivot_table(index='Account_number', columns='Product', aggfunc=len,
               fill_value=0)

You can use count:

df.pivot_table(index='Account_number', columns='Product', aggfunc='count')

I know this question is about pivot_table, but for the problem given in the question we can use crosstab:

out = pd.crosstab(df['Account_number'], df['Product'])

Output:

Product         A  B
Account_number      
1               2  0
2               1  2
3               1  1
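
For comparison, here is a rough sketch of a groupby-based route that should give the same counts as crosstab on the example data. This is an alternative not mentioned in the answers above, and the name alt is purely illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Account_number': [1, 1, 2, 2, 2, 3, 3],
                   'Product': ['A', 'A', 'A', 'B', 'B', 'A', 'B']})

# crosstab counts co-occurrences of the two columns directly
out = pd.crosstab(df['Account_number'], df['Product'])

# Alternative: count rows per (Account_number, Product) pair, then
# move Product into the columns, filling missing pairs with 0
alt = (df.groupby(['Account_number', 'Product'])
         .size()
         .unstack(fill_value=0))

# Both tables should hold identical counts
print((out.values == alt.values).all())
```

Either form avoids the aggfunc question entirely, since neither one aggregates over a value column.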

