
count in pivot table of pandas dataframe

How do I calculate the count in the aggregate column of a pivot table?

import pandas as pd
from io import StringIO  # Python 3; on Python 2 this was `from StringIO import StringIO`
import numpy as np

audit_trail = """1|2|ENQ-wbrProcess.php|bus_departures|BUS_SERVICE_NO#DEPARTURE_TM|54790#01/12/2010|BOOKING_STATUS|O|L|WBRMWR|2010-12-01 12:42:32
5|0|DTO-transfer.php|bus_services|BUS_SERVICE_NO|159734|BUS_TYPE_CD||DO|PHRTD|2010-12-01 12:43:27
9|0|DTO-transfer.php|bus_services|BUS_SERVICE_NO|159734|EFFECTIVE_FROM||2010-12-02 00:00:00|PHRTD|2010-12-01 12:43:28
13|0|DTO-transfer.php|bus_services|BUS_SERVICE_NO|159734|MAX_CHANCE_SEATS||0|PHRTD|2010-12-01 12:43:28
17|0|DTO-transfer.php|bus_services|BUS_SERVICE_NO|159734|SCHEDULED_NO||15|PHRTD|2010-12-01 12:43:29
21|0|DTO-transfer.php|bus_services|BUS_SERVICE_NO|159734|TRIP_NATURE||Basic|PHRTD|2010-12-01 12:43:29
25|0|DTO-transfer.php|bus_services|BUS_SERVICE_NO|159734|PARCEL_SERVICE||N|PHRTD|2010-12-01 12:43:30
29|0|DTO-transfer.php|bus_services|BUS_SERVICE_NO|159734|TRIP_NO||S11308|PHRTD|2010-12-01 12:43:30
33|0|DTO-transfer.php|bus_services|BUS_SERVICE_NO|159734|IS_AVL_RESERVATION||N|PHRTD|2010-12-01 12:43:31
37|0|DTO-transfer.php|bus_service_seats|BUS_SERVICE_NO|159734|BUS_SERVICE_NO||159734|PHRTD|2010-12-01 12:43:32"""

col_list = ['transaction_id', 'request_id', 'table_name', 'table_unique_field', 'table_unique_value', 'field_name', 'old_value', 'new_value', 'client_id', 'client_type', 'transaction_date']
audit = pd.read_csv(StringIO(audit_trail), sep="|" , names = col_list, index_col='transaction_date' )
pd.pivot_table(audit, values='transaction_id', rows=['table_name'], cols=['table_unique_field'], aggfunc=np.sum)

The result is as follows:

table_unique_field  bus_departures  bus_service_seats  bus_services
table_name
DTO-transfer.php               NaN                 37           152
ENQ-wbrProcess.php               1                NaN           NaN

The above correctly shows the sum of the transaction_id column for each group. I need the count, not the sum. Passing np.count as the aggregate function does not seem to work. Expected results:

table_unique_field  bus_departures  bus_service_seats  bus_services
table_name
DTO-transfer.php               NaN                  1             8
ENQ-wbrProcess.php               1                NaN           NaN

Passing len or 'count' as the aggfunc argument does work:

In [11]: pd.pivot_table(audit, values='transaction_id', index=['table_name'],
                        columns=['table_unique_field'], aggfunc='count')
Out[11]:
table_unique_field  bus_departures  bus_service_seats  bus_services
table_name
DTO-transfer.php               NaN                  1             8
ENQ-wbrProcess.php               1                NaN           NaN

Note: it is also better to use index/columns instead of rows/cols, because rows/cols are deprecated and will be removed in a future version (unless you are on an older pandas version that predates index/columns).
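
For comparison, here is a minimal sketch of three ways to get the same counts. It uses a shortened, hand-built stand-in for the audit data (only five rows, with the column names from the question), so the numbers differ from the output above. The groupby and crosstab variants are alternatives, not part of the original answer.

```python
import pandas as pd

# Hypothetical, shortened stand-in for the audit frame in the question.
audit = pd.DataFrame({
    "transaction_id": [1, 5, 9, 13, 37],
    "table_name": ["ENQ-wbrProcess.php", "DTO-transfer.php", "DTO-transfer.php",
                   "DTO-transfer.php", "DTO-transfer.php"],
    "table_unique_field": ["bus_departures", "bus_services", "bus_services",
                           "bus_services", "bus_service_seats"],
})

# 1) pivot_table with aggfunc='count': counts non-null transaction_id values;
#    combinations that never occur show up as NaN.
counts = pd.pivot_table(audit, values="transaction_id", index="table_name",
                        columns="table_unique_field", aggfunc="count")

# 2) groupby + size: counts rows per group (nulls included), then unstack
#    moves table_unique_field into the columns. Missing combinations are NaN.
by_group = audit.groupby(["table_name", "table_unique_field"]).size().unstack()

# 3) crosstab: a frequency table that uses 0 instead of NaN for missing combinations.
xtab = pd.crosstab(audit["table_name"], audit["table_unique_field"])

print(counts)
print(by_group)
print(xtab)
```

If zeros are preferred over NaN in the pivot_table result, passing fill_value=0 to pd.pivot_table is one option.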

