I have a pandas data frame similar to:
ColA ColB
1 1
1 1
1 1
1 2
1 2
2 1
3 2
I want an output that works like Counter: I need to know how many times each row appears (with all of the columns being the same).
In this case the proper output would be:
ColA ColB Count
1 1 3
1 2 2
2 1 1
3 2 1
I have tried something of the sort:
df.groupby(['ColA','ColB']).ColA.count()
but this gives me ugly output that I am having trouble formatting.
You can use size with reset_index:
print(df.groupby(['ColA','ColB']).size().reset_index(name='Count'))
ColA ColB Count
0 1 1 3
1 1 2 2
2 2 1 1
3 3 2 1
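For anyone who wants to run this end to end, here is a minimal self-contained sketch using the data from the question. The cross-check against collections.Counter is my own addition, only there to show that the groupby result matches what Counter would report for the same rows.

```python
from collections import Counter

import pandas as pd

df = pd.DataFrame({'ColA': [1, 1, 1, 1, 1, 2, 3],
                   'ColB': [1, 1, 1, 2, 2, 1, 2]})

# size() counts the rows in each (ColA, ColB) group;
# reset_index(name='Count') flattens the result into a regular DataFrame
# and names the new count column.
counts = df.groupby(['ColA', 'ColB']).size().reset_index(name='Count')
print(counts)

# Cross-check: count each row as a plain tuple with Counter.
row_counts = Counter(df.itertuples(index=False, name=None))
print(row_counts)
```

Note that size() counts every row in a group, whereas count() on a single column skips missing values in that column, so size() is usually the safer choice for counting rows.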
I only needed to count the unique rows, so I used DataFrame.drop_duplicates as an alternative:
len(df[['ColA', 'ColB']].drop_duplicates())
On my data it was twice as fast as len(df.groupby(['ColA', 'ColB'])).
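A small sketch of both ways to get the number of distinct rows, using the data from the question. I make no claim about timing here, since that depends on your data; the point is only that the two approaches agree when there are no missing values.

```python
import pandas as pd

df = pd.DataFrame({'ColA': [1, 1, 1, 1, 1, 2, 3],
                   'ColB': [1, 1, 1, 2, 2, 1, 2]})

# Number of distinct (ColA, ColB) rows via drop_duplicates.
n_unique = len(df[['ColA', 'ColB']].drop_duplicates())

# The same number via groupby: one group per distinct row.
n_groups = df.groupby(['ColA', 'ColB']).ngroups

print(n_unique, n_groups)
```

One caveat: by default groupby drops groups whose key contains NaN, while drop_duplicates keeps such rows, so the two numbers can differ when the columns contain missing values.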
Since pandas 1.1.0, the method pandas.DataFrame.value_counts is available, which does exactly what you need. It creates a Series with the unique rows as a MultiIndex and the counts as values:
df = pd.DataFrame({'ColA': [1, 1, 1, 1, 1, 2, 3], 'ColB': [1, 1, 1, 2, 2, 1, 2]})
pd.options.display.multi_sparse = False # option to print as requested
print(df.value_counts()) # requires pandas >= 1.1.0
Output, where ColA and ColB form the MultiIndex and the third column contains the counts:
ColA ColB
1 1 3
1 2 2
3 2 1
2 1 1
dtype: int64
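If you want the flat ColA / ColB / Count table from the question rather than a Series, one possible sketch is to reset the index and name the count column. The column name 'Count' and the final sort are my own choices; the sort is only there to make the row order deterministic, since the order of rows with equal counts is not something I would rely on.

```python
import pandas as pd

df = pd.DataFrame({'ColA': [1, 1, 1, 1, 1, 2, 3],
                   'ColB': [1, 1, 1, 2, 2, 1, 2]})

# value_counts() returns a Series indexed by (ColA, ColB);
# reset_index(name='Count') turns it into a three-column DataFrame.
table = (
    df.value_counts()               # requires pandas >= 1.1.0
      .reset_index(name='Count')
      .sort_values(['ColA', 'ColB'], ignore_index=True)
)
print(table)
```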