
Need to aggregate count(rowid, colid) on dataframe in pandas

I've been trying to turn this

| row_id | col_id |
|--------|--------|
|   1    |   23   |
|   4    |   45   |
|  ...   |  ...   |
|   1    |   23   |
|  ...   |  ...   |
|   4    |   45   |
|  ...   |  ...   |
|   4    |   45   |
|  ...   |  ...   |

Into this

| row_id | col_id |  count  |
|--------|--------|---------|
|   1    |   23   |    2    |
|   4    |   45   |    3    |
|  ...   |  ...   |   ...   |

So the number of occurrences of each (row_i, col_j) pair goes into the 'count' column. Note that neither row_id nor column_id is unique on its own in either case.

No success so far, at least not if I want to stay efficient. I can iterate over each pair and add up the occurrences, but there has to be a simpler way in pandas, or numpy for that matter.

Thanks!

EDIT 1:

As @j-bradley suggested, I tried the following

# I use django-pandas
rdf = Record.objects.to_dataframe(['row_id', 'column_id'])
_ = rdf.groupby(['row_id', 'column_id'])['row_id'].count().head(20)
_.head(10)

And that outputs

    row_id  column_id
1       108          1
        168          1
        218          1
        398          2
        422          1
10      35           2
        355          1
        489          1
100     352          1
        366          1
Name: row_id, dtype: int64

This seems ok. But it's a Series object and I'm not sure how to turn this into a dataframe with the required three columns. Pandas noob, as it seems. Any tips?

Thanks again.

You can group by columns A and B and call count on the groupby object:

import pandas as pd

df = pd.DataFrame({'A': [1, 4, 1, 4, 4], 'B': [23, 45, 23, 45, 45]})
df.groupby(['A', 'B'])['A'].count()

returns:

A  B 
1  23    2
4  45    3

Edited to make the answer more explicit

To turn the series back to a dataframe with a column named count:

_ = df.groupby(['A','B'])['A'].count()

The name of the series becomes the column name:

_.name = 'Count'

Resetting the index promotes the multi-index levels to columns and turns the series into a dataframe:

df = _.reset_index()
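
For reference, the steps above can be put together into one small runnable sketch. It uses the column names from the question's example table on made-up sample data, and it swaps `count()` for `size()` together with `reset_index(name=...)`, which is a variation on the approach above rather than the exact code from the answer:

```python
import pandas as pd

# Sample data mirroring the question's table (values are illustrative).
df = pd.DataFrame({
    'row_id': [1, 4, 1, 4, 4],
    'col_id': [23, 45, 23, 45, 45],
})

# size() counts the rows in each (row_id, col_id) group and returns a Series
# indexed by both keys; reset_index(name='count') turns the index levels back
# into columns and names the value column in a single step.
counts = df.groupby(['row_id', 'col_id']).size().reset_index(name='count')

print(counts)
```

Unlike `count()`, `size()` does not need a column to be selected first and includes rows with missing values in the non-key columns; for data without missing values the two give the same numbers.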
