
Count and Group By - Pandas Dataframe

I have a DataFrame, `csv_table`, that looks like this:

|  time |        ID        | range |                        text                        |
|:-----:|:----------------:|:-----:|:--------------------------------------------------:|
| 90000 | B0A0F80A06A3AB6C |   0   | In what year did baseball become an offical sport? |
| 90000 | 95A33E619934A39B |   0   |              wirehair pointing griffon             |
| 90000 | E613C21C535BC636 |   30  |                        ncic                        |
| 90000 | 687340036669C45D |   0   |                 kitchen appliances                 |
| 90000 | E43DD6D82BFBD0B8 |   0   |         where can I find a chines rosewood         |
| 90000 | CA52ECD1524E737D |   0   |             jennifer  love hewitt naked            |
| 90000 | 2B4FAF545C0E6EF0 |   40  |                    pageant trim                    |
| 90000 | 6456584F5B316AAE |  100  |                  tiger electronics                 |

(The file actually contains about 300K entries.)

What I am trying to do is figure out the average number of entries per ID.

In SQL, I would do something like:

```sql
WITH
    Counts AS (
        SELECT
            COUNT(text) AS TheCnt,
            ID
        FROM
            csv_table
        GROUP BY
            ID
    ),

    Tots AS (
        SELECT
            AVG(TheCnt) AS TheAvg
        FROM
            Counts
    )

SELECT * FROM Tots
```

I tried writing some Python code to achieve the same result:

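```python
import pandas as pd

tsv_file = "filepath"
# The file is tab-separated with no header row, so name the columns manually
csv_table = pd.read_csv(tsv_file, sep='\t', header=None)
csv_table.columns = ['time', 'ID', 'range', 'text']

# Count the non-null values in each remaining column, per ID
val = csv_table.groupby('ID').count()
print(val)
```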

But I get:

```
                  time  range  text
ID
0000177584E874EC     1      1     1
00006291C83E2C2A     2      2     2
00006FD94F3A9CB4     1      1     1
000087A6525FEED2     4      4     4
```
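For reference, `groupby(...).count()` counts the non-null values in every remaining column for each group, which is why `time`, `range`, and `text` all show the same numbers when nothing is missing. The small sketch below contrasts it with `size()`, which counts rows regardless of nulls. The rows are made up for illustration and are not from the real file.

```python
import pandas as pd

# Made-up rows for illustration; one text value is missing.
df = pd.DataFrame({
    "ID": ["A", "A", "B"],
    "text": ["foo", None, "baz"],
})

# count(): non-null text values per ID, like SQL COUNT(text)
non_null_per_id = df.groupby("ID")["text"].count()

# size(): all rows per ID, nulls included, like SQL COUNT(*)
rows_per_id = df.groupby("ID").size()

print(non_null_per_id)
print(rows_per_id)
```

Here ID `A` has two rows but only one non-null `text`, so the two methods disagree for `A` and agree for `B`. `count()` on the `text` column is the one that matches `COUNT(text)` in the SQL query.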

How can I get my desired result? I am already counting the number of text entries per ID, but how do I then find the average of those counts?

I'm assuming you just want one final number, right? If so, it's just:

```python
val['text'].mean()
```
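Putting the pieces together, here is a minimal self-contained sketch that mirrors the two CTEs from the question: a per-ID `count()` for `Counts`, then `mean()` for `Tots`. It assumes the column names from the question; `average_entries_per_id` is a hypothetical helper name, and the sample rows are made up rather than taken from the real file.

```python
import pandas as pd


def average_entries_per_id(csv_table: pd.DataFrame) -> float:
    """Mean number of text entries per ID (hypothetical helper)."""
    # Counts CTE: COUNT(text) ... GROUP BY ID (missing text is not counted)
    counts = csv_table.groupby("ID")["text"].count()
    # Tots CTE: AVG(TheCnt) over the per-ID counts
    return float(counts.mean())


# Made-up sample; the real table would come from
# pd.read_csv(tsv_file, sep='\t', header=None) as in the question.
sample = pd.DataFrame({
    "time": [90000, 90000, 90000, 90000],
    "ID": ["B0A0F80A06A3AB6C", "B0A0F80A06A3AB6C",
           "95A33E619934A39B", "E613C21C535BC636"],
    "range": [0, 0, 0, 30],
    "text": ["q1", "q2", "q3", "q4"],
})

# 4 text entries spread over 3 IDs, so the average is 4/3
print(average_entries_per_id(sample))
```

As long as neither `text` nor `ID` has missing values, the same number is simply `len(csv_table) / csv_table['ID'].nunique()`, the total row count divided by the number of distinct IDs. The `groupby` version is the safer match for `COUNT(text)` because it skips missing text the way SQL skips NULLs.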
