
Count and Group By - Pandas Dataframe

I have a dataframe, csv_table, that looks like this:

|  time |        ID        | range |                        text                        |
|:-----:|:----------------:|:-----:|:--------------------------------------------------:|
| 90000 | B0A0F80A06A3AB6C |   0   | In what year did baseball become an offical sport? |
| 90000 | 95A33E619934A39B |   0   |              wirehair pointing griffon             |
| 90000 | E613C21C535BC636 |   30  |                        ncic                        |
| 90000 | 687340036669C45D |   0   |                 kitchen appliances                 |
| 90000 | E43DD6D82BFBD0B8 |   0   |         where can I find a chines rosewood         |
| 90000 | CA52ECD1524E737D |   0   |             jennifer  love hewitt naked            |
| 90000 | 2B4FAF545C0E6EF0 |   40  |                    pageant trim                    |
| 90000 | 6456584F5B316AAE |  100  |                  tiger electronics                 |

(The file actually has about 300K entries.)

What I am trying to do is figure out the average number of entries per ID.

In SQL, I would do something like:

WITH
    Counts AS (
        SELECT
            COUNT(text) AS TheCnt,
            ID

        FROM    
            csv_table

        GROUP BY
            ID
    ),

    Tots AS (

        SELECT
            AVG(TheCnt) AS TheAvg

        FROM
            Counts
    )

    SELECT * FROM Tots

I tried writing some Python code to achieve the same result:

import pandas as pd

tsv_file = "filepath"
csv_table=pd.read_csv(tsv_file, sep='\t', header=None)
csv_table.columns = ['time', 'ID', 'range', 'text']

val = csv_table.groupby('ID').count()
print(val)

But I get:

                  time  range  text
ID
0000177584E874EC     1      1     1
00006291C83E2C2A     2      2     2
00006FD94F3A9CB4     1      1     1
000087A6525FEED2     4      4     4

How can I achieve my desired result? I am obviously counting the number of text entries per user, but how do I then find the average of those counts?

I'm assuming you just want one final number, right? If so, then it's just:

val['text'].mean()
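For what it's worth, here is a minimal end-to-end sketch of the same idea, mirroring the two steps of the SQL in the question (count per ID, then average the counts). The small dataframe is made-up sample data standing in for the real file, and the names counts and avg_entries are just illustrative.

```python
import pandas as pd

# Hypothetical sample data standing in for the real ~300K-row file.
csv_table = pd.DataFrame({
    "time": [90000] * 6,
    "ID": ["A", "A", "B", "C", "C", "C"],
    "range": [0, 0, 30, 0, 40, 100],
    "text": ["q1", "q2", "q3", "q4", "q5", "q6"],
})

# Like the Counts CTE: number of non-null text values per ID.
counts = csv_table.groupby("ID")["text"].count()

# Like the Tots CTE: average of those per-ID counts.
avg_entries = counts.mean()
print(avg_entries)  # 2.0 (A has 2 entries, B has 1, C has 3)
```

Note that count() skips missing values in text, much like COUNT(text) in SQL. If every row should be counted regardless of missing text, groupby("ID").size() is probably the closer fit.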
