Count and group by - Pandas DataFrame

Question

I have a dataframe, csv_table, that looks like this:

|  time |        ID        | range |                        text                        |
|:-----:|:----------------:|:-----:|:--------------------------------------------------:|
| 90000 | B0A0F80A06A3AB6C |   0   | In what year did baseball become an offical sport? |
| 90000 | 95A33E619934A39B |   0   |              wirehair pointing griffon             |
| 90000 | E613C21C535BC636 |   30  |                        ncic                        |
| 90000 | 687340036669C45D |   0   |                 kitchen appliances                 |
| 90000 | E43DD6D82BFBD0B8 |   0   |         where can I find a chines rosewood         |
| 90000 | CA52ECD1524E737D |   0   |             jennifer  love hewitt naked            |
| 90000 | 2B4FAF545C0E6EF0 |   40  |                    pageant trim                    |
| 90000 | 6456584F5B316AAE |  100  |                  tiger electronics                 |

(The file actually goes on for about 300K entries.)

What I am trying to do is figure out the average number of entries per ID.

In SQL, I would do something like:

WITH
    Counts AS (
        SELECT
            COUNT(text) AS TheCnt,
            ID

        FROM    
            csv_table

        GROUP BY
            ID
    ),

    Tots AS (

        SELECT
            AVG(TheCnt) AS TheAvg

        FROM
            Counts
    )

    SELECT * FROM Tots

I tried writing some Python code to achieve the same result:

import pandas as pd

tsv_file = "filepath"
csv_table=pd.read_csv(tsv_file, sep='\t', header=None)
csv_table.columns = ['time', 'ID', 'range', 'text']

val = csv_table.groupby('ID').count()
print(val)

But I get:

                  time  range  text
ID
0000177584E874EC     1      1     1
00006291C83E2C2A     2      2     2
00006FD94F3A9CB4     1      1     1
000087A6525FEED2     4      4     4

How can I achieve my desired result? I am obviously counting the number of text entries per user, but how do I then find the average of those counts?

Answer 1

I'm assuming you just want one final number, right? If so, then it's just:

val['text'].mean()
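To spell that out end to end, here is a minimal sketch that mirrors the two steps of the SQL query: count per ID, then average the counts. The small inline dataframe and its IDs ("A", "B", "C") are made up for illustration and only stand in for the real file. Note that `count()` skips missing values, just as `COUNT(text)` skips NULLs in SQL, whereas `size()` counts every row; which one you want depends on whether rows with empty text should count as entries.

```python
import pandas as pd

# Hypothetical sample data standing in for the ~300K-row file.
csv_table = pd.DataFrame({
    "time": [90000] * 6,
    "ID": ["A", "A", "B", "C", "C", "C"],
    "range": [0, 0, 30, 0, 40, 100],
    "text": ["q1", "q2", "q3", "q4", "q5", None],  # one missing text value
})

# Step 1 (the Counts CTE): COUNT(text) per ID; missing text is not counted.
counts = csv_table.groupby("ID")["text"].count()

# Step 2 (the Tots CTE): AVG(TheCnt) over the per-ID counts.
avg_entries = counts.mean()

# Alternative: count rows per ID regardless of missing text.
sizes = csv_table.groupby("ID").size()
avg_rows = sizes.mean()

print(counts)
print("average non-null text entries per ID:", avg_entries)
print("average rows per ID:", avg_rows)
```

Selecting the `text` column before calling `count()` avoids computing counts for `time` and `range` as well, which is what produced the three identical columns in the output shown in the question.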

Answer 1
2 · accepted · 2019-11-03 05:05:50

计数和分组依据 - Pandas Dataframe

问题描述

1 个解决方案

解决方案1 2 已采纳 2019-11-03 05:05:50

解决方案1
2 已采纳 2019-11-03 05:05:50