Count and Group By - Pandas Dataframe
I have a dataframe, csv_table, that looks like this:
| time | ID | range | text |
|:-----:|:----------------:|:-----:|:--------------------------------------------------:|
| 90000 | B0A0F80A06A3AB6C | 0 | In what year did baseball become an offical sport? |
| 90000 | 95A33E619934A39B | 0 | wirehair pointing griffon |
| 90000 | E613C21C535BC636 | 30 | ncic |
| 90000 | 687340036669C45D | 0 | kitchen appliances |
| 90000 | E43DD6D82BFBD0B8 | 0 | where can I find a chines rosewood |
| 90000 | CA52ECD1524E737D | 0 | jennifer love hewitt naked |
| 90000 | 2B4FAF545C0E6EF0 | 40 | pageant trim |
| 90000 | 6456584F5B316AAE | 100 | tiger electronics |
(The file actually goes on for about 300K entries.)
What I am trying to do is figure out the average number of entries per ID.
In SQL, I would do something like:

    WITH
    Counts AS (
        SELECT
            COUNT(text) AS TheCnt,
            ID
        FROM
            csv_table
        GROUP BY
            ID
    ),
    Tots AS (
        SELECT
            AVG(TheCnt) AS TheAvg
        FROM
            Counts
    )
    SELECT * FROM Tots

I tried writing some Python code to achieve the same result:
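
    import pandas as pd

    tsv_file = "filepath"
    # The file is tab-separated and has no header row
    csv_table = pd.read_csv(tsv_file, sep='\t', header=None)
    csv_table.columns = ['time', 'ID', 'range', 'text']
    # Count the non-null values in every column for each ID
    val = csv_table.groupby('ID').count()
    print(val)
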
But I get:

                      time  range  text
    ID
    0000177584E874EC     1      1     1
    00006291C83E2C2A     2      2     2
    00006FD94F3A9CB4     1      1     1
    000087A6525FEED2     4      4     4

How can I achieve my desired result? I am obviously counting the number of text entries per user, but how do I then find the average of those counts?
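For context, `groupby('ID').count()` returns one row per ID and, for every remaining column, the number of non-null values in that group. That is why the same number tends to repeat across `time`, `range`, and `text`. The sketch below uses a made-up four-row frame (the IDs and values are invented for illustration, not taken from the real file) to contrast it with `size()`, which counts rows whether or not they contain nulls. SQL's `COUNT(text)` also skips NULLs, so `count()` on the `text` column is the closer match.

```python
import pandas as pd

# Hypothetical toy data; the IDs and text values are made up for illustration.
toy = pd.DataFrame({
    "time": [90000, 90000, 90001, 90002],
    "ID": ["A", "A", "B", "B"],
    "range": [0, 30, 0, 40],
    "text": ["ncic", None, "pageant trim", "kitchen appliances"],
})

# count(): non-null values per column within each ID group.
per_column = toy.groupby("ID").count()

# size(): number of rows in each ID group, nulls included.
rows_per_id = toy.groupby("ID").size()

print(per_column)
print(rows_per_id)
```

Here ID `A` has two rows but only one non-null `text`, so the two methods disagree for that group.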
I'm assuming you just want one final number, right? If so, then it's just:

    val['text'].mean()

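Putting the two steps together, the sketch below mirrors the SQL from the question: count `text` per `ID`, then average those counts. The small frame is invented sample data standing in for the real file, and `counts` and `avg_entries` are just illustrative names.

```python
import pandas as pd

# Hypothetical sample standing in for csv_table; values are made up.
csv_table = pd.DataFrame({
    "time": [90000] * 6,
    "ID": ["A", "A", "A", "B", "B", "C"],
    "range": [0, 0, 30, 0, 40, 100],
    "text": ["q1", "q2", "q3", "q4", "q5", "q6"],
})

# Step 1 (the Counts CTE): number of non-null text entries per ID.
counts = csv_table.groupby("ID")["text"].count()

# Step 2 (the Tots CTE): average of those per-ID counts.
avg_entries = counts.mean()

print(avg_entries)  # 2.0 for this sample (per-ID counts are 3, 2, 1)
```

Selecting the `text` column before counting avoids computing counts for columns that are never used. If neither `ID` nor `text` has missing values, the result should also equal `len(csv_table) / csv_table['ID'].nunique()`.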