Calculating the frequency of items in a list

Question

I want to calculate the frequency of accidents in every region for every year. How can I do that using Python?

file.csv

Region,Year
1,2003
1,2003
2,2008
2,2007
2,2007
3,2004
1,2004
1,2004
1,2004

I tried using Counter, but it only works with one column. For example, region 1 in year 2003 has 2 accidents, so the result should be:

Region,Year,freq
1,2003,2
1,2003,2
2,2008,1
2,2007,2
2,2007,2
3,2004,1
1,2004,3
1,2004,3
1,2004,3

I tried doing it this way, but it doesn't seem to be the right approach.

from collections import Counter

import pandas

data = pandas.read_csv("file.csv")
# Counts occurrences of each year only, ignoring the region
freq_year = Counter(data["Year"].values)
data["freq"] = data["Year"].apply(lambda x: freq_year[x])

I am thinking of using groupby. Any idea how to do this?

Answer 1

Not a pandas solution, but it gets the job done:

import csv
from collections import Counter

inputs = []
with open('file.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    next(reader)  # skip the header row
    for row in reader:
        inputs.append(tuple(row))  # one (region, year) tuple per row

freqs = Counter(inputs)
print(freqs)
# Counter({('1', '2004'): 3, ('1', '2003'): 2, ('2', '2007'): 2, ('2', '2008'): 1, ('3', '2004'): 1})

The key here is to store each row as a tuple: tuples are hashable, so Counter can use them as keys and identical rows are counted together.
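The question asks for the count next to every original row rather than a summary. A minimal sketch of that last step, assuming the same two-column layout as file.csv; the inline `sample` text and the names `rows` and `result` are illustrative stand-ins, not part of the original answer:

```python
import csv
import io
from collections import Counter

# Inline copy of the sample data from the question, used here in place of file.csv.
sample = """Region,Year
1,2003
1,2003
2,2008
2,2007
2,2007
3,2004
1,2004
1,2004
1,2004
"""

reader = csv.reader(io.StringIO(sample))
header = next(reader)                   # ['Region', 'Year']
rows = [tuple(row) for row in reader]   # tuples are hashable, so Counter can key on them

freqs = Counter(rows)

# Append the count of each (Region, Year) pair to its row, keeping the input order.
result = [row + (freqs[row],) for row in rows]

print(",".join(header + ["freq"]))
for region, year, freq in result:
    print(f"{region},{year},{freq}")
```

Looking each row up in `freqs` keeps the rows in their original order, which a plain printout of the Counter does not.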

Answer 2

There might be a better way, but I first append a dummy column and then calculate freq from that column, like:

# df is assumed to be the DataFrame read from file.csv
df["freq"] = 1  # dummy column of ones
df["freq"] = df.groupby(["Year", "Region"])["freq"].transform("sum")

This returns the following df:

   Region  Year  freq
0       1  2003     2
1       1  2003     2
2       2  2008     1
3       2  2007     2
4       2  2007     2
5       3  2004     1
6       1  2004     3
7       1  2004     3
8       1  2004     3
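The dummy column may not be strictly necessary. A sketch of one possible alternative, assuming a pandas version in which `transform` accepts the string `"count"`; the inline `sample` text stands in for file.csv and is not part of the original answer:

```python
import io

import pandas as pd

# Inline copy of the sample data from the question, used here in place of file.csv.
sample = """Region,Year
1,2003
1,2003
2,2008
2,2007
2,2007
3,2004
1,2004
1,2004
1,2004
"""

df = pd.read_csv(io.StringIO(sample))

# Count the rows in each (Region, Year) group and broadcast that count
# back onto every row of the group, so the frame keeps its original shape.
df["freq"] = df.groupby(["Region", "Year"])["Year"].transform("count")

print(df)
```

`transform` returns one value per input row, unlike an aggregation, which would collapse each group to a single row.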

Answer 1
1 2014-04-11 23:24:06

Answer 2
1 Accepted 2014-04-11 23:33:47