Calculate frequency of items in a list
I want to calculate the frequency of accidents in every region, for every year. How can I do that using Python?
file.csv
Region,Year
1,2003
1,2003
2,2008
2,2007
2,2007
3,2004
1,2004
1,2004
1,2004
I tried using Counter, but it only works with one column. Example: in region 1, year 2003, there are 2 accidents, so the results should be:
Region,Year,freq
1,2003,2
1,2003,2
2,2008,1
2,2007,2
2,2007,2
3,2004,1
1,2004,3
1,2004,3
1,2004,3
I tried doing it this way, but it doesn't seem to be the right way.
import pandas
from collections import Counter

data = pandas.read_csv("file.csv")
# Counts occurrences of each year only; the region is ignored
freq_year = Counter(data["Year"].values)
data["freq"] = data["Year"].apply(lambda x: freq_year[x])
I am thinking of using groupby. Do you have any idea how to do this?
Not a pandas solution, but it gets the job done:
import csv
from collections import Counter

inputs = []
with open('input.csv') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        # Store each row as a tuple so it is hashable and can be counted
        inputs.append(tuple(row))

# Skip the header row and count identical (Region, Year) pairs
freqs = Counter(inputs[1:])
print(freqs)
# Counter({('1', '2004'): 3, ('1', '2003'): 2, ('2', '2007'): 2, ('2', '2008'): 1, ('3', '2004'): 1})
The key here is to store each row as a tuple: tuples are hashable, so Counter treats rows with the same region and year as equal.
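To get the per-row output asked for in the question, the counts can be looked up again for each row. This is a minimal sketch, assuming the sample data from the question is held in an in-memory string instead of a file; the names `sample`, `rows`, and `out` are illustrative and not part of the original answer.

```python
import csv
import io
from collections import Counter

# Sample data from the question; a real script would open the CSV file instead.
sample = """Region,Year
1,2003
1,2003
2,2008
2,2007
2,2007
3,2004
1,2004
1,2004
1,2004
"""

reader = csv.reader(io.StringIO(sample))
header = next(reader)                    # ['Region', 'Year']
rows = [tuple(row) for row in reader]    # tuples are hashable, so Counter can count them

freqs = Counter(rows)

# Build the output: the original columns plus the count of each (Region, Year) pair.
out = [header + ["freq"]]
for row in rows:
    out.append(list(row) + [str(freqs[row])])

for line in out:
    print(",".join(line))
```

The row order of the input is preserved because the counts are only looked up, never used to reorder anything.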
There might be a better way, but I first append a dummy column and calculate freq based on that column, like:
import pandas as pd
df = pd.read_csv("file.csv")
df["freq"] = 1  # dummy column: every row counts as one accident
df["freq"] = df.groupby(["Year", "Region"])["freq"].transform("sum")
This returns the following df:
   Region  Year  freq
0       1  2003     2
1       1  2003     2
2       2  2008     1
3       2  2007     2
4       2  2007     2
5       3  2004     1
6       1  2004     3
7       1  2004     3
8       1  2004     3
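The dummy column can probably be avoided by asking groupby for the size of each group directly. This is a sketch rather than a definitive answer; it assumes the sample data from the question is loaded from an in-memory string, and the name `sample` is illustrative.

```python
import io

import pandas as pd

# Sample data from the question; a real script would call pd.read_csv("file.csv").
sample = """Region,Year
1,2003
1,2003
2,2008
2,2007
2,2007
3,2004
1,2004
1,2004
1,2004
"""

df = pd.read_csv(io.StringIO(sample))

# transform("size") computes the number of rows in each (Region, Year) group
# and broadcasts that count back to every row of the group.
df["freq"] = df.groupby(["Region", "Year"])["Year"].transform("size")

print(df)
```

If only one line per region and year is wanted, `df.groupby(["Region", "Year"]).size()` gives the counts without repeating them for every row.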