
Getting top "x" values of column by Year

I'm using data.table in R and I'm trying to extract the top 2 values of totaldemand within each FinYear.

My actual data set is huge (it goes up to FinYear 2016), so here is a subset:

    x
    totaldemand FinYear
 1:    4708.667    2000
 2:    4448.833    2000
 3:    4360.025    2000
 4:    4523.167    2000
 5:    4504.558    2000
 6:    4552.167    2001
 7:    4548.750    2001
 8:    4451.500    2001
 9:    4057.333    2001
10:    4232.167    2001
11:    4523.833    2002
12:    4517.000    2002
13:    4469.500    2002
14:    4379.833    2002
15:    4473.500    2002
16:    4243.333    2003
17:    4270.000    2003
18:    4611.333    2003
19:    4688.333    2003
20:    4720.183    2003
21:    4691.667    2004
22:    4554.167    2004
23:    4217.000    2004
24:    4224.500    2004
25:    4521.167    2004
26:    4549.000    2005
27:    4490.000    2005
28:    4492.167    2005
29:    4416.333    2005
30:    4189.833    2005
31:    4481.000    2000
32:    4583.167    2000
33:    4540.667    2000
34:    4567.333    2000
35:    4510.833    2000
36:    4274.333    2001
37:    4198.167    2001
38:    4392.000    2001
39:    4357.000    2001
40:    4419.667    2001
41:    4439.042    2002
42:    4398.667    2002
43:    4221.667    2002
44:    4172.750    2002
45:    4417.667    2002
46:    4479.510    2003
47:    4527.833    2003
48:    4454.843    2003
49:    4492.177    2003
50:    4225.833    2003

What I want is the top 2 values of totaldemand for each financial year.

That is, look at all the rows where FinYear is 2000, find the top 2 values, and store them.

Then do the same for FinYear 2001, and so on for every FinYear.

The result can be a data table, a data frame, or a list.

Any suggestions?

We can order the rows by "FinYear" and by "totaldemand" (descending), then group by "FinYear" and take the first two rows of each group with head.

x[order(FinYear, -totaldemand), head(.SD, 2), by = FinYear]
#   FinYear totaldemand
# 1:    2000    4708.667
# 2:    2000    4583.167
# 3:    2001    4552.167
# 4:    2001    4548.750
# 5:    2002    4523.833
# 6:    2002    4517.000
# 7:    2003    4720.183
# 8:    2003    4688.333
# 9:    2004    4691.667
#10:    2004    4554.167
#11:    2005    4549.000
#12:    2005    4492.167

Instead of using .SD, we can also index the column directly:

x[order(FinYear, -totaldemand), .(totaldemand = totaldemand[1:2]), by = FinYear]
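One difference between the two forms shows up when a group has fewer rows than requested: head() returns whatever is there, while indexing with 1:2 pads the result with NA. The following is a minimal sketch of that behaviour, not part of the original answer. The small table and the variable n are made up for illustration; only the column names come from the question.

```r
library(data.table)

# Hypothetical toy table: two years with three rows each,
# plus one year (2002) that has only a single row
x <- data.table(
  totaldemand = c(4708.667, 4448.833, 4583.167,
                  4552.167, 4548.750, 4451.500,
                  4523.833),
  FinYear     = c(2000, 2000, 2000,
                  2001, 2001, 2001,
                  2002)
)

n <- 2  # number of top values to keep per year (illustrative parameter)

# Sort by year, then by demand descending; keep the first n rows per group.
# head() returns fewer than n rows when the group is smaller than n.
top_sd <- x[order(FinYear, -totaldemand), head(.SD, n), by = FinYear]

# Same sort, but index the column with 1:n.
# Indexing past the end of a vector yields NA, so short groups are padded.
top_idx <- x[order(FinYear, -totaldemand),
             .(totaldemand = totaldemand[1:n]), by = FinYear]

print(top_sd)
print(top_idx)
```

If the NA padding is unwanted, replacing totaldemand[1:n] with head(totaldemand, n) should give the same rows as the .SD version.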
