Selecting a value from df during ddply summarise
I want to use ddply and summarise to get the monthly medians for several years of data. I can do this successfully. However, I would also like a column holding the value for one particular year. I know other ways to add this, but I would like to do it within the ddply line. Data is at the bottom.
The first row of the result would look like this if the median for all years is 16 and the value for 2018 is 30:
Month Median 2018
Apr 16.0 30
Here is what I have tried. This works as expected:
library(plyr)
Summary <- ddply(df, ~Month, summarise, Median = median(Value))
Summary
When I try to add the single-year value, I can't think of a way to do it:
Summary<-ddply(df, ~Month, summarise, Median = median(Value), SingleYearValue = which(df[,"Year"]==2018));Summary
df<-structure(list(Month = c("Apr", "Apr", "Apr", "Apr", "Apr", "Apr",
"Apr", "Apr", "Apr", "Apr", "Apr", "Apr", "Apr", "Apr", "Apr",
"Apr", "Apr", "Apr", "Apr", "Apr", "Apr", "Apr", "Apr", "Apr",
"Apr", "Apr", "Apr", "Apr", "Apr", "Apr", "Apr", "Aug", "Aug",
"Aug", "Aug", "Aug", "Aug", "Aug", "Aug", "Aug", "Aug", "Aug",
"Aug", "Aug", "Aug", "Aug", "Aug", "Aug", "Aug", "Aug", "Aug",
"Aug", "Aug", "Aug", "Aug", "Aug", "Aug", "Aug", "Aug", "Aug",
"Aug", "Aug", "Aug", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec",
"Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Dec", "Feb", "Feb",
"Feb", "Feb", "Feb", "Feb", "Feb", "Feb", "Feb", "Feb", "Feb",
"Feb", "Feb", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jan",
"Jan", "Jan", "Jan", "Jan", "Jan", "Jan", "Jul", "Jul", "Jul",
"Jul", "Jul", "Jul", "Jul", "Jul", "Jul", "Jul", "Jul", "Jul",
"Jul", "Jul", "Jul", "Jul", "Jul", "Jul", "Jul", "Jul", "Jul",
"Jul", "Jul", "Jul", "Jul", "Jul", "Jul", "Jul", "Jul", "Jul",
"Jul", "Jul", "Jun", "Jun", "Jun", "Jun", "Jun", "Jun", "Jun",
"Jun", "Jun", "Jun", "Jun", "Jun", "Jun", "Jun", "Jun", "Jun",
"Jun", "Jun", "Jun", "Jun", "Jun", "Jun", "Jun", "Jun", "Jun",
"Jun", "Jun", "Jun", "Jun", "Jun", "Jun", "Jun", "Mar", "Mar",
"Mar", "Mar", "Mar", "Mar", "Mar", "Mar", "Mar", "Mar", "Mar",
"Mar", "Mar", "Mar", "Mar", "Mar", "Mar", "May", "May", "May",
"May", "May", "May", "May", "May", "May", "May", "May", "May",
"May", "May", "May", "May", "May", "May", "May", "May", "May",
"May", "May", "May", "May", "May", "May", "May", "May", "May",
"May", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov",
"Nov", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov", "Nov",
"Nov", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct",
"Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct",
"Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct", "Oct",
"Oct", "Oct", "Oct", "Oct", "Oct", "Sep", "Sep", "Sep", "Sep",
"Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep",
"Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep",
"Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep", "Sep",
"Sep"), Year = c("1960", "1961", "1962", "1963", "1964", "1965",
"1966", "1967", "1968", "1969", "1970", "1971", "1972", "2002",
"2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010",
"2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018",
"2019", "1960", "1961", "1962", "1963", "1964", "1965", "1966",
"1967", "1968", "1969", "1970", "1971", "1972", "2001", "2002",
"2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010",
"2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018",
"2019", "1959", "1960", "1961", "1962", "1963", "1964", "1965",
"1966", "1967", "1968", "1969", "1970", "1971", "1960", "1961",
"1962", "1963", "1964", "1965", "1966", "1967", "1968", "1969",
"1970", "1971", "1972", "1960", "1961", "1962", "1963", "1964",
"1965", "1966", "1967", "1968", "1969", "1970", "1971", "1972",
"1960", "1961", "1962", "1963", "1964", "1965", "1966", "1967",
"1968", "1969", "1970", "1971", "1972", "2001", "2002", "2003",
"2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011",
"2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019",
"1960", "1961", "1962", "1963", "1964", "1965", "1966", "1967",
"1968", "1969", "1970", "1971", "1972", "2001", "2002", "2003",
"2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011",
"2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019",
"1960", "1961", "1962", "1963", "1964", "1965", "1966", "1967",
"1968", "1969", "1970", "1971", "1972", "2016", "2017", "2018",
"2019", "1960", "1961", "1962", "1963", "1964", "1965", "1966",
"1967", "1968", "1969", "1970", "1971", "1972", "2002", "2003",
"2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011",
"2012", "2013", "2014", "2015", "2016", "2017", "2018", "2019",
"1959", "1960", "1961", "1962", "1963", "1964", "1965", "1966",
"1967", "1968", "1969", "1970", "1971", "2005", "2015", "2016",
"2017", "2018", "1959", "1960", "1961", "1962", "1963", "1964",
"1965", "1966", "1967", "1968", "1969", "1970", "1971", "2001",
"2002", "2003", "2004", "2005", "2006", "2007", "2008", "2010",
"2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018",
"2019", "1960", "1961", "1962", "1963", "1964", "1965", "1966",
"1967", "1968", "1969", "1970", "1971", "1972", "2001", "2002",
"2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010",
"2011", "2012", "2013", "2014", "2015", "2016", "2017", "2018",
"2019"), Value = 1:295), row.names = c(NA, -295L), class = "data.frame")
You can subset a particular year's values and then merge:
year = 2018
data <- subset(df, Year == year, select = -Year)
names(data)[names(data) == 'Value'] <- year
merge(Summary, data, by = 'Month', all.x = TRUE)
# Month Median 2018
#1 Apr 16.0 30
#2 Aug 47.5 62
#3 Dec 70.0 NA
#4 Feb 83.0 NA
#5 Jan 96.0 NA
#6 Jul 118.5 133
#7 Jun 150.5 165
#8 Mar 175.0 182
#9 May 199.0 213
#10 Nov 223.5 232
#11 Oct 248.0 262
#12 Sep 279.5 294
If we want to do this all in plyr, use plyr::join:
plyr::join(Summary, subset(df, Year == 2018, select = -Year))
# Month Median Value
#1 Apr 16.0 30
#2 Aug 47.5 62
#3 Dec 70.0 NA
#4 Feb 83.0 NA
#5 Jan 96.0 NA
#6 Jul 118.5 133
#7 Jun 150.5 165
#8 Mar 175.0 182
#9 May 199.0 213
#10 Nov 223.5 232
#11 Oct 248.0 262
#12 Sep 279.5 294
Or, if we want to do this within ddply:
plyr::ddply(df, ~ Month, summarise, Median = median(Value),
`2018` = Value[Year == 2018][1])
# Month Median 2018
#1 Apr 16.0 30
#2 Aug 47.5 62
#3 Dec 70.0 NA
#4 Feb 83.0 NA
#5 Jan 96.0 NA
#6 Jul 118.5 133
#7 Jun 150.5 165
#8 Mar 175.0 182
#9 May 199.0 213
#10 Nov 223.5 232
#11 Oct 248.0 262
#12 Sep 279.5 294
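As a rough illustration of why the `[1]` is there: inside summarise, Value and Year refer to the rows of one month only, and a month with no 2018 row yields an empty match. The sketch below uses base R with made-up vectors (value and year are hypothetical, standing in for one month's piece of df) to show how `[1]` turns that empty match into NA, so every month still contributes exactly one value.

```r
# Hypothetical values for one month's piece (not taken from the question's df).
value <- c(10, 20, 30)
year  <- c("2017", "2018", "2019")   # character, like df$Year

hit   <- value[year == 2018][1]   # 2018 is coerced to "2018" for the comparison
empty <- value[year == 2020]      # no row matches: a zero-length vector
miss  <- value[year == 2020][1]   # indexing past the end gives NA

hit    # 20
miss   # NA
```

The same idea explains the NA entries for Dec, Feb and Jan in the output above: those months have no 2018 row in the data.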