Median plot per month with #obs. on second axis per month in r-studio
I have a data.frame consisting of 2 variables, each with about 2.5 million obs.
> str(values)
'data.frame': 2529905 obs. of 2 variables:
$ Date : Factor w/ 498 levels "1977-11","1978-06",..: 108 60 12 108 58 108 132 188 51 60 ...
$ Value: num 223000 171528 110269 426000 172436 ...
> head(values)
Date Value
1 2003-01 223000.0
2 1999-01 171528.0
3 1992-01 110268.6
4 2003-01 426000.0
5 1998-11 172436.5
6 2003-01 334000.0
I wanted to make a data.frame with the median per date:
library(plyr)
medianperdate = ddply(values, .(Date), summarize, median_value = median(Value))
> str(medianperdate)
'data.frame': 498 obs. of 2 variables:
$ Date : Factor w/ 498 levels "1977-11","1978-06",..: 1 2 3 4 5 6 7 8 9 10 ...
$ median_value: num 106638 84948 85084 75725 88487 ...
> head(medianperdate)
Date median_value
1 1977-11 106638.35
2 1978-06 84947.65
3 1985-07 85083.79
4 1986-05 75724.58
5 1986-11 88487.14
6 1986-12 98697.20
But what I want is an extra column that counts the observations per month (e.g. for 2003-01); the data used would be the object "values".
And another extra column in which I define which house class it is:
a : < 200 000
b : > 200 000 & < 300 000
c : > 300 000
I will continue trying this, but because I have already been stuck for a couple of hours I would appreciate help very much!
If this is not clear, which I can understand, the following test data frame shows how I would like my data frame to look:
> testdf
Year MedianValue HouseClass #Observations
1 1999-1 200000 B 501
2 1999-2 150000 A 664
3 1999-3 250000 C 555
Like my answer to your previous question:
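library(data.table)
dt <- data.table(values)
# Median and number of observations per month
dt2 <- dt[, list(
  medianvalue = median(Value),
  obs = .N
),
by = "Date"
]
# Classify on the median value, starting from the highest class
dt2[, HouseClass := "c"]
dt2[medianvalue < 300000, HouseClass := "b"]
dt2[medianvalue < 200000, HouseClass := "a"]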
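As a possible alternative to assigning the classes in three separate steps, base R's cut() can bin the medians in one call. This is only a sketch: the vector of medians below is hypothetical, and the handling of values exactly on a boundary (200 000 goes to "b", 300 000 goes to "c") is an assumption, since the question does not say which class a boundary value belongs to.

```r
# Hypothetical median values, chosen only to illustrate the class boundaries
medianvalue <- c(150000, 200000, 250000, 300000, 426000)

# right = FALSE makes each interval closed on the left:
# [0, 200000) -> "a", [200000, 300000) -> "b", [300000, Inf) -> "c"
house_class <- cut(medianvalue,
                   breaks = c(0, 200000, 300000, Inf),
                   labels = c("a", "b", "c"),
                   right = FALSE)

print(data.frame(medianvalue, house_class))
```

The result is a factor, which is usually convenient for grouping or plotting; wrap it in as.character() if a plain character column is preferred.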
You can write your own functions inside apply and the apply-like functions (which include the plyr functions). It would look something like this:
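ddply(values, .(Date), .fun = function(x) {
  # x is the subset of rows for one Date, so work on its Value column
  med <- median(x$Value)
  house_class <- ifelse(med < 200000, 'A', ifelse(med < 300000, 'B', 'C'))
  n <- nrow(x)
  # return a one-row data.frame so each column keeps its own type
  data.frame(MedianValue = med, HouseClass = house_class, Observations = n)
})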
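For readers without plyr installed, the same split-apply-combine idea can be sketched in base R. The toy data frame and the helper name summarise_group below are made up for illustration; only the column names Date and Value come from the question.

```r
# Toy data (hypothetical values) with the same columns as `values`
toy <- data.frame(
  Date  = c("1999-01", "1999-01", "1999-01", "2003-01", "2003-01"),
  Value = c(100000, 150000, 250000, 223000, 426000)
)

# Summarise one group: returns a one-row data.frame so column types are kept
summarise_group <- function(x) {
  med <- median(x$Value)
  data.frame(
    Date         = as.character(x$Date[1]),
    MedianValue  = med,
    HouseClass   = if (med < 200000) "A" else if (med < 300000) "B" else "C",
    Observations = nrow(x),
    stringsAsFactors = FALSE
  )
}

# Split by Date, apply the function to each piece, then bind the rows together
pieces <- split(toy, toy$Date)
result <- do.call(rbind, lapply(pieces, summarise_group))
rownames(result) <- NULL
print(result)
```

On 2.5 million rows this approach is likely to be slower than data.table, so it is meant to show the mechanics rather than to be the fastest option.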