
Plot median house price over last 10 years with large dataset in R

I need to create a plot of the monthly median house price over time. The data is in random order and consists of the selling prices of individual houses.

I have already converted the daily dates to monthly and converted Value into a numeric column, but I can't manage to calculate the median per month.

Below are the characteristics of the dataset:

> str(a)
'data.frame':   1411764 obs. of  2 variables:
 $ Date : Factor w/ 498 levels "1977-11","1978-06",..: 108 60 12 58 51 60 12 59 60 60 ...
 $ Value: num  223000 171528 110269 172436 181512 ...
> head(a)
      Date    Value
1  2003-01 223000.0
2  1999-01 171528.0
3  1992-01 110268.6
5  1998-11 172436.5
9  1998-04 181512.1
10 1999-01 197848.0

If you have a lot of data, you will find data.table very efficient for such operations. If you don't, you will still find data.table very useful:

library(data.table)
# `a` is the data frame from the question
dt <- data.table(a)
# median of Value within each month
dt[, list(medianvalue = median(Value)), by = "Date"]
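
To get from the per-month medians to the plot the question asks for, the Date factor probably needs to be turned into a real date first, so that the months sort chronologically. The following is only a minimal sketch, assuming `a` has the structure shown in the question. The toy data is made up for illustration, the names `Month`, `med`, `cutoff` and `med10` are hypothetical, and "last 10 years" is taken relative to the latest month present in the data.

```r
library(data.table)

# Toy stand-in for the question's data frame `a` (values are made up)
set.seed(1)
months <- format(seq(as.Date("1995-01-01"), as.Date("2012-12-01"), by = "month"), "%Y-%m")
a <- data.frame(Date  = factor(sample(months, 5000, replace = TRUE)),
                Value = runif(5000, 1e5, 3e5))

dt <- as.data.table(a)

# "YYYY-MM" -> Date on the first of the month, so months order chronologically
dt[, Month := as.Date(paste0(as.character(Date), "-01"))]

# Median per month, sorted by time
med <- dt[, .(medianvalue = median(Value)), by = Month][order(Month)]

# Keep only the last 10 years, relative to the latest month in the data
cutoff <- seq(max(med$Month), length.out = 2, by = "-10 years")[2]
med10 <- med[Month > cutoff]

plot(med10$Month, med10$medianvalue, type = "l",
     xlab = "Month", ylab = "Median house price")
```

Converting the factor through as.character() first matters here: calling as.Date() on the integer codes of a factor would not give the intended dates.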

I'd use plyr for this. Something like this should get you a data.frame with the median per month:

library(plyr)
result_df = ddply(a, .(Date), summarize, median_value = median(Value))

plyr is known to be a little slow for larger datasets, but I would just give the code above a try. A very good alternative is data.table, which provides roughly the same functionality but can be orders of magnitude faster.
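
If you want to check the speed difference on your own machine, something along these lines should work. It is a rough sketch on made-up data with the same two columns as the question's `a`. The row count `n` and the names `t_plyr`, `t_dt`, `res_plyr` and `res_dt` are arbitrary choices for illustration, and the timings will depend on your hardware and package versions.

```r
library(plyr)
library(data.table)

# Made-up data with the same two columns as the question's `a`
set.seed(42)
n <- 200000
a <- data.frame(
  Date  = factor(sprintf("%d-%02d",
                         sample(1990:2012, n, replace = TRUE),
                         sample(1:12, n, replace = TRUE))),
  Value = runif(n, 1e5, 3e5)
)

# Time the plyr version
t_plyr <- system.time(
  res_plyr <- ddply(a, .(Date), summarize, median_value = median(Value))
)

# Time the data.table version
t_dt <- system.time(
  res_dt <- as.data.table(a)[, .(median_value = median(Value)), by = Date]
)

# Both approaches should give the same medians once sorted by month
res_dt <- res_dt[order(Date)]
stopifnot(isTRUE(all.equal(res_plyr$median_value, res_dt$median_value)))

# Elapsed seconds for each approach
print(c(plyr = t_plyr[["elapsed"]], data.table = t_dt[["elapsed"]]))
```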
