
R: replace 2 observations with their average

Here I have a data frame of 5028 observations of 6 variables, of which I am only interested in the dates and the prices of units.

[1] "datesold"     "postcode"     "price"        "propertyType" "bedrooms"    
[6] "rate" 
> head(data_u,10)

                 datesold postcode  price propertyType bedrooms        rate
24553 2007-06-27 00:00:00     2606 300000         unit        2  0.00000000
24554 2007-07-05 00:00:00     2611 300000         unit        2  0.60000000
24555 2007-07-19 00:00:00     2607 480000         unit        3 -0.25000000
24556 2007-07-20 00:00:00     2604 360000         unit        2  0.06944444
24557 2007-08-07 00:00:00     2617 385000         unit        3  0.05194805
24558 2007-08-09 00:00:00     2913 405000         unit        3  0.30617284
24559 2007-09-05 00:00:00     2612 529000         unit        2 -0.24385633
24560 2007-09-07 00:00:00     2602 400000         unit        2  0.22500000
24561 2007-09-20 00:00:00     2612 490000         unit        3 -0.55102041
24562 2007-09-24 00:00:00     2611 220000         unit        2  0.54545455

However, some houses were sold on the same date at different prices. I would like to locate the observations that share a date, compute the average price for each date, and replace those observations with a single one.

I have thought about using a double for-loop with ifelse() inside, but I am having trouble implementing the idea. Any help is appreciated!
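The loop idea does not actually need nested loops or ifelse(): one loop over the distinct dates is enough. A minimal sketch, assuming a hypothetical toy data_u that contains only datesold and price (the repeated dates are invented for illustration):

```r
# Hypothetical toy data in the shape of data_u; repeated dates are invented
data_u <- data.frame(
  datesold = c("2007-06-27 00:00:00", "2007-06-27 00:00:00",
               "2007-07-05 00:00:00", "2007-07-19 00:00:00",
               "2007-07-19 00:00:00"),
  price    = c(300000, 360000, 300000, 480000, 400000),
  stringsAsFactors = FALSE
)

# A single loop over the distinct dates, in order of first appearance
dates <- unique(data_u$datesold)
mean_price <- numeric(length(dates))
for (i in seq_along(dates)) {
  # Logical index of every row sold on the i-th date
  same_day <- data_u$datesold == dates[i]
  mean_price[i] <- mean(data_u$price[same_day], na.rm = TRUE)
}

# One row per date, replacing the original observations
data_avg <- data.frame(datesold = dates, price = mean_price,
                       stringsAsFactors = FALSE)
print(data_avg)
```

This works, but the vectorised answers below (aggregate() or dplyr) do the same grouping without an explicit loop.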

Try:

aggregate(price ~ datesold, data = data_u, FUN = mean, na.rm = TRUE)

This calculates the average price for each unique value of datesold and returns a data frame in which each row corresponds to one value of datesold. I set the optional argument na.rm to TRUE because, if price has missing values, the mean for any datesold subgroup containing at least one NA would otherwise be NA. With na.rm = TRUE only observations with a known price are used, so you get a mean price for each datesold. (Strictly, the formula interface of aggregate() already drops rows containing NA, because its na.action argument defaults to na.omit, so na.rm = TRUE acts as a safeguard here rather than being required.)
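A small check of how missing prices are handled, on hypothetical toy data where one sale on the second date has no price. It compares the formula interface with the data-frame interface of aggregate(), which does not drop NA rows on its own:

```r
# Hypothetical toy data: the second date has one missing price
data_u <- data.frame(
  datesold = c("2007-06-27", "2007-06-27", "2007-07-05", "2007-07-05"),
  price    = c(300000, 360000, 300000, NA),
  stringsAsFactors = FALSE
)

# Formula interface: na.action = na.omit removes the NA row before grouping
res_formula <- aggregate(price ~ datesold, data = data_u, FUN = mean, na.rm = TRUE)

# Data-frame interface: NA rows are kept, so na.rm = TRUE is what removes them
res_narm    <- aggregate(data_u["price"], by = data_u["datesold"], FUN = mean, na.rm = TRUE)
res_no_narm <- aggregate(data_u["price"], by = data_u["datesold"], FUN = mean)

print(res_formula)
print(res_no_narm)  # the 2007-07-05 mean comes out as NA
```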

An easy way to do it would be:

library(tidyverse)
Temp <- data_u %>% group_by(datesold) %>% summarise(Mean_Price = mean(price, na.rm = TRUE))
data_u <- data_u %>% left_join(Temp, by = "datesold")  # adds Mean_Price to every row
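Note that the left_join() step keeps every original row and adds the date's mean alongside it. A base-R sketch of the same idea (swapping in ave() instead of a join) on hypothetical toy data, followed by an optional step that keeps only one row per date:

```r
# Hypothetical toy data: two sales share a date
data_u <- data.frame(
  datesold = c("2007-06-27", "2007-06-27", "2007-07-05"),
  postcode = c(2606, 2611, 2607),
  price    = c(300000, 360000, 480000),
  stringsAsFactors = FALSE
)

# ave() returns a vector as long as price, holding each row's group mean
data_u$Mean_Price <- ave(data_u$price, data_u$datesold,
                         FUN = function(p) mean(p, na.rm = TRUE))
print(data_u)  # all 3 rows kept

# Collapse to one row per date by keeping the first row of each date
collapsed <- data_u[!duplicated(data_u$datesold), c("datesold", "Mean_Price")]
print(collapsed)
```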

Do you need this?

library(dplyr)

data_u %>%
  group_by(date = as.Date(lubridate::ymd_hms(datesold))) %>%
  mutate(mean_price = mean(price, na.rm = TRUE))
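mutate() keeps every row; replacing each date's sales by a single row, as the question asks, is what summarise() would do instead. A base-R sketch of that collapse, which also strips the time of day so that sales on the same calendar date are grouped together (hypothetical toy data; the non-midnight time is invented for illustration):

```r
# Hypothetical toy data: two sales on 2007-06-27 at different times
data_u <- data.frame(
  datesold = c("2007-06-27 00:00:00", "2007-06-27 14:30:00",
               "2007-07-05 00:00:00"),
  price    = c(300000, 360000, 300000),
  stringsAsFactors = FALSE
)

# as.Date() parses the leading "YYYY-MM-DD" and ignores the time part
data_u$date <- as.Date(data_u$datesold, format = "%Y-%m-%d")

# One row per calendar date with the mean price (rows with NA price dropped)
by_date <- aggregate(price ~ date, data = data_u, FUN = mean)
print(by_date)
```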
