
R - for each observation in a column, find the closest one in another column

I'm trying to filter my dataframe to keep only the rows that meet the following condition:

For each day AND each price_1, keep only the row where price_2 is closest to price_1; if two rows are at equal distance, take the mean of the two prices and volatilities. For example:

 Date              price_2        price_1   Volat
 2011-07-15        215            200.0     5
 2011-07-15        217            200.0     6
 2011-07-15        235            200.0     5.5
 2011-07-15        240            200.0     5.3
 2011-07-15        200            201.5     6.2
 2011-07-16        203            205.0     6.4
 2011-07-16        207            205.0     5.1


Expected output:

 Date              price_2        price_1  Volat
 2011-07-15        215            200.0      5
 2011-07-15        200            201.5     6.2
 2011-07-16        205            205.0     5.75

I started like this, but I don't know how to continue:

group_by(Date)  %>% 
which(df,abs(df$price_1-df$price_2)==min(abs(df$price_1-df$price_2)))

Thanks a lot in advance!

Base R Solution:

price_summary <-
  data.frame(do.call("rbind", lapply(split(
    df, paste(df$Date, df$price_1, sep = " - ")
  ),
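  function(x) {
    # Distance of each price_2 from price_1 within this Date/price_1 group
    d <- abs(x$price_2 - x$price_1)
    # Keep every row tied for the minimum distance, not just the first one
    closest <- d == min(d)
    data.frame(
      Date = unique(x$Date),
      price_1 = unique(x$price_1),
      price_2 = mean(x$price_2[closest]),
      Volat = mean(x$Volat[closest]),
      stringsAsFactors = FALSE
    )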
  })),
  row.names = NULL)

Data:

df <- structure(
  list(
    Date = structure(c(
      15170, 15170, 15170, 15170,
      15170, 15171, 15171
    ), class = "Date"),
    price_2 = c(215L, 217L,
                235L, 240L, 200L, 203L, 207L),
    price_1 = c(200, 200, 200, 200,
                201.5, 205, 205),
    Volat = c(5, 6, 5.5, 5.3, 6.2, 6.4, 5.1)
  ),
  row.names = c(NA,-7L),
  class = "data.frame"
)
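
For comparison, here is a minimal base R sketch of the same rule without split/lapply. It assumes the column names from the question; the helper names key, d, keep and res are only illustrative. Rows are flagged when their distance equals the smallest distance in their Date/price_1 group, so tied rows are all kept and then averaged.

```r
# Same sample data as in the question, rebuilt so the snippet runs on its own
df <- data.frame(
  Date = as.Date(c(rep("2011-07-15", 5), rep("2011-07-16", 2))),
  price_2 = c(215, 217, 235, 240, 200, 203, 207),
  price_1 = c(200, 200, 200, 200, 201.5, 205, 205),
  Volat = c(5, 6, 5.5, 5.3, 6.2, 6.4, 5.1)
)

# One grouping key per Date/price_1 combination
key <- paste(df$Date, df$price_1)

# Distance between the two prices, and the smallest distance per group
d <- abs(df$price_2 - df$price_1)
keep <- d == ave(d, key, FUN = min)

# Average price_2 and Volat over the retained (possibly tied) rows
res <- aggregate(cbind(price_2, Volat) ~ Date + price_1,
                 data = df[keep, ], FUN = mean)
res
```

The comparison d == min relies on exact equality, which is fine for this data; with prices that carry floating-point noise, a tolerance-based comparison may be safer.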

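One dplyr option could be:

library(dplyr)

df %>%
 group_by(Date, price_1) %>%
 # Distance between the two prices within each Date/price_1 group
 mutate(diff = abs(price_2 - price_1)) %>%
 # Keep all rows tied for the smallest distance
 filter(diff == min(diff)) %>%
 # Average price_2 and Volat over the retained rows
 summarise_at(vars(price_2, Volat), mean)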

  Date       price_1 price_2 Volat
  <chr>        <dbl>   <dbl> <dbl>
1 2011-07-15    200      215  5   
2 2011-07-15    202.     200  6.2 
3 2011-07-16    205      205  5.75
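
A possible variant, assuming dplyr 1.0.0 or later: slice_min() with with_ties = TRUE keeps every row tied for the smallest distance, which replaces the mutate/filter pair. The name closest is only illustrative, and this is a sketch rather than the answerer's own code.

```r
library(dplyr)

# Same sample data as in the question, rebuilt so the snippet runs on its own
df <- data.frame(
  Date = as.Date(c(rep("2011-07-15", 5), rep("2011-07-16", 2))),
  price_2 = c(215, 217, 235, 240, 200, 203, 207),
  price_1 = c(200, 200, 200, 200, 201.5, 205, 205),
  Volat = c(5, 6, 5.5, 5.3, 6.2, 6.4, 5.1)
)

closest <- df %>%
  group_by(Date, price_1) %>%
  # Keep the row(s) with the smallest |price_2 - price_1|, ties included
  slice_min(abs(price_2 - price_1), n = 1, with_ties = TRUE) %>%
  # Average the retained rows and return an ungrouped result
  summarise(across(c(price_2, Volat), mean), .groups = "drop")

closest
```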

