Find ranges and average the corresponding elements in R

Question

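I have a data set of numbers (coordinates) that fall into several ranges. I want to identify each range of consecutive numbers and then take the mean of the corresponding scores.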

Suppose my data set is:

coordinate score    
     1000   1.1
     1001   1.2
     1002   1.1
     1003   1.4
     1006   1.8
     1007   1.9
     1010   0.5
     1011   1.1
     1012   1.0

I need to find the boundaries (the points where coordinate stops being consecutive) and then compute the mean score for each resulting range.

The result I want:

start end mean-score
1000 1003  1.2
1006 1007  1.85
1010 1012  0.87

Answer 1

Try this (assuming df is your data set):

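library(data.table)
# cumsum(c(1, diff(coordinate)) - 1) stays constant while consecutive
# coordinates differ by 1 and jumps at every gap, so it works as a group key
setDT(df)[, indx := .GRP, by = list(cumsum(c(1, diff(coordinate)) - 1))]
# One row per group: first coordinate, last coordinate, rounded mean score
df[, list(start = coordinate[1],
          end = coordinate[.N],
          mean_score = round(mean(score), 2)), by = indx]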

#    indx start  end mean_score
# 1:    1  1000 1003       1.20
# 2:    2  1006 1007       1.85
# 3:    3  1010 1012       0.87
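
The grouping key used in the data.table call above can be inspected step by step. The following is a small base R sketch using the coordinates from the question; the names gaps, key and run_id are introduced here only for illustration, and the approach assumes the coordinates are sorted in ascending order.

```r
# Coordinates from the question
coordinate <- c(1000, 1001, 1002, 1003, 1006, 1007, 1010, 1011, 1012)

# Difference to the previous coordinate; the first element is padded with 1
gaps <- c(1, diff(coordinate))   # 1 1 1 1 3 1 3 1 1

# Subtracting 1 leaves 0 inside a run and a positive value at each break,
# so the cumulative sum stays constant within a run of consecutive values
key <- cumsum(gaps - 1)          # 0 0 0 0 2 2 4 4 4

# Convert the key to consecutive run numbers (the role .GRP plays above)
run_id <- match(key, unique(key))
print(run_id)                    # 1 1 1 1 2 2 3 3 3
```

If the input is not sorted, the key would no longer be monotone, so sorting by coordinate first seems advisable.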

Or using dplyr:

library(dplyr)
df %>%
  mutate(indx = dense_rank(cumsum(c(1, diff(coordinate)) - 1))) %>%
  group_by(indx) %>%
  summarise(start = first(coordinate),
            end = last(coordinate),
            mean_score = round(mean(score), 2))

# Source: local data frame [3 x 4]
# 
#   indx start  end mean_score
# 1    1  1000 1003       1.20
# 2    2  1006 1007       1.85
# 3    3  1010 1012       0.87

Here are some alternative base R solutions (much less efficient):

df$indx <- as.numeric(factor(cumsum(c(1, diff(df$coordinate)) - 1)))
cbind(aggregate(coordinate ~ indx, df, function(x) c(start = head(x, 1), end = tail(x, 1))),
      aggregate(score ~ indx, df, function(x) mean_score = round(mean(x), 2)))

#   indx coordinate.start coordinate.end indx score
# 1    1             1000           1003    1  1.20
# 2    2             1006           1007    2  1.85
# 3    3             1010           1012    3  0.87

Or

cbind(do.call(rbind, (with(df, tapply(coordinate, indx, function(x) c(start = head(x, 1), end = tail(x, 1)))))),
with(df, tapply(score, indx, function(x) mean_score = round(mean(x), 2))))

#   start  end     
# 1  1000 1003 1.20
# 2  1006 1007 1.85
# 3  1010 1012 0.87
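
For comparison, a more compact base R variant is sketched below. It is not part of the original answer: it marks a new run wherever the step from the previous coordinate is not exactly 1, and it rebuilds df from the question's data so that the block runs on its own. The names run_id and result are illustrative, and the coordinates are assumed to be sorted in ascending order.

```r
# Data from the question
df <- data.frame(
  coordinate = c(1000, 1001, 1002, 1003, 1006, 1007, 1010, 1011, 1012),
  score      = c(1.1, 1.2, 1.1, 1.4, 1.8, 1.9, 0.5, 1.1, 1.0)
)

# A new run starts at the first row and wherever the step is not 1
run_id <- cumsum(c(TRUE, diff(df$coordinate) != 1))

# Summarise each run into a one-row data frame, then stack the rows
result <- do.call(rbind, lapply(split(df, run_id), function(d) {
  data.frame(start      = d$coordinate[1],
             end        = d$coordinate[nrow(d)],
             mean_score = round(mean(d$score), 2))
}))
print(result)
```

This builds all three output columns in a single pass over the groups, which avoids the duplicated indx column produced by the cbind of two aggregate calls.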

Answer 1: accepted (3), 2014-10-24 09:57:15
