Finding the range and averaging the corresponding elements in R
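I have a dataset whose numbers (coordinates) fall into several separate ranges. I want to find each range and then take the average of the corresponding scores.
Say my dataset is: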
coordinate score
1000 1.1
1001 1.2
1002 1.1
1003 1.4
1006 1.8
1007 1.9
1010 0.5
1011 1.1
1012 1.0
I need to find the boundaries (the points where coordinate stops being consecutive) and then compute the mean score for each resulting range.
Desired result:
start end mean-score
1000 1003 1.2
1006 1007 1.85
1010 1012 0.87
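For anyone who wants to reproduce the problem, the sample data can be entered as a data frame. This is only a sketch: the object name `df` is chosen to match the name the answers assume, and the column types (plain numerics) are an assumption, since the question does not state them.

```r
# Sample data from the question, stored under the name `df`
# (numeric columns are an assumption; the question does not give types)
df <- data.frame(
  coordinate = c(1000, 1001, 1002, 1003, 1006, 1007, 1010, 1011, 1012),
  score      = c(1.1, 1.2, 1.1, 1.4, 1.8, 1.9, 0.5, 1.1, 1.0)
)
```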
Try this (assuming df is your dataset):
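library(data.table)
# Label each run of consecutive coordinates; the cumulative sum only changes at a gap
setDT(df)[, indx := .GRP, by = list(cumsum(c(1, diff(coordinate)) - 1))]
# One row per run: first and last coordinate plus the rounded mean score
df[, list(start = coordinate[1],
          end = coordinate[.N],
          mean_score = round(mean(score), 2)), by = indx]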
# indx start end mean_score
# 1: 1 1000 1003 1.20
# 2: 2 1006 1007 1.85
# 3: 3 1010 1012 0.87
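The grouping expression `cumsum(c(1, diff(coordinate)) - 1)` is compact, so it may help to see it unpacked. The sketch below walks through it in base R on the question's coordinates. It assumes the coordinates are sorted in ascending order and that "consecutive" means a step of exactly 1. The names `steps`, `jumps`, `run_id` and `run_id2` are illustrative only.

```r
coordinate <- c(1000, 1001, 1002, 1003, 1006, 1007, 1010, 1011, 1012)

# Gap to the previous coordinate; the leading 1 stands in for the first row
steps <- c(1, diff(coordinate))   # 1 1 1 1 3 1 3 1 1

# Consecutive rows contribute 0, a jump contributes (gap - 1) > 0
jumps <- steps - 1                # 0 0 0 0 2 0 2 0 0

# The running total stays flat inside a run and increases at each jump,
# so every run of consecutive coordinates gets its own value
run_id <- cumsum(jumps)           # 0 0 0 0 2 2 4 4 4

# An equivalent labelling that numbers the runs 1, 2, 3 directly
run_id2 <- cumsum(c(TRUE, diff(coordinate) != 1))   # 1 1 1 1 2 2 3 3 3
```

The values of `run_id` are not contiguous (0, 2, 4), which is why the answers wrap the expression in `.GRP`, `dense_rank()` or `factor()` to obtain a tidy 1, 2, 3 index.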
Or using dplyr:
library(dplyr)
df %>%
  mutate(indx = dense_rank(cumsum(c(1, diff(coordinate)) - 1))) %>%
  group_by(indx) %>%
  summarise(start = first(coordinate),
            end = last(coordinate),
            mean_score = round(mean(score), 2))
# Source: local data frame [3 x 4]
#
# indx start end mean_score
# 1 1 1000 1003 1.20
# 2 2 1006 1007 1.85
# 3 3 1010 1012 0.87
Here are some alternative base R solutions (much less efficient):
df$indx <- as.numeric(factor(cumsum(c(1, diff(df$coordinate)) - 1)))
cbind(aggregate(coordinate ~ indx, df,
                function(x) c(start = head(x, 1), end = tail(x, 1))),
      aggregate(score ~ indx, df, function(x) round(mean(x), 2)))
# indx coordinate.start coordinate.end indx score
# 1 1 1000 1003 1 1.20
# 2 2 1006 1007 2 1.85
# 3 3 1010 1012 3 0.87
Or:
cbind(do.call(rbind, with(df, tapply(coordinate, indx,
                                     function(x) c(start = head(x, 1), end = tail(x, 1))))),
      with(df, tapply(score, indx, function(x) round(mean(x), 2))))
# start end
# 1 1000 1003 1.20
# 2 1006 1007 1.85
# 3 1010 1012 0.87
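The base R variants above each build the result from two separate passes over the data. As a hedged sketch (not from the original answer), the same summary could also be produced in one pass with `split()` and `lapply()`. The helper name `summarise_runs` and its `step` argument are hypothetical, and the code assumes a non-empty data frame with `coordinate` sorted in ascending order.

```r
# Hypothetical helper (not part of the original answer): summarise runs of
# consecutive coordinates in a single pass over the data frame.
summarise_runs <- function(df, step = 1) {
  # A new run starts wherever the gap to the previous row differs from `step`
  run_id <- cumsum(c(TRUE, diff(df$coordinate) != step))

  # Split the rows by run and reduce each piece to a one-row summary
  pieces <- lapply(split(df, run_id), function(d) {
    data.frame(start      = d$coordinate[1],
               end        = d$coordinate[nrow(d)],
               mean_score = round(mean(d$score), 2))
  })

  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Usage with the question's data
df <- data.frame(
  coordinate = c(1000, 1001, 1002, 1003, 1006, 1007, 1010, 1011, 1012),
  score      = c(1.1, 1.2, 1.1, 1.4, 1.8, 1.9, 0.5, 1.1, 1.0)
)
res <- summarise_runs(df)
```

For the sample data, `res` has three rows with the same start, end and mean values as the outputs shown above. Passing a different `step` would treat a different spacing as "consecutive", which may be useful when the coordinates are sampled at a fixed interval other than 1.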