
R: how to find a value in the first column and sum the values in the third column

I have a file like this:

    Age.Range            Average  Probability
1    0 to 04               400     0.00400
2   05 to 09               221     0.00221
3   10 to 14               216     0.00216
4   15 to 19               409     0.00409

X [age of an individual; integer between 0 and 80 years]

Y [the duration of monitoring of an individual; integer between 1 and 50 years or "for life"]

I need to calculate the probability that a person of age X (e.g. 3) will develop cancer during the interval that starts today and lasts Y years (e.g. 7). In R, I need to find the value of X and the value of X + Y in the first column and sum all the values in the third column between those two ranges:

X = 3
X + Y = 10
probability = 0.004 + 0.00221 + 0.00216

The following function does what you want. It extracts the start of each age range and then uses findInterval to find the indices into the Probability column. Then it is a matter of adding those probabilities.

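sumProbs <- function(DF, X, Y){
  # Age.Range may be a factor, so coerce it to character before splitting
  DF[["Age.Range"]] <- as.character(DF[["Age.Range"]])
  # Keep the lower bound of each range, e.g. "05 to 09" -> 5
  Age.Start <- strsplit(DF[["Age.Range"]], " to ")
  Age.Start <- as.integer(sapply(Age.Start, '[[', 1))
  # Row indices of the ranges that contain X and X + Y
  i <- findInterval(c(X, X + Y), Age.Start)
  # Sum the probabilities of every row from the first to the last
  p <- DF[["Probability"]][i[1]:i[2]]
  sum(p)
}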

sumProbs(df1, 3, 7)
#[1] 0.00837
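
To see why the result is the sum of the first three rows, the lookup can be traced by hand. The snippet below is only an illustrative sketch: the names age_start, prob and total are introduced here and are not part of the function above, and the values are typed in directly from the question's table.

```r
# Lower bounds of the four age ranges in the question's table
age_start <- c(0L, 5L, 10L, 15L)
prob <- c(0.004, 0.00221, 0.00216, 0.00409)

X <- 3
Y <- 7

# For each value, findInterval() returns the index of the last
# break point that is less than or equal to that value
i <- findInterval(c(X, X + Y), age_start)
i
# [1] 1 3

# Age 3 falls in row 1 and age 10 in row 3, so rows 1 to 3 are summed
total <- sum(prob[i[1]:i[2]])
```

Note that whole rows are added, so the range containing X + Y is counted in full even when X + Y is its first year.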

Data in dput format.

df1 <-
structure(list(Age.Range = c("0 to 04", "05 to 09", 
"10 to 14", "15 to 19"), Average = c(400L, 221L, 
216L, 409L), Probability = c(0.004, 0.00221, 0.00216, 
0.00409)), row.names = c("1", "2", "3", "4"), 
class = "data.frame")
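
The question also allows Y to be "for life", which the function above does not handle because X + Y would fail on a string. One possible way to cover that case, offered as a sketch rather than as part of the original answer, is to treat "for life" as an unbounded end age. The helper sumProbsLife and the data frame ages are hypothetical names used only for this illustration.

```r
# Hypothetical variant of sumProbs(): Y is either a number of years
# or the string "for life" (an assumption based on the question text)
sumProbsLife <- function(DF, X, Y) {
  # Lower bound of each range, e.g. "05 to 09" -> 5
  age_start <- as.integer(sub(" to .*", "", DF[["Age.Range"]]))
  # An infinite end age makes findInterval() return the last row
  end_age <- if (identical(Y, "for life")) Inf else X + Y
  i <- findInterval(c(X, end_age), age_start)
  sum(DF[["Probability"]][i[1]:i[2]])
}

# Same rows as the question's table, without the Average column
ages <- data.frame(
  Age.Range = c("0 to 04", "05 to 09", "10 to 14", "15 to 19"),
  Probability = c(0.004, 0.00221, 0.00216, 0.00409)
)

sumProbsLife(ages, 3, 7)           # rows 1 to 3
sumProbsLife(ages, 3, "for life")  # rows 1 to 4
```

With only the four rows shown here, "for life" simply sums to the end of the table; with the full table it would run through the last age range.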
