Trying to make a script calculate a value (using a function) for every 24 rows

I have not been able to find a solution to a similar problem on StackOverflow. I hope someone can help!

I am using the R environment.

I have data from turtle nests. There are two types of hourly data in each nest. The first is hourly Temperature, and it has an associated hourly Development (the amount of "anatomical" embryonic development).

I am calculating a weighted median. In this case, the median is of temperature, and it is weighted by development.

I have a script here that I am using to calculate the weighted median:

weighted.median <- function(x, w, probs=0.5, na.rm=TRUE) {
  x <- as.numeric(as.vector(x))
  w <- as.numeric(as.vector(w))
  if(anyNA(x) || anyNA(w)) {
    ok <- !(is.na(x) | is.na(w))
    x <- x[ok]
    w <- w[ok]
  }
  stopifnot(all(w >= 0))
  if(all(w == 0)) stop("All weights are zero", call.=FALSE)
  # Sort the values and build the cumulative fraction of total weight
  oo <- order(x)
  x <- x[oo]
  w <- w[oo]
  Fx <- cumsum(w)/sum(w)
  # For each requested probability, locate the quantile, interpolating linearly where possible
  result <- numeric(length(probs))
  for(i in seq_along(result)) {
    p <- probs[i]
    lefties <- which(Fx <= p)
    if(length(lefties) == 0) {
      result[i] <- x[1]
    } else {
      left <- max(lefties)
      result[i] <- x[left]
      if(Fx[left] < p && left < length(x)) {
        right <- left+1
        y <- x[left] + (x[right] - x[left]) * (p - Fx[left]) / (Fx[right] - Fx[left])
        if(is.finite(y)) result[i] <- y
      }
    }
  }
  names(result) <- paste0(format(100 * probs, trim = TRUE), "%")
  return(result)
}
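To make the core of the function concrete, here is a tiny hand-worked sketch of the cumulative-weight step for p = 0.5. The three values and weights are made up purely for illustration and are not from the nest data.

```r
# Made-up values (already sorted) and weights, for illustration only
x <- c(10, 20, 30)
w <- c(1, 1, 2)

# Cumulative fraction of the total weight, as in the function above
Fx <- cumsum(w) / sum(w)

# Largest index whose cumulative fraction does not exceed p = 0.5
p <- 0.5
left <- max(which(Fx <= p))

# Fx[left] equals p exactly here, so the interpolation branch is skipped
# and x[left] is the weighted median for these inputs
x[left]
```

When Fx[left] is strictly below p, the function instead interpolates linearly between x[left] and the next value.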

So from the function you can see that I need two input vectors, x and w (which will be temperature and development, respectively).

The problem I'm having is that my hourly temperature traces last anywhere from 5 days to 53 days (i.e., 120 hours to 1272 hours).

I would like to calculate the daily weighted median for all days within a nest (i.e., take the first 24 rows of x and w, calculate the weighted median, then move on to rows 25-48, and so forth). The output would therefore be a vector of daily weighted medians with length n/24 (where n is the total number of rows in x).

In other words, I would like to analyse my data automatically, in a fashion equivalent to manually doing this (nest1 is the datasheet for Nest 1, which contains two vectors, temp and devo; devo is the weight):

weighted.median(nest1$temp[1:24], nest1$devo[1:24], na.rm=TRUE)

followed by

weighted.median(nest1$temp[25:48], nest1$devo[25:48], na.rm=TRUE)

followed by

weighted.median(nest1$temp[49:72], nest1$devo[49:72], na.rm=TRUE)

all the way to

weighted.median(nest1$temp[(n-23):n], nest1$devo[(n-23):n], na.rm=TRUE)

I'm afraid I don't even know where to start. Any help or clues would be very much appreciated.

The main idea is to create a new column for day 1, day 2, ..., day n/24, split the data frame into subsets by day, and apply your function to each subset.

First I create some sample data:

set.seed(123)
n <- 120 # number of rows
nest1 <- data.frame(temp = rnorm(n), devo = rpois(n, 5))

Create the splitting variable:

nest1$day <- rep(1:(nrow(nest1)/24), each = 24)
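The rep() call above relies on the number of rows being an exact multiple of 24. If a trace might end partway through a day, one possible alternative is to derive the day index by integer division. This is only a sketch: the length 130 is a made-up example, and whether a partial final day should be kept or dropped is an assumption you would need to decide for your own data.

```r
# Hypothetical trace length that is NOT a multiple of 24 (illustration only)
n <- 130

# Rows 1-24 get day 1, rows 25-48 get day 2, and so on;
# the final, incomplete day simply gets fewer rows
day <- (seq_len(n) - 1L) %/% 24L + 1L

# Number of hourly rows assigned to each day
table(day)
```

For a data frame this would be nest1$day <- (seq_len(nrow(nest1)) - 1L) %/% 24L + 1L, which gives the same index as rep() whenever the row count is a multiple of 24.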

Then, use the by() function to split nest1 by nest1$day and apply the function to each subset:

out <- by(nest1, nest1$day, function(d) {
  weighted.median(d$temp, d$devo, na.rm = TRUE)
})
data.frame(day = dimnames(out)[[1]], x = as.vector(out))
#   day           x
# 1   1 -0.45244433
# 2   2  0.15337312
# 3   3  0.07071673
# 4   4  0.23873174
# 5   5 -0.27694709
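The same split-and-apply pattern can also be written with split() and vapply(), which returns a plain named numeric vector. The sketch below is self-contained, so it uses stats::weighted.mean as a stand-in for the per-day function; that substitution is an assumption for illustration, and you would swap in weighted.median from the question to get daily weighted medians.

```r
# Same sample data as above
set.seed(123)
n <- 120
nest1 <- data.frame(temp = rnorm(n), devo = rpois(n, 5))
nest1$day <- rep(1:(n / 24), each = 24)

# Split the data frame into a list with one element per day
chunks <- split(nest1, nest1$day)

# Apply a per-day summary to each chunk.
# weighted.mean is only a stand-in here; replace it with weighted.median
daily <- vapply(
  chunks,
  function(d) weighted.mean(d$temp, d$devo, na.rm = TRUE),
  numeric(1)
)
daily
```

vapply() checks that every chunk yields exactly one number, which catches mistakes that sapply() would silently turn into a list or matrix.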

Instead of using by(), you can also use the group_by + summarise functions from the dplyr package:

library(dplyr)
nest1 %>%
  group_by(day) %>%
  summarise(x = weighted.median(temp, devo, na.rm = TRUE))
# # A tibble: 5 x 2
#     day       x
#   <int>   <dbl>
# 1     1 -0.452 
# 2     2  0.153 
# 3     3  0.0707
# 4     4  0.239 
# 5     5 -0.277 
