Assign values to df$column, with the values computed from another df in R

Question

The code to generate the dfs is at the end.

I have two dataframes. The first df is meteo data from 3 different stations:

  site     date        temp
   X    2021-01-01      14
   X    2021-01-02      NA
   X    2021-01-03      10
   X    2021-01-04      14
   X    2021-01-05      10
   X    2021-01-06      10
   X    2021-01-07      13
   X    2021-01-08      12
   X    2021-01-09      13
   X    2021-01-10       7
   X    2021-01-11       9
   X    2021-01-12       6
   X    2021-01-13       8
   Y    2021-01-01      10
   Y    2021-01-02      14
   Y    2021-01-03       5
   Y    2021-01-04       7
   Y    2021-01-05       7
   Y    2021-01-06       9
   Y    2021-01-07       6
   Y    2021-01-08      12
   Y    2021-01-09      10
   Y    2021-01-10       9
   Y    2021-01-11      13
   Y    2021-01-12      13
   Y    2021-01-13      NA
   Y    2021-01-14       8
   Y    2021-01-15      11
   Y    2021-01-16       5
   Y    2021-01-17      11
   Y    2021-01-18      13
   Y    2021-01-19      11
   Y    2021-01-20       9
   Y    2021-01-21       9
   Y    2021-01-22       5
   Y    2021-01-23       6
   Y    2021-01-24      14
   Y    2021-01-25      10
   Y    2021-01-26       7
   Z    2021-01-01       9
   Z    2021-01-02      NA
   Z    2021-01-03      12
   Z    2021-01-04       6
   Z    2021-01-05       5
   Z    2021-01-06       7
   Z    2021-01-07       7
   Z    2021-01-08       5
   Z    2021-01-09       7
   Z    2021-01-10       7
   Z    2021-01-11      15
   Z    2021-01-12       8
   Z    2021-01-13       5
   Z    2021-01-14       6
   Z    2021-01-15       5
   Z    2021-01-16      12
   Z    2021-01-17       8
   Z    2021-01-18       7
   Z    2021-01-19       6
   Z    2021-01-20      13
   Z    2021-01-21      14
   Z    2021-01-22       8
   Z    2021-01-23      11
   Z    2021-01-24       7

The second df consists of observations made at the same sites as the meteo stations. There is a trap at each station. Every couple of days, the trap is emptied and the different species that were trapped are counted separately. For each site in df2, the pose date is always the day after the withdrawal date of the previous entry (row). In this example, the species are in the obs column. They are named A, B, C, D, F and G. freq is the number of individuals of that species that were trapped.

  site    pose        withdrawal    obs    freq
   X    2021-01-01    2021-01-03      A      31
   X    2021-01-01    2021-01-03      B      42
   X    2021-01-04    2021-01-05      A      14
   X    2021-01-06    2021-01-13      D      16
   X    2021-01-06    2021-01-13      F      36
   Y    2021-01-01    2021-01-04      G      49
   Y    2021-01-01    2021-01-04      A      29
   Y    2021-01-01    2021-01-04      C      45
   Y    2021-01-05    2021-01-14      D      25
   Y    2021-01-05    2021-01-14      A      50
   Y    2021-01-15    2021-01-14      B      40
   Y    2021-01-19    2021-01-26      B      39
   Z    2021-01-01    2021-01-03      C      25
   Z    2021-01-04    2021-01-05      F       3
   Z    2021-01-04    2021-01-05      B      16
   Z    2021-01-06    2021-01-14      C      19
   Z    2021-01-15    2021-01-19      A      12
   Z    2021-01-15    2021-01-19      B      26
   Z    2021-01-15    2021-01-19      F       2
   Z    2021-01-20    2021-01-24      A      24

I want to add a mean_T column to df2 that stores the mean temperature for each entry in df2.

For ID = 1, the mean temperature would be calculated from the df1 entries for 2021-01-01, 2021-01-02 and 2021-01-03 where site = 'X'.

With simpler dfs, I used this code to get the mean temperature. It works if I only have one entry per date, per site in df2, which is not the case here.
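As a worked check of the ID = 1 case, here is a minimal sketch that uses only the three site X temperatures from the table above. The names `temps_id1` and `mean_id1` are illustrative and not part of the original code. The reading for 2021-01-02 is NA, so it has to be dropped before averaging.

```r
# Temperatures at site X for 2021-01-01, 2021-01-02 and 2021-01-03 (from df1)
temps_id1 <- c(14, NA, 10)

# Without na.rm the missing day propagates and the result is NA
mean(temps_id1)

# Dropping the NA averages the two observed days: (14 + 10) / 2
mean_id1 <- mean(temps_id1, na.rm = TRUE)
mean_id1
```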

library(dplyr)
library(tidyr)

# Split each data frame by site and keep only the columns needed below
df1 <- split(df1, df1$site)
df1 <- lapply(df1, function(x) x[names(x) %in% c("ID", "date", "temp")])

df2 <- split(df2, df2$site)
df2 <- lapply(df2, function(x) x[names(x) %in% c("ID", "pose", "withdrawal")])

Then, this code gave me the mean temperature. Credit goes to @TarJae:

mean_X <- df2$X %>%
    pivot_longer(-ID, values_to = "date") %>%
    full_join(df1$X, by = "date") %>%
    arrange(date) %>%
    fill(ID, .direction = "down") %>%
    group_by(ID) %>%
    summarise(mean_T = mean(temp, na.rm = TRUE)) %>%
    left_join(df2$X, by = "ID")

This chunk of code also worked. Credit goes to @Jon Spring:

df2 %>%
    mutate(days = (withdrawal - pose + 1) %>% as.integer) %>%
    tidyr::uncount(days, .id = "row") %>%
    transmute(ID, date = pose + row - 1) %>%
    left_join(df1) %>%
    group_by(ID) %>%
    summarize(mean_T = mean(temp)) %>% 
    right_join(df2)

Here is the code to generate the dfs:

df1 <- data.frame( site = c(rep('X', 13), rep('Y', 26), rep('Z', 24) ) ,
                     date = c( seq( as.Date("2021-01-01"), by="day", length.out=13 ),
                               seq( as.Date("2021-01-01"), by="day", length.out=26 ),
                               seq( as.Date("2021-01-01"), by="day", length.out=24 )) , 
                     temp = c(14, NA,   10, 14, 10, 10, 13, 12, 13, 7,  9,  6,  8,  10, 14, 5,  7,  7,  9,  6,  12,
                              10,   9,  13, 13, NA, 8,  11, 5,  11, 13, 11, 9,  9,  5,  6,  14, 10, 7,  9,  NA, 12, 
                               6,   5,  7,  7,  5,  7,  7,  15, 8,  5,  6, 5,   12, 8,  7,  6,  13, 14, 8,  11, 7) ) 

df2 <- data.frame( site = c( rep('X', 5), rep('Y', 7), rep('Z', 8) ) , 
                   pose = as.Date( c("2021-01-01", "2021-01-01", "2021-01-04", "2021-01-06", 
                                     "2021-01-06", "2021-01-01", "2021-01-01", "2021-01-01", 
                                     "2021-01-05", "2021-01-05", "2021-01-15", "2021-01-19" ,
                                     "2021-01-01", "2021-01-04", "2021-01-04", "2021-01-06",
                                     "2021-01-15", "2021-01-15", "2021-01-15", "2021-01-20") ) ,
                   withdrawal = as.Date( c( "2021-01-03", "2021-01-03", "2021-01-05", "2021-01-13", 
                                            "2021-01-13", "2021-01-04", "2021-01-04", "2021-01-04", 
                                            "2021-01-14", "2021-01-14", "2021-01-14", "2021-01-26" ,
                                            "2021-01-03", "2021-01-05", "2021-01-05", "2021-01-14",
                                            "2021-01-19", "2021-01-19", "2021-01-19", "2021-01-24" ) ) , 
                   obs = c( 'A', 'B', 'A', 'D', 'F', 'G', 'A', 'C', 'D', 'A', 'B', 'B' , 
                            'C', 'F', 'B', 'C', 'A', 'B', 'F', 'A') ,
                   freq = c(31, 42, 14, 16, 36, 49, 29, 45, 25, 50, 40, 39, 25, 3, 16, 19, 12, 26, 2, 24) ) 
df2 <- cbind(ID = 1:nrow(df2), df2)

English is not my first language. If something doesn't make sense, feel free to let me know in the comments.

Answer 1

First I expand df2 to make a dataset with one row per day:

df3 <- do.call(rbind,by(df2, 
   list(df2$ID), 
   function(d) data.frame(d,dates=d$pose:d$withdrawal)))
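To see what this expansion does to a single entry, here is a small sketch on one row rebuilt by hand (ID 1 from the sample data). The names `d` and `expanded` are only for illustration. It relies on two behaviours of base R: `:` applied to Date values works on the underlying day counts, and `data.frame()` recycles a one-row data frame to the length of the new column.

```r
# One row of df2 (ID 1), rebuilt here so the sketch is self-contained
d <- data.frame(ID = 1, site = "X",
                pose = as.Date("2021-01-01"),
                withdrawal = as.Date("2021-01-03"),
                obs = "A", freq = 31)

# `:` on Dates returns plain numbers (days since 1970-01-01), not Dates
expanded <- data.frame(d, dates = d$pose:d$withdrawal)

nrow(expanded)                                  # one row per day in the window
as.Date(expanded$dates, origin = "1970-01-01")  # convert back to read them
```

Because `:` counts down when its left side is larger, an entry whose withdrawal precedes its pose (ID 11 in the sample data) expands to those two dates in reverse order rather than to an empty window.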

Now I merge df1 into this new dataset. I first need to convert the date to numeric so that it matches the dates column in df3:

df1$dates <- as.numeric(df1$date)
df4 <- merge(df1, df3,by=c("site", "dates"))
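The merge step can be sketched on a two-day toy example. The data frames `met` and `trap_days` below are hypothetical stand-ins for df1 and df3, built by hand so the block runs on its own. Both key columns hold plain day counts, which is why the join on site and dates lines up.

```r
# Two days of df1 for site X, with the numeric key added as in the answer
met <- data.frame(site = "X",
                  date = as.Date(c("2021-01-01", "2021-01-02")),
                  temp = c(14, NA))
met$dates <- as.numeric(met$date)

# Expanded trap rows carry the same numeric day counts
trap_days <- data.frame(ID = 1, site = "X", dates = c(18628, 18629))

# Both keys are plain numbers, so merge() pairs each trap day with its reading
merged <- merge(met, trap_days, by = c("site", "dates"))
merged
```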

Now I can aggregate the new dataset by taking the mean temp over the days covered by each entry:

aggregate(data=df4, temp ~ freq + site + obs + pose + withdrawal +ID, mean)      


   freq site obs       pose withdrawal ID      temp
1    31    X   A 2021-01-01 2021-01-03  1 12.000000
2    42    X   B 2021-01-01 2021-01-03  2 12.000000
3    14    X   A 2021-01-04 2021-01-05  3 12.000000
4    16    X   D 2021-01-06 2021-01-13  4  9.750000
5    36    X   F 2021-01-06 2021-01-13  5  9.750000
6    49    Y   G 2021-01-01 2021-01-04  6  9.000000
7    29    Y   A 2021-01-01 2021-01-04  7  9.000000
8    45    Y   C 2021-01-01 2021-01-04  8  9.000000
9    25    Y   D 2021-01-05 2021-01-14  9  9.666667
10   50    Y   A 2021-01-05 2021-01-14 10  9.666667
11   40    Y   B 2021-01-15 2021-01-14 11  9.500000
12   39    Y   B 2021-01-19 2021-01-26 12  8.875000
13   25    Z   C 2021-01-01 2021-01-03 13 10.500000
14    3    Z   F 2021-01-04 2021-01-05 14  5.500000
15   16    Z   B 2021-01-04 2021-01-05 15  5.500000
16   19    Z   C 2021-01-06 2021-01-14 16  7.444444
17   12    Z   A 2021-01-15 2021-01-19 17  7.600000
18   26    Z   B 2021-01-15 2021-01-19 18  7.600000
19    2    Z   F 2021-01-15 2021-01-19 19  7.600000
20   24    Z   A 2021-01-20 2021-01-24 20 10.600000
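For comparison, the same means can be computed without expanding df2, by filtering the meteo rows inside each entry's date window. This is an alternative sketch rather than the method used in the answer. It is shown on site X only, and the names `met`, `traps` and `in_window` are illustrative.

```r
# Site X subset of the sample data, rebuilt so the sketch is self-contained
met <- data.frame(
  site = "X",
  date = seq(as.Date("2021-01-01"), by = "day", length.out = 13),
  temp = c(14, NA, 10, 14, 10, 10, 13, 12, 13, 7, 9, 6, 8)
)
traps <- data.frame(
  ID = 1:5, site = "X",
  pose = as.Date(c("2021-01-01", "2021-01-01", "2021-01-04",
                   "2021-01-06", "2021-01-06")),
  withdrawal = as.Date(c("2021-01-03", "2021-01-03", "2021-01-05",
                         "2021-01-13", "2021-01-13")),
  obs = c("A", "B", "A", "D", "F"),
  freq = c(31, 42, 14, 16, 36)
)

# For each entry, average the temperatures recorded at the same site
# between pose and withdrawal (inclusive), ignoring missing readings
traps$mean_T <- vapply(seq_len(nrow(traps)), function(i) {
  in_window <- met$site == traps$site[i] &
    met$date >= traps$pose[i] &
    met$date <= traps$withdrawal[i]
  mean(met$temp[in_window], na.rm = TRUE)
}, numeric(1))

traps
```

On these five rows the result matches the temp column above. One difference to keep in mind: an entry whose pose is later than its withdrawal matches no days here and yields NaN, whereas the `:` expansion in the answer still averages the two endpoint dates.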

Answer 1 was accepted (2022-06-13 18:41:39).
