

Assign values to a df$column with values calculated in another df in R

The code to generate the dfs is at the end.

I have two data frames. The first, df1, holds meteorological data from 3 different stations:

  site     date        temp
   X    2021-01-01      14
   X    2021-01-02      NA
   X    2021-01-03      10
   X    2021-01-04      14
   X    2021-01-05      10
   X    2021-01-06      10
   X    2021-01-07      13
   X    2021-01-08      12
   X    2021-01-09      13
   X    2021-01-10       7
   X    2021-01-11       9
   X    2021-01-12       6
   X    2021-01-13       8
   Y    2021-01-01      10
   Y    2021-01-02      14
   Y    2021-01-03       5
   Y    2021-01-04       7
   Y    2021-01-05       7
   Y    2021-01-06       9
   Y    2021-01-07       6
   Y    2021-01-08      12
   Y    2021-01-09      10
   Y    2021-01-10       9
   Y    2021-01-11      13
   Y    2021-01-12      13
   Y    2021-01-13      NA
   Y    2021-01-14       8
   Y    2021-01-15      11
   Y    2021-01-16       5
   Y    2021-01-17      11
   Y    2021-01-18      13
   Y    2021-01-19      11
   Y    2021-01-20       9
   Y    2021-01-21       9
   Y    2021-01-22       5
   Y    2021-01-23       6
   Y    2021-01-24      14
   Y    2021-01-25      10
   Y    2021-01-26       7
   Z    2021-01-01       9
   Z    2021-01-02      NA
   Z    2021-01-03      12
   Z    2021-01-04       6
   Z    2021-01-05       5
   Z    2021-01-06       7
   Z    2021-01-07       7
   Z    2021-01-08       5
   Z    2021-01-09       7
   Z    2021-01-10       7
   Z    2021-01-11      15
   Z    2021-01-12       8
   Z    2021-01-13       5
   Z    2021-01-14       6
   Z    2021-01-15       5
   Z    2021-01-16      12
   Z    2021-01-17       8
   Z    2021-01-18       7
   Z    2021-01-19       6
   Z    2021-01-20      13
   Z    2021-01-21      14
   Z    2021-01-22       8
   Z    2021-01-23      11
   Z    2021-01-24       7

The second data frame, df2, contains observations made at the same sites as the weather stations. There is a trap at each station. Every few days the trap is emptied, and the different species that were caught are counted separately. For each site in df2, the pose date (the day the trap is set) is always the day after the withdrawal date of the preceding entry (row). In this example the species are in the obs column and are named A, B, C, D, F and G. freq is the number of individuals of that species caught in the trap.

  site    pose        withdrawal    obs    freq
   X    2021-01-01    2021-01-03      A      31
   X    2021-01-01    2021-01-03      B      42
   X    2021-01-04    2021-01-05      A      14
   X    2021-01-06    2021-01-13      D      16
   X    2021-01-06    2021-01-13      F      36
   Y    2021-01-01    2021-01-04      G      49
   Y    2021-01-01    2021-01-04      A      29
   Y    2021-01-01    2021-01-04      C      45
   Y    2021-01-05    2021-01-14      D      25
   Y    2021-01-05    2021-01-14      A      50
   Y    2021-01-15    2021-01-14      B      40
   Y    2021-01-19    2021-01-26      B      39
   Z    2021-01-01    2021-01-03      C      25
   Z    2021-01-04    2021-01-05      F       3
   Z    2021-01-04    2021-01-05      B      16
   Z    2021-01-06    2021-01-14      C      19
   Z    2021-01-15    2021-01-19      A      12
   Z    2021-01-15    2021-01-19      B      26
   Z    2021-01-15    2021-01-19      F       2
   Z    2021-01-20    2021-01-24      A      24

I want to add a mean_T column to df2 that stores, for each entry (row) of df2, the mean temperature over its trapping period.

For example, for ID = 1 the mean temperature would be computed from the rows of df1 where site = 'X' and date is 2021-01-01, 2021-01-02 or 2021-01-03.
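As a worked check of that definition, here is a minimal base-R sketch for ID = 1 only. It copies the three site-X rows from the table above into a small hypothetical df1_x and assumes that missing temperatures are skipped (na.rm = TRUE), so the mean is (14 + 10) / 2 = 12.

```r
# Hypothetical minimal slice of df1: the first three days at site X (values from the table above)
df1_x <- data.frame(
  site = "X",
  date = seq(as.Date("2021-01-01"), by = "day", length.out = 3),
  temp = c(14, NA, 10)
)

# Trapping period of ID = 1 (pose and withdrawal from df2)
pose       <- as.Date("2021-01-01")
withdrawal <- as.Date("2021-01-03")

# Keep the days of that site inside [pose, withdrawal], both ends included
in_period <- df1_x$site == "X" & df1_x$date >= pose & df1_x$date <= withdrawal

# Assumption: NA temperatures are ignored rather than propagated
mean_T_id1 <- mean(df1_x$temp[in_period], na.rm = TRUE)
print(mean_T_id1)
```

Without na.rm = TRUE the NA on 2021-01-02 would make the whole mean NA.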

With simpler data frames, I used the code below to get the mean temperature. It works only if df2 has a single entry per date and per site, which is not the case here (several species share the same trapping period).
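Because several species rows share one (site, pose, withdrawal) period, one possible workaround (an assumption of this edit, not a method from the original post) is to compute the mean once per distinct period and then copy it onto every row of that period. The sketch below is base R only and uses hypothetical copies of the site-X rows of both tables.

```r
# Hypothetical small inputs, copied from the site-X rows of the tables above
df1 <- data.frame(
  site = "X",
  date = seq(as.Date("2021-01-01"), by = "day", length.out = 13),
  temp = c(14, NA, 10, 14, 10, 10, 13, 12, 13, 7, 9, 6, 8)
)
df2 <- data.frame(
  ID         = 1:5,
  site       = "X",
  pose       = as.Date(c("2021-01-01", "2021-01-01", "2021-01-04", "2021-01-06", "2021-01-06")),
  withdrawal = as.Date(c("2021-01-03", "2021-01-03", "2021-01-05", "2021-01-13", "2021-01-13")),
  obs        = c("A", "B", "A", "D", "F"),
  freq       = c(31, 42, 14, 16, 36)
)

# 1. Keep each trapping period once, however many species were counted in it
periods <- unique(df2[c("site", "pose", "withdrawal")])

# 2. Mean temperature per period, skipping NA temperatures
periods$mean_T <- mapply(function(s, p, w) {
  keep <- df1$site == s & df1$date >= p & df1$date <= w
  mean(df1$temp[keep], na.rm = TRUE)
}, periods$site, periods$pose, periods$withdrawal, USE.NAMES = FALSE)

# 3. Attach the period mean to every species row that shares the period
df2 <- merge(df2, periods, by = c("site", "pose", "withdrawal"), all.x = TRUE)
df2 <- df2[order(df2$ID), ]
print(df2[c("ID", "obs", "mean_T")])
```

The duplicated dates then stop mattering, since each period is looked up only once.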

# Split each data frame by site into a named list (X, Y, Z), dropping the site column
df1 <- split(subset(df1, select = -site), df1$site)
df1 <- lapply(df1, function(x) x[names(x) %in% c("ID", "date", "temp")])

df2 <- split(subset(df2, select = -site), df2$site)
df2 <- lapply(df2, function(x) x[names(x) %in% c("ID", "pose", "withdrawal")])

library(dplyr)
library(tidyr)

Then this code gave me the mean temperature (credit to @TarJae):

mean_X <- df2$X %>%
    pivot_longer(-ID, values_to = "date") %>%       # pose and withdrawal become rows of dates
    full_join(df1$X, by = "date") %>%
    arrange(date) %>%
    fill(ID, .direction = "down") %>%                # carry each ID forward over its period
    group_by(ID) %>%
    summarise(mean_T = mean(temp, na.rm = TRUE)) %>%
    left_join(df2$X, by = "ID")

This chunk of code also worked (credit to @Jon Spring):

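df2 %>%
    mutate(days = as.integer(withdrawal - pose + 1)) %>%   # days in each period, both ends included
    tidyr::uncount(days, .id = "row") %>%                  # one row per day of the period
    transmute(ID, site, date = pose + row - 1) %>%
    left_join(df1, by = c("site", "date")) %>%             # match temperatures by site and date
    group_by(ID) %>%
    summarize(mean_T = mean(temp, na.rm = TRUE)) %>%
    right_join(df2, by = "ID")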
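For readers without tidyr, the expansion that uncount() performs can be sketched in base R with rep() and sequence(). This is an illustrative equivalent under the assumption that each period is given by inclusive pose and withdrawal dates; the two periods used are hypothetical copies of IDs 1 and 3 from df2.

```r
# Two hypothetical trapping periods (IDs 1 and 3 of site X)
periods <- data.frame(
  ID         = c(1L, 3L),
  site       = "X",
  pose       = as.Date(c("2021-01-01", "2021-01-04")),
  withdrawal = as.Date(c("2021-01-03", "2021-01-05"))
)

# Number of days in each period, both ends included (the `days` column above)
days <- as.integer(periods$withdrawal - periods$pose) + 1L

# Repeat each row `days` times, then number the copies 1, 2, ... within each original row
long <- periods[rep(seq_len(nrow(periods)), times = days), c("ID", "site", "pose")]
long$row  <- sequence(days)
long$date <- long$pose + long$row - 1L
print(long[c("ID", "site", "date")])
```

The resulting ID/site/date table is what gets joined to df1 before averaging.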

Here is the code to generate the dfs:
df1 <- data.frame( site = c(rep('X', 13), rep('Y', 26), rep('Z', 24) ) ,
                     date = c( seq( as.Date("2021-01-01"), by="day", length.out=13 ),
                               seq( as.Date("2021-01-01"), by="day", length.out=26 ),
                               seq( as.Date("2021-01-01"), by="day", length.out=24 )) , 
                     temp = c(14, NA,   10, 14, 10, 10, 13, 12, 13, 7,  9,  6,  8,  10, 14, 5,  7,  7,  9,  6,  12,
                              10,   9,  13, 13, NA, 8,  11, 5,  11, 13, 11, 9,  9,  5,  6,  14, 10, 7,  9,  NA, 12, 
                               6,   5,  7,  7,  5,  7,  7,  15, 8,  5,  6, 5,   12, 8,  7,  6,  13, 14, 8,  11, 7) ) 

df2 <- data.frame( site = c( rep('X', 5), rep('Y', 7), rep('Z', 8) ) , 
                   pose = as.Date( c("2021-01-01", "2021-01-01", "2021-01-04", "2021-01-06", 
                                     "2021-01-06", "2021-01-01", "2021-01-01", "2021-01-01", 
                                     "2021-01-05", "2021-01-05", "2021-01-15", "2021-01-19" ,
                                     "2021-01-01", "2021-01-04", "2021-01-04", "2021-01-06",
                                     "2021-01-15", "2021-01-15", "2021-01-15", "2021-01-20") ) ,
                   withdrawal = as.Date( c( "2021-01-03", "2021-01-03", "2021-01-05", "2021-01-13", 
                                            "2021-01-13", "2021-01-04", "2021-01-04", "2021-01-04", 
                                            "2021-01-14", "2021-01-14", "2021-01-14", "2021-01-26" ,
                                            "2021-01-03", "2021-01-05", "2021-01-05", "2021-01-14",
                                            "2021-01-19", "2021-01-19", "2021-01-19", "2021-01-24" ) ) , 
                   obs = c( 'A', 'B', 'A', 'D', 'F', 'G', 'A', 'C', 'D', 'A', 'B', 'B' , 
                            'C', 'F', 'B', 'C', 'A', 'B', 'F', 'A') ,
                   freq = c(31, 42, 14, 16, 36, 49, 29, 45, 25, 50, 40, 39, 25, 3, 16, 19, 12, 26, 2, 24) ) 
df2 <- cbind(ID = 1:nrow(df2), df2)

English is not my first language. If something doesn't make sense, feel free to let me know in the comments.

Answer:

First, I expand df2 into a dataset with one row per day of each trapping period:

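# One row per day between pose and withdrawal, for every ID;
# `:` on two Dates returns plain day numbers, not Dates
df3 <- do.call(rbind, by(df2,
                         list(df2$ID),
                         function(d) data.frame(d, dates = d$pose:d$withdrawal)))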
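The conversion step that follows is needed because `:` drops the Date class. A small sketch using the dates of ID 1:

```r
pose       <- as.Date("2021-01-01")
withdrawal <- as.Date("2021-01-03")

# `:` returns day counts since 1970-01-01, without the Date class
days_num <- pose:withdrawal
print(days_num)
print(inherits(days_num, "Date"))

# Convert back to Date when needed (origin given explicitly for older R versions)
days_date <- as.Date(days_num, origin = "1970-01-01")
print(days_date)
```

`:` also counts down when pose is later than withdrawal. That is why ID 11 (pose 2021-01-15, withdrawal 2021-01-14) still gets a value in the output: its two days, 14 and 15 January at site Y, are averaged.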

Now I merge df1 into this new dataset. I first need to convert the dates to numbers so they match df3:

df1$dates <- as.numeric(df1$date)
df4 <- merge(df1, df3,by=c("site", "dates"))
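merge() performs an inner join by default (all = FALSE). Only site/day pairs present in both inputs survive, so a day with no temperature row simply drops out of the average. A toy illustration with hypothetical inputs:

```r
# Hypothetical toy inputs: temperatures for 1, 2 and 4 January, and one period covering 1-3 January
df1 <- data.frame(site = "X",
                  date = as.Date(c("2021-01-01", "2021-01-02", "2021-01-04")),
                  temp = c(14, NA, 14))
df3 <- data.frame(ID = 1L, site = "X", dates = 18628:18630)  # 2021-01-01 .. 2021-01-03 as day numbers

# Date -> plain day count, so the key types line up with df3
df1$dates <- as.numeric(df1$date)

# Inner join on site and day: 4 January (no period) and 3 January (no temperature) are dropped
df4 <- merge(df1, df3, by = c("site", "dates"))
print(df4[c("ID", "site", "date", "temp")])
```

Passing all.y = TRUE instead would keep period days that have no temperature row, with temp set to NA.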

Now I can aggregate the new dataset by taking the mean temp over the days of each entry:

aggregate(data=df4, temp ~ freq + site + obs + pose + withdrawal +ID, mean)      
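The formula interface of aggregate() uses na.action = na.omit by default, which is why ID 1 gets 12 rather than NA. The data-frame interface does not drop NAs. A toy comparison, with hypothetical data matching the three days of ID 1:

```r
# Hypothetical toy data: the three days of ID = 1 at site X
df4 <- data.frame(ID = 1L, temp = c(14, NA, 10))

# Formula method: rows with NA in any formula variable are removed before mean()
by_formula <- aggregate(temp ~ ID, data = df4, FUN = mean)

# Data-frame method: no NA filtering, so mean() sees the NA and returns NA
by_frame <- aggregate(df4["temp"], by = list(ID = df4$ID), FUN = mean)

print(by_formula)
print(by_frame)
```

Because na.omit looks at every variable in the formula, a row would also be dropped if freq, site, obs, pose, withdrawal or ID were NA.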


   freq site obs       pose withdrawal ID      temp
1    31    X   A 2021-01-01 2021-01-03  1 12.000000
2    42    X   B 2021-01-01 2021-01-03  2 12.000000
3    14    X   A 2021-01-04 2021-01-05  3 12.000000
4    16    X   D 2021-01-06 2021-01-13  4  9.750000
5    36    X   F 2021-01-06 2021-01-13  5  9.750000
6    49    Y   G 2021-01-01 2021-01-04  6  9.000000
7    29    Y   A 2021-01-01 2021-01-04  7  9.000000
8    45    Y   C 2021-01-01 2021-01-04  8  9.000000
9    25    Y   D 2021-01-05 2021-01-14  9  9.666667
10   50    Y   A 2021-01-05 2021-01-14 10  9.666667
11   40    Y   B 2021-01-15 2021-01-14 11  9.500000
12   39    Y   B 2021-01-19 2021-01-26 12  8.875000
13   25    Z   C 2021-01-01 2021-01-03 13 10.500000
14    3    Z   F 2021-01-04 2021-01-05 14  5.500000
15   16    Z   B 2021-01-04 2021-01-05 15  5.500000
16   19    Z   C 2021-01-06 2021-01-14 16  7.444444
17   12    Z   A 2021-01-15 2021-01-19 17  7.600000
18   26    Z   B 2021-01-15 2021-01-19 18  7.600000
19    2    Z   F 2021-01-15 2021-01-19 19  7.600000
20   24    Z   A 2021-01-20 2021-01-24 20 10.600000
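To store the result as the mean_T column the question asks for, the aggregated means can be merged back into df2 by ID. A minimal sketch with hypothetical toy versions of df2 and of the aggregate() output (ID 3 deliberately has no mean, to show what all.x = TRUE does):

```r
# Hypothetical toy versions of df2 and of the aggregate() result
df2   <- data.frame(ID = 1:3, site = "X", obs = c("A", "B", "A"))
means <- data.frame(ID = 1:2, temp = c(12, 12))

# Rename the aggregated column, then attach it to df2 by ID;
# all.x = TRUE keeps df2 rows that received no mean (they get NA)
names(means)[names(means) == "temp"] <- "mean_T"
df2 <- merge(df2, means, by = "ID", all.x = TRUE)
print(df2)
```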

