Assign values to a df$column with values calculated in another df in R
The code to build both dfs is at the end.
I have two data frames. The first df is meteorological data from 3 different stations:
site date temp
X 2021-01-01 14
X 2021-01-02 NA
X 2021-01-03 10
X 2021-01-04 14
X 2021-01-05 10
X 2021-01-06 10
X 2021-01-07 13
X 2021-01-08 12
X 2021-01-09 13
X 2021-01-10 7
X 2021-01-11 9
X 2021-01-12 6
X 2021-01-13 8
Y 2021-01-01 10
Y 2021-01-02 14
Y 2021-01-03 5
Y 2021-01-04 7
Y 2021-01-05 7
Y 2021-01-06 9
Y 2021-01-07 6
Y 2021-01-08 12
Y 2021-01-09 10
Y 2021-01-10 9
Y 2021-01-11 13
Y 2021-01-12 13
Y 2021-01-13 NA
Y 2021-01-14 8
Y 2021-01-15 11
Y 2021-01-16 5
Y 2021-01-17 11
Y 2021-01-18 13
Y 2021-01-19 11
Y 2021-01-20 9
Y 2021-01-21 9
Y 2021-01-22 5
Y 2021-01-23 6
Y 2021-01-24 14
Y 2021-01-25 10
Y 2021-01-26 7
Z 2021-01-01 9
Z 2021-01-02 NA
Z 2021-01-03 12
Z 2021-01-04 6
Z 2021-01-05 5
Z 2021-01-06 7
Z 2021-01-07 7
Z 2021-01-08 5
Z 2021-01-09 7
Z 2021-01-10 7
Z 2021-01-11 15
Z 2021-01-12 8
Z 2021-01-13 5
Z 2021-01-14 6
Z 2021-01-15 5
Z 2021-01-16 12
Z 2021-01-17 8
Z 2021-01-18 7
Z 2021-01-19 6
Z 2021-01-20 13
Z 2021-01-21 14
Z 2021-01-22 8
Z 2021-01-23 11
Z 2021-01-24 7
The second df consists of observations made at the same sites as the meteo stations. There is a trap at each station. Every couple of days, the trap is emptied and the different species that were trapped are counted separately. For each site in df2, the pose date is always the day after the withdrawal date of the previous entry (row). In this example, the species are in the obs column. They are named A, B, C, D, F and G. freq is the number of individuals of that species that were trapped.
site pose withdrawal obs freq
X 2021-01-01 2021-01-03 A 31
X 2021-01-01 2021-01-03 B 42
X 2021-01-04 2021-01-05 A 14
X 2021-01-06 2021-01-13 D 16
X 2021-01-06 2021-01-13 F 36
Y 2021-01-01 2021-01-04 G 49
Y 2021-01-01 2021-01-04 A 29
Y 2021-01-01 2021-01-04 C 45
Y 2021-01-05 2021-01-14 D 25
Y 2021-01-05 2021-01-14 A 50
Y 2021-01-15 2021-01-14 B 40
Y 2021-01-19 2021-01-26 B 39
Z 2021-01-01 2021-01-03 C 25
Z 2021-01-04 2021-01-05 F 3
Z 2021-01-04 2021-01-05 B 16
Z 2021-01-06 2021-01-14 C 19
Z 2021-01-15 2021-01-19 A 12
Z 2021-01-15 2021-01-19 B 26
Z 2021-01-15 2021-01-19 F 2
Z 2021-01-20 2021-01-24 A 24
I want to add a mean_T column to df2 that stores the mean temperature over the trapping period of each entry (row) in df2.
For ID = 1 (ID is the row number of df2, added in the code at the end), the mean temperature would be calculated from the df1 entries for 2021-01-01, 2021-01-02 and 2021-01-03 where site = 'X'.
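To make the target concrete, here is a minimal sketch of that calculation for ID = 1, with the three site-X temperatures copied by hand from the df1 table above. Skipping the missing reading with na.rm = TRUE is an assumption about how NA should be handled, and temps_id1 and mean_T_id1 are illustrative names only.

```r
# Temperatures at site X for 2021-01-01, 2021-01-02 and 2021-01-03 (from df1)
temps_id1 <- c(14, NA, 10)

# Assumption: missing readings are skipped rather than propagated
mean_T_id1 <- mean(temps_id1, na.rm = TRUE)
mean_T_id1
```

With the NA skipped, the mean of 14 and 10 is 12, which is the value mean_T should hold for ID = 1.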
With simpler dfs, I used this code to get the mean temperature. It only works if there is a single entry per date, per site in df2, which is not the case here.
# Split each data frame into a list with one element per site
df1 <- split(df1, df1$site)
# Keep only the columns needed for the join
df1 <- lapply(df1, function(x) x[names(x) %in% c("ID", "date", "temp")])
df2 <- split(df2, df2$site)
df2 <- lapply(df2, function(x) x[names(x) %in% c("ID", "pose", "withdrawal")])
library(dplyr)
library(tidyr)
Then, this code gave me the mean temperature. Credit goes to @TarJae:
mean_X <- df2$X %>%
  pivot_longer(-ID, values_to = "date") %>%
  full_join(df1$X, by = "date") %>%
  arrange(date) %>%
  fill(ID, .direction = "down") %>%
  group_by(ID) %>%
  summarise(mean_T = mean(temp, na.rm = TRUE)) %>%
  left_join(df2$X, by = "ID")
This chunk of code also worked. Credit goes to @Jon Spring:
df2 %>%
  mutate(days = (withdrawal - pose + 1) %>% as.integer()) %>%
  tidyr::uncount(days, .id = "row") %>%
  transmute(ID, date = pose + row - 1) %>%
  left_join(df1) %>%
  group_by(ID) %>%
  summarize(mean_T = mean(temp)) %>%
  right_join(df2)
# Code to build df1 (weather data) and df2 (trap observations)
df1 <- data.frame( site = c(rep('X', 13), rep('Y', 26), rep('Z', 24) ) ,
                   date = c( seq( as.Date("2021-01-01"), by="day", length.out=13 ),
                             seq( as.Date("2021-01-01"), by="day", length.out=26 ),
                             seq( as.Date("2021-01-01"), by="day", length.out=24 )) ,
                   temp = c(14, NA, 10, 14, 10, 10, 13, 12, 13, 7, 9, 6, 8, 10, 14, 5, 7, 7, 9, 6, 12,
                            10, 9, 13, 13, NA, 8, 11, 5, 11, 13, 11, 9, 9, 5, 6, 14, 10, 7, 9, NA, 12,
                            6, 5, 7, 7, 5, 7, 7, 15, 8, 5, 6, 5, 12, 8, 7, 6, 13, 14, 8, 11, 7) )
df2 <- data.frame( site = c( rep('X', 5), rep('Y', 7), rep('Z', 8) ) ,
                   pose = as.Date( c("2021-01-01", "2021-01-01", "2021-01-04", "2021-01-06",
                                     "2021-01-06", "2021-01-01", "2021-01-01", "2021-01-01",
                                     "2021-01-05", "2021-01-05", "2021-01-15", "2021-01-19" ,
                                     "2021-01-01", "2021-01-04", "2021-01-04", "2021-01-06",
                                     "2021-01-15", "2021-01-15", "2021-01-15", "2021-01-20") ) ,
                   withdrawal = as.Date( c( "2021-01-03", "2021-01-03", "2021-01-05", "2021-01-13",
                                            "2021-01-13", "2021-01-04", "2021-01-04", "2021-01-04",
                                            "2021-01-14", "2021-01-14", "2021-01-14", "2021-01-26" ,
                                            "2021-01-03", "2021-01-05", "2021-01-05", "2021-01-14",
                                            "2021-01-19", "2021-01-19", "2021-01-19", "2021-01-24" ) ) ,
                   obs = c( 'A', 'B', 'A', 'D', 'F', 'G', 'A', 'C', 'D', 'A', 'B', 'B' ,
                            'C', 'F', 'B', 'C', 'A', 'B', 'F', 'A') ,
                   freq = c(31, 42, 14, 16, 36, 49, 29, 45, 25, 50, 40, 39, 25, 3, 16, 19, 12, 26, 2, 24) )
# Add a row identifier so each trap observation can be referenced by ID
df2 <- cbind(ID = 1:nrow(df2), df2)
First I expand df2 to make a dataset with one row per day of each trapping period:
df3 <- do.call(rbind,by(df2,
list(df2$ID),
function(d) data.frame(d,dates=d$pose:d$withdrawal)))
Now I merge df1 into this new dataset. I first need to convert the date to a numeric so that it matches the dates column of df3:
df1$dates <- as.numeric(df1$date)
df4 <- merge(df1, df3,by=c("site", "dates"))
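The numeric conversion is needed because the `:` operator used in the expansion step works on the underlying day counts (days since 1970-01-01) and drops the Date class. A small sketch of that behaviour follows; the variable names are illustrative only and not part of the answer.

```r
start <- as.Date("2021-01-01")
end   <- as.Date("2021-01-03")

# `:` operates on the underlying numbers, so the result is no longer a Date
day_numbers <- start:end
day_numbers
inherits(day_numbers, "Date")

# The same numbers are what as.numeric() gives for the original dates
as.numeric(c(start, end))

# Converting back recovers the calendar dates
as.Date(day_numbers, origin = "1970-01-01")
```

An alternative would be to build the expansion with seq(start, end, by = "day"), which keeps the Date class so that df1$date could be matched directly. Note that seq() stops with an error when start falls after end, whereas `:` simply counts down.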
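Now I can aggregate the new dataset by taking the mean temp over the days covered by each df2 row. The formula interface of aggregate() drops rows with NA by default, so missing temperatures are ignored: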
aggregate(data=df4, temp ~ freq + site + obs + pose + withdrawal +ID, mean)
freq site obs pose withdrawal ID temp
1 31 X A 2021-01-01 2021-01-03 1 12.000000
2 42 X B 2021-01-01 2021-01-03 2 12.000000
3 14 X A 2021-01-04 2021-01-05 3 12.000000
4 16 X D 2021-01-06 2021-01-13 4 9.750000
5 36 X F 2021-01-06 2021-01-13 5 9.750000
6 49 Y G 2021-01-01 2021-01-04 6 9.000000
7 29 Y A 2021-01-01 2021-01-04 7 9.000000
8 45 Y C 2021-01-01 2021-01-04 8 9.000000
9 25 Y D 2021-01-05 2021-01-14 9 9.666667
10 50 Y A 2021-01-05 2021-01-14 10 9.666667
11 40 Y B 2021-01-15 2021-01-14 11 9.500000
12 39 Y B 2021-01-19 2021-01-26 12 8.875000
13 25 Z C 2021-01-01 2021-01-03 13 10.500000
14 3 Z F 2021-01-04 2021-01-05 14 5.500000
15 16 Z B 2021-01-04 2021-01-05 15 5.500000
16 19 Z C 2021-01-06 2021-01-14 16 7.444444
17 12 Z A 2021-01-15 2021-01-19 17 7.600000
18 26 Z B 2021-01-15 2021-01-19 18 7.600000
19 2 Z F 2021-01-15 2021-01-19 19 7.600000
20 24 Z A 2021-01-20 2021-01-24 20 10.600000
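As a cross-check, the same means can be computed without the intermediate expansion and merge. The sketch below is an alternative in base R, not part of the answer above: it goes through the rows of df2 and averages the matching df1 temperatures. Only the site X rows of the question's data are rebuilt here so that the block runs on its own. The helper name mean_temp_between is hypothetical, and calling range() on the two dates is an assumption made so that a row whose pose falls after its withdrawal (as for ID 11) is still handled.

```r
# Subset of the question's data (site X only) so the example is self-contained
df1 <- data.frame(
  site = "X",
  date = seq(as.Date("2021-01-01"), by = "day", length.out = 13),
  temp = c(14, NA, 10, 14, 10, 10, 13, 12, 13, 7, 9, 6, 8)
)
df2 <- data.frame(
  ID = 1:5,
  site = "X",
  pose = as.Date(c("2021-01-01", "2021-01-01", "2021-01-04",
                   "2021-01-06", "2021-01-06")),
  withdrawal = as.Date(c("2021-01-03", "2021-01-03", "2021-01-05",
                         "2021-01-13", "2021-01-13")),
  obs = c("A", "B", "A", "D", "F"),
  freq = c(31, 42, 14, 16, 36)
)

# Hypothetical helper: mean temperature at one site between two dates (inclusive)
mean_temp_between <- function(site, pose, withdrawal, weather) {
  span <- range(pose, withdrawal)  # assumption: tolerate pose > withdrawal
  in_window <- weather$site == site &
    weather$date >= span[1] & weather$date <= span[2]
  mean(weather$temp[in_window], na.rm = TRUE)
}

# One value per df2 row; rows sharing a trapping period get the same mean
df2$mean_T <- vapply(
  seq_len(nrow(df2)),
  function(i) mean_temp_between(df2$site[i], df2$pose[i], df2$withdrawal[i], df1),
  numeric(1)
)
df2
```

For the five site X rows this gives 12, 12, 12, 9.75 and 9.75, which matches the temp column in the output above.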