Adding the lowest value from a dataframe based on the matching ID of another dataframe in R
I have two dataframes (Dataframe #1 and #2). The values in columns A and B of Dataframe #2 come from column ID_1 of Dataframe #1 and indicate the contiguous (i.e. neighboring) areas of each ID_2 row. That is, the first row of Dataframe #2 (ID_2 = 7) neighbors ID_1 = 1 and 2.
What I want to do is the following: if a value of ID_1 appears in either column A or B of Dataframe #2, I would like to find the lowest First_year value among the matching rows and add it to Dataframe #1 as a new column. Please refer to Dataframe #3 for the table I would like to create. For instance, ID_1 = 1 appears in rows #1 and #4 of Dataframe #2, and the oldest First_year is 1990.
I would really appreciate it if anyone could help me. Have a great night.
Dataframe #1
ID_1 col1
1 10
2 15
3 20
4 10
5 20
6 15
Dataframe #2
ID_2 A B First_year
7 1 2 1990
8 3 4 1991
9 2 3 1995
10 1 3 1992
11 4 5 1990
12 3 4 1999
Dataframe #3
ID_1 oldest_First_year
1 1990
2 1990
3 1991
4 1990
5 1990
6 NA
Reshape Dataframe #2 to long format, join it to Dataframe #1, and take the minimum value of First_year for each ID_1.
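library(dplyr)
library(tidyr)

df1 %>%
  # stack columns A and B of df2 into a single 'value' column, then match it to ID_1
  left_join(df2 %>%
              pivot_longer(cols = c(A, B)), by = c('ID_1' = 'value')) %>%
  group_by(ID_1) %>%
  # min() returns NA for an ID_1 with no match in df2 (here ID_1 = 6)
  summarise(oldest_first_year = min(First_year))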
# ID_1 oldest_first_year
# <int> <int>
#1 1 1990
#2 2 1990
#3 3 1991
#4 4 1990
#5 5 1990
#6 6 NA
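The result above contains only ID_1 and the new column. Since the question asks for the value to be added to Dataframe #1 as a new column, here is a minimal sketch of one variation that keeps col1: summarise Dataframe #2 first, then join the summary onto Dataframe #1. The names `oldest` and `df3` are illustrative and do not come from the original answer, and the data is rebuilt with `data.frame()` only so the block runs on its own.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, rebuilt so this block is self-contained
df1 <- data.frame(ID_1 = 1:6, col1 = c(10L, 15L, 20L, 10L, 20L, 15L))
df2 <- data.frame(ID_2 = 7:12,
                  A = c(1L, 3L, 2L, 1L, 4L, 3L),
                  B = c(2L, 4L, 3L, 3L, 5L, 4L),
                  First_year = c(1990L, 1991L, 1995L, 1992L, 1990L, 1999L))

# One row per (ID_2, neighbor) pair, then the earliest year per neighbor ID
# ('oldest' is a hypothetical helper name)
oldest <- df2 %>%
  pivot_longer(cols = c(A, B), values_to = "ID_1") %>%
  group_by(ID_1) %>%
  summarise(oldest_First_year = min(First_year))

# The left join keeps every row and column of df1;
# an ID_1 that never appears in A or B gets NA
df3 <- left_join(df1, oldest, by = "ID_1")
df3
```

Summarising before the join keeps the join one-to-one, so df3 has exactly as many rows as df1.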
data
df1 <- structure(list(ID_1 = 1:6, col1 = c(10L, 15L, 20L, 10L, 20L,
15L)), class = "data.frame", row.names = c(NA, -6L))
df2 <- structure(list(ID_2 = 7:12, A = c(1L, 3L, 2L, 1L, 4L, 3L), B = c(2L,
4L, 3L, 3L, 5L, 4L), First_year = c(1990L, 1991L, 1995L, 1992L,
1990L, 1999L)), class = "data.frame", row.names = c(NA, -6L))