Extract values from a DataFrame based on a condition on another DataFrame in R
I have the following two sample data frames:
df1 <- data.frame(EVI_GT=c(0.23, 0.54, 0.36, 0.92), EVI_GNT=c(0.33, 0.65, 0.42, 0.73), EVI_GGT=c(0.43, 0.34, 0.22, 0.98))
df2 <- data.frame(T_ET_GT=c(0.56, 0.23, 0.95, 0.82), T_ET_GNT=c(0.10, 0.74, 0.36, 0.35), T_ET_GGT=c(0.52, 0.31, 0.65, 0.58))
I have to extract the values from df2 that correspond to the min and max of each row of df1. For example, the min (max) value of the first row in df1 is 0.23 (0.43), i.e., column 1 (column 3), so the values to extract from df2 for the first row are 0.56 and 0.52. The same applies to row 2 and so on.
Below is my desired output data frame:
df3 <- data.frame(column1=c(0.56, 0.31, 0.65, 0.35), column2=c(0.52, 0.74, 0.36, 0.58))
How can we get df3 from df2 using conditions on df1?
You can use which.min and which.max to get the index of the minimum and maximum value, respectively. Use apply to perform the row-wise operation, then subset the data from df2 with the resulting (row, column) pairs.
data.frame(column1 = df2[cbind(1:nrow(df1), apply(df1, 1, which.min))],
           column2 = df2[cbind(1:nrow(df1), apply(df1, 1, which.max))])
# column1 column2
#1 0.56 0.52
#2 0.31 0.74
#3 0.65 0.36
#4 0.35 0.58
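The subsetting step relies on matrix indexing: when the index is a two-column matrix, each of its rows is read as one (row, column) coordinate. A minimal sketch of that behaviour, using a small hypothetical matrix `m` that is not part of the question's data:

```r
# Hypothetical 2 x 3 matrix, used only to illustrate matrix indexing
m <- matrix(c(10, 20, 30,
              40, 50, 60), nrow = 2, byrow = TRUE)

# Each row of `idx` is one (row, column) coordinate
idx <- cbind(c(1, 2), c(3, 1))

# Picks element [1, 3] and element [2, 1] in a single call
picked <- m[idx]
print(picked)
```

In the answer above, the first column of the index matrix is the row number and the second column is the position returned by which.min or which.max for that row.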
Assuming your data frames have the same dimensions, this should be fairly easy!
A very intuitive and simple way is to loop over the rows of df1 (or df2), find the columns that hold the min and max elements of each row of df1, and use that information to subset df2 and assign the values to df3.
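# Preallocate one result row per row of df1
df3 <- data.frame(
  min = rep(NA_real_, nrow(df1)),
  max = rep(NA_real_, nrow(df1))
)
for (i in seq_len(nrow(df1))) {
  max_idx <- which.max(unlist(df1[i, ]))  # column index of the row maximum
  min_idx <- which.min(unlist(df1[i, ]))  # column index of the row minimum
  df3[i, 1] <- df2[i, min_idx]
  df3[i, 2] <- df2[i, max_idx]
}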
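Since the loop assumes both data frames have the same dimensions, it may help to make that assumption explicit. The sketch below wraps the same idea in a helper; the function name `extract_by_row_extremes` and its arguments `ref` and `src` are hypothetical names chosen for illustration, not something from the question.

```r
df1 <- data.frame(EVI_GT = c(0.23, 0.54, 0.36, 0.92),
                  EVI_GNT = c(0.33, 0.65, 0.42, 0.73),
                  EVI_GGT = c(0.43, 0.34, 0.22, 0.98))
df2 <- data.frame(T_ET_GT = c(0.56, 0.23, 0.95, 0.82),
                  T_ET_GNT = c(0.10, 0.74, 0.36, 0.35),
                  T_ET_GGT = c(0.52, 0.31, 0.65, 0.58))

# Hypothetical helper: for each row of `ref`, find the columns of its
# minimum and maximum, and return the values of `src` at those positions.
extract_by_row_extremes <- function(ref, src) {
  # Fail early if the shapes differ (the assumption stated above)
  stopifnot(identical(dim(ref), dim(src)))
  ref_m <- as.matrix(ref)
  src_m <- as.matrix(src)
  rows <- seq_len(nrow(ref_m))
  min_col <- apply(ref_m, 1, which.min)
  max_col <- apply(ref_m, 1, which.max)
  data.frame(min = src_m[cbind(rows, min_col)],
             max = src_m[cbind(rows, max_col)])
}

result <- extract_by_row_extremes(df1, df2)
print(result)
```

Note that which.min and which.max return the first position when a row contains ties, so a tied row always resolves to its leftmost extreme.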
A more "dynamic" way would be to extract which.max and which.min from df1 row by row (through an apply call), forming a list of indexes. Then one can define a matrix of (row, col) pairs (think of them as coordinates) for the first and second conditions (min and max values).
indexes <- apply(df1, MARGIN = 1, function(x) {
  return(list(min_idx = which.min(x), max_idx = which.max(x)))
})
indexes <- dplyr::bind_rows(indexes)
indexes$row <- 1:nrow(indexes)
mins_indexes <- as.matrix(dplyr::select(indexes, c("row", "min_idx")))
maxes_indexes <- as.matrix(dplyr::select(indexes, c("row", "max_idx")))
df3 <- data.frame(
  min_vals = df2[mins_indexes],
  max_vals = df2[maxes_indexes]
)
This solution is loosely based on the question "Selecting specific elements from a matrix all at once".
NOTE: I've made the process as intuitive as possible; you could certainly use more clever names and fewer lines of code.
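As one possible way to shorten this, the (row, col) index pairs can also be built in base R with max.col, which returns the column position of each row's maximum. This is a substitute technique, not the approach used in the answers here, and the variable names below are illustrative. Negating the matrix turns the row minimum into the row maximum, and ties.method = "first" is set so ties resolve to the leftmost column, as with which.max.

```r
df1 <- data.frame(EVI_GT = c(0.23, 0.54, 0.36, 0.92),
                  EVI_GNT = c(0.33, 0.65, 0.42, 0.73),
                  EVI_GGT = c(0.43, 0.34, 0.22, 0.98))
df2 <- data.frame(T_ET_GT = c(0.56, 0.23, 0.95, 0.82),
                  T_ET_GNT = c(0.10, 0.74, 0.36, 0.35),
                  T_ET_GGT = c(0.52, 0.31, 0.65, 0.58))

df1_m <- as.matrix(df1)
df2_m <- as.matrix(df2)
rows <- seq_len(nrow(df1_m))

# Column position of each row's maximum and minimum in df1
max_col <- max.col(df1_m, ties.method = "first")
min_col <- max.col(-df1_m, ties.method = "first")

# Matrix indexing with (row, col) pairs pulls the matching df2 values
df3 <- data.frame(min_vals = df2_m[cbind(rows, min_col)],
                  max_vals = df2_m[cbind(rows, max_col)])
print(df3)
```

One caveat: max.col treats values that are very close to each other as tied (see its documentation), so for rows with nearly equal values it may not agree exactly with which.max.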
An approach using purrr:
library(dplyr)
library(purrr)
df1 %>%
  # turn df1 into a list of rows
  pmap(~c(...)) %>%
  map2_dfr(.y = df2 %>% pmap(~c(...)), # pair each df1 row with the matching df2 row
           .f = function(a, b) { # find the min/max position in the df1 row, extract from the df2 row
             min_index <- which.min(a)
             max_index <- which.max(a)
             tibble(min = b[min_index], max = b[max_index])
           })
# Output
# A tibble: 4 x 2
#     min   max
#   <dbl> <dbl>
# 1  0.56  0.52
# 2  0.31  0.74
# 3  0.65  0.36
# 4  0.35  0.580