Extract values from a DataFrame based on condition on another DataFrame in R

I have the following two sample data frames:

df1 <- data.frame(EVI_GT=c(0.23, 0.54, 0.36, 0.92), EVI_GNT=c(0.33, 0.65, 0.42, 0.73), EVI_GGT=c(0.43, 0.34, 0.22, 0.98))
df2 <- data.frame(T_ET_GT=c(0.56, 0.23, 0.95, 0.82), T_ET_GNT=c(0.10, 0.74, 0.36, 0.35), T_ET_GGT=c(0.52, 0.31, 0.65, 0.58))

I have to extract values from df2 corresponding to the min and max of each row of df1. For example, the min (max) value of the first row in df1 is 0.23 (0.43), i.e., column 1 (column 3), so the values that should be extracted from df2 for the first row are 0.56 and 0.52. The same applies to row 2, and so on. Below is my desired output data frame:

df3 <- data.frame(column1=c(0.56, 0.31, 0.65, 0.35), column2=c(0.52, 0.74, 0.36, 0.58))

How can we get df3 from df2 using conditions on df1?

You can use which.min and which.max to get the index of the minimum and maximum value, respectively. Use apply to perform the operation row-wise, then subset df2 with the resulting (row, column) pairs.

data.frame(column1 = df2[cbind(1:nrow(df1), apply(df1, 1, which.min))],
           column2 = df2[cbind(1:nrow(df1), apply(df1, 1, which.max))])

#  column1 column2
#1    0.56    0.52
#2    0.31    0.74
#3    0.65    0.36
#4    0.35    0.58
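
The cbind() calls above rely on matrix indexing: when the index is a two-column matrix, each of its rows is read as a (row, column) pair. A minimal sketch on a small made-up matrix (m, idx and picked are illustrative names, not part of the question):

```r
# Hypothetical 2 x 3 matrix, used only to illustrate matrix indexing.
# Filled column by column: m[1, ] is 1 3 5 and m[2, ] is 2 4 6.
m <- matrix(1:6, nrow = 2)

# Each row of idx is one (row, column) coordinate.
idx <- cbind(c(1, 2), c(3, 1))

# Picks m[1, 3] and m[2, 1] in a single call.
picked <- m[idx]
print(picked)  # 5 2
```

When the object being indexed is a data frame, as with df2 here, R converts it to a matrix before applying the index, so this works most cleanly when all columns share one type.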

Assuming your data frames have the same dimensions, that should be fairly easy!

A very intuitive and simple way is to loop over the rows of df1 (or df2), find the columns that hold the minimum and maximum of each row of df1, and use those column positions to subset df2 and assign the values to df3.

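# Preallocate one output row per row of df1
df3 <- data.frame(
  min = rep(NA_real_, nrow(df1)),
  max = rep(NA_real_, nrow(df1))
)

for (i in seq_len(nrow(df1))) {
  row_vals <- unlist(df1[i, ])    # row i of df1 as a plain numeric vector
  min_col <- which.min(row_vals)  # column position of the row minimum
  max_col <- which.max(row_vals)  # column position of the row maximum
  df3[i, 1] <- df2[i, min_col]
  df3[i, 2] <- df2[i, max_col]
}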

A more "dynamic" way of doing that is to extract which.max and which.min from df1 row by row (through an apply call), forming a list of indexes. Then one can build a matrix of (row, col) pairs (think of them as coordinates) for each of the two conditions (min and max values).

indexes <- apply(df1, MARGIN = 1, function(x) {
  return(list(min_idx = which.min(x), max_idx = which.max(x)))
})

indexes <- dplyr::bind_rows(indexes)
indexes$row <- 1:nrow(indexes)
mins_indexes <- as.matrix(dplyr::select(indexes, c("row", "min_idx")))
maxes_indexes <- as.matrix(dplyr::select(indexes, c("row", "max_idx")))

df3 <- data.frame(
  min_vals = df2[mins_indexes],
  max_vals = df2[maxes_indexes]
)

This solution is loosely based on the question "Selecting specific elements from a matrix all at once".

NOTE: I've made the process as intuitive as possible; you could certainly use better names and probably fewer lines of code.
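
As one illustration of the "fewer lines" remark, here is a sketch that swaps the apply step for base R's max.col(), which returns the column position of each row's maximum; negating the matrix gives the row minimum instead. This is an alternative technique, not the method used above, and the names m1, m2, rows, min_col and max_col are illustrative. Note that max.col() treats values within a small relative tolerance as ties, so it may differ from which.max on nearly equal values.

```r
df1 <- data.frame(EVI_GT = c(0.23, 0.54, 0.36, 0.92),
                  EVI_GNT = c(0.33, 0.65, 0.42, 0.73),
                  EVI_GGT = c(0.43, 0.34, 0.22, 0.98))
df2 <- data.frame(T_ET_GT = c(0.56, 0.23, 0.95, 0.82),
                  T_ET_GNT = c(0.10, 0.74, 0.36, 0.35),
                  T_ET_GGT = c(0.52, 0.31, 0.65, 0.58))

m1 <- as.matrix(df1)
m2 <- as.matrix(df2)
rows <- seq_len(nrow(m1))

# Column position of each row's minimum and maximum in df1
min_col <- max.col(-m1, ties.method = "first")
max_col <- max.col(m1, ties.method = "first")

# Look up the same (row, column) positions in df2
df3 <- data.frame(min_vals = m2[cbind(rows, min_col)],
                  max_vals = m2[cbind(rows, max_col)])
print(df3)
```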

An approach using purrr:

library(dplyr)
library(purrr)

df1 %>%
  # turn each row of df1 into a vector, giving a list of rows
  pmap(~c(...)) %>%
  map2_dfr(.y = df2 %>% pmap(~c(...)), # pair with the matching list of df2 rows
    .f = function(a, b) { # a: one row of df1, b: the matching row of df2
      min_index <- which.min(a)
      max_index <- which.max(a)
      tibble(min = b[min_index], max = b[max_index])
    })

# Output
# A tibble: 4 x 2
#     min   max
#   <dbl> <dbl>
# 1  0.56 0.52
# 2  0.31 0.74
# 3  0.65 0.36
# 4  0.35 0.580
