How to subset dataframe2 depending on the values in dataframe1 and stack all subsets in one dataframe in R?
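I want to create a function that takes one row of data frame df1 (columns x1, x2, x3) and returns a subset of data frame df2 (columns y1, y2) based on the values in that row: the rows of df2 whose y1 equals x1 and whose y2 lies strictly between x2 and x3. I want to apply this function to every row of df1 and stack the resulting data frames (the subsets of df2) into one big data frame. Here is how it can be done with a for loop. Example df1: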
x1 x2 x3
a 3.1 4.5
b 9.0 10.1
a 9.0 20.0
c 1.1 6.0
Example df2:
y1 y2
a 4.0
a 10.0
a 3.5
b 9.8
b 9.5
b 25.0
c 8.2
c 12.0
A for loop that does this:
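library(dplyr)

# start from an empty data frame with df2's columns (no dummy row to drop later)
desired_df <- df2[0, ]
for (i in seq_len(nrow(df1))) {
  # rows of df2 with y1 == x1 and x2 < y2 < x3 for the i-th row of df1
  subset_i <- filter(df2, y1 == df1$x1[i], y2 > df1$x2[i], y2 < df1$x3[i])
  desired_df <- rbind(desired_df, subset_i)
}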
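The same row-wise logic can also be written as the function the question describes, applied to every row of df1, with all pieces stacked once at the end instead of growing desired_df inside the loop. This is a minimal base-R sketch assuming the df1/df2 shown above; the helper name `subset_for_row` is hypothetical, not from the question.

```r
# Example data from the question (stringsAsFactors = FALSE so keys compare as text)
df1 <- data.frame(x1 = c("a", "b", "a", "c"),
                  x2 = c(3.1, 9, 9, 1.1),
                  x3 = c(4.5, 10.1, 20, 6),
                  stringsAsFactors = FALSE)
df2 <- data.frame(y1 = c("a", "a", "a", "b", "b", "b", "c", "c"),
                  y2 = c(4, 10, 3.5, 9.8, 9.5, 25, 8.2, 12),
                  stringsAsFactors = FALSE)

# Hypothetical helper: rows of `lookup` (df2) that match one row of df1,
# i.e. same key and y2 strictly between x2 and x3
subset_for_row <- function(row, lookup) {
  keep <- lookup$y1 == row$x1 & lookup$y2 > row$x2 & lookup$y2 < row$x3
  lookup[keep, , drop = FALSE]
}

# Apply the helper to each row of df1, then stack all subsets with one rbind
pieces <- lapply(seq_len(nrow(df1)), function(i) subset_for_row(df1[i, ], df2))
desired_df <- do.call(rbind, pieces)
rownames(desired_df) <- NULL
desired_df
```

This still iterates over the rows of df1, but collecting the pieces in a list and calling rbind once avoids repeatedly copying a growing data frame, and empty subsets (such as the one for the "c" row) simply contribute zero rows.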
The desired data frame is:
y1 y2
a 4.0
a 3.5
b 9.8
b 9.5
a 10.0
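Depending on the values in df1, each subset can have a different number of rows (sometimes none). The question is: how can this subset-and-append process be written in vectorized form, without a for loop?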
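It looks like we need a fuzzy join, i.e. a join whose match condition for each column pair can be an arbitrary comparison (here equality on the key and inequalities on the range):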
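library(dplyr)
library(fuzzyjoin)

# Match on x1 == y1, x2 < y2 and x3 > y2 (strict, as in the loop);
# each match_fun is called as f(df1 column, df2 column)
fuzzy_inner_join(df1, df2, by = c('x1' = 'y1', 'x2' = 'y2', 'x3' = 'y2'),
                 match_fun = list(`==`, `<`, `>`)) %>%
  select(names(df2))   # keep only df2's columns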
# y1 y2
#1 a 4.0
#2 a 3.5
#3 b 9.8
#4 b 9.5
#5 a 10.0
# data
df1 <- structure(list(x1 = c("a", "b", "a", "c"), x2 = c(3.1, 9, 9, 1.1),
                      x3 = c(4.5, 10.1, 20, 6)),
                 class = "data.frame", row.names = c(NA, -4L))
df2 <- structure(list(y1 = c("a", "a", "a", "b", "b", "b", "c", "c"),
                      y2 = c(4, 10, 3.5, 9.8, 9.5, 25, 8.2, 12)),
                 class = "data.frame", row.names = c(NA, -8L))
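If adding the fuzzyjoin dependency is not an option, the same condition can be split into an ordinary equi-join on the key followed by a vectorized range filter. This is a base-R sketch using merge(), not part of the original answer; it assumes the df1/df2 data above.

```r
# Example data from the question
df1 <- data.frame(x1 = c("a", "b", "a", "c"),
                  x2 = c(3.1, 9, 9, 1.1),
                  x3 = c(4.5, 10.1, 20, 6),
                  stringsAsFactors = FALSE)
df2 <- data.frame(y1 = c("a", "a", "a", "b", "b", "b", "c", "c"),
                  y2 = c(4, 10, 3.5, 9.8, 9.5, 25, 8.2, 12),
                  stringsAsFactors = FALSE)

# Equi-join on the key (x1 == y1): every df1 row is paired with every df2 row
# sharing its key; the key column keeps the name "x1" from df1
joined <- merge(df1, df2, by.x = "x1", by.y = "y1")

# Keep only the pairs whose y2 lies strictly between x2 and x3
in_range <- joined$y2 > joined$x2 & joined$y2 < joined$x3
result <- joined[in_range, c("x1", "y2")]
names(result) <- c("y1", "y2")
rownames(result) <- NULL
result
```

Because merge() sorts by the key by default, the rows come out grouped by y1 rather than in df1's row order; reorder them if that matters. The intermediate table holds every same-key pair before filtering, so it can grow large when many rows share a key.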