How to subset a dataframe with a conditional statement based on multiple column values
I'm trying to subset a dataframe on the basis of conditions from multiple columns. Here is my dataframe:
var1 <- c("x","x","x","y","y","z","z","z","z")
var2 <- c("a","b","c","a","b","a","b","c","d")
var3 <- c(2,4,1,4,1,6,2,5,8)
data1 <- data.frame(var1,var2,var3)
# -------------------------------------------------------------------------
# var1 var2 var3
# 1 x a 2
# 2 x b 4
# 3 x c 1
# 4 y a 4
# 5 y b 1
# 6 z a 6
# 7 z b 2
# 8 z c 5
# 9 z d 8
The output I expect is:
# var1
# 1 y
# 2 z
The following are the conditions leading to the output:
- The output is a dataframe in which only the values of var1 are selected.
- For a given value of var1, the value of var3 where var2 is equal to a is greater than the value of var3 where var2 is equal to b.
I'm unable to write code for this condition because it spans multiple columns.
Thank you.
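Read against the expected output, the second condition is checked separately for each var1 group: x is dropped (2 is not greater than 4), while y (4 > 1) and z (6 > 2) are kept. A minimal base R sketch of that per-group check, assuming each group has at most one "a" row and one "b" row (a_gt_b and result are illustrative names, not from the question):

```r
# Example data from the question, with strings quoted; stringsAsFactors = FALSE keeps var1 character
data1 <- data.frame(
  var1 = c("x", "x", "x", "y", "y", "z", "z", "z", "z"),
  var2 = c("a", "b", "c", "a", "b", "a", "b", "c", "d"),
  var3 = c(2, 4, 1, 4, 1, 6, 2, 5, 8),
  stringsAsFactors = FALSE
)

# For each var1 group: TRUE if var3 on the "a" row exceeds var3 on the "b" row.
# isTRUE() turns an empty comparison (group without an "a" or "b" row) into FALSE.
a_gt_b <- sapply(split(data1, data1$var1), function(g) {
  isTRUE(g$var3[g$var2 == "a"] > g$var3[g$var2 == "b"])
})

# Keep only the names of the groups that pass, as a one-column data frame
result <- data.frame(var1 = names(a_gt_b)[a_gt_b])
result
```

split() returns the groups as a named list, so sapply() yields a named logical vector and the names of the TRUE entries are the var1 values to keep.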
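This returns the matching var1 values (as a factor, because data.frame() converted strings to factors by default before R 4.0):
# Compares the "a" and "b" rows by position, so it assumes every var1 group has exactly one "a" row and one "b" row, in the same order
subset(data1, var2 == "a")[subset(data1, var2 == "a")$var3 > subset(data1, var2 == "b")$var3, "var1"]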
# [1] y z
# Levels: x y z
You can wrap the result in data.frame() to get what you want as follows:
data.frame(var1 = subset(data1, (var2=="a"))[subset(data1, (var2=="a"))$var3 > subset(data1, (var2=="b"))$var3, "var1"])
# var1
# 1 y
# 2 z
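The subset() approach pairs the "a" and "b" rows by their position. A sketch that instead aligns them on var1 with base R's merge(), so the result no longer depends on row order (a_rows, b_rows, paired and result are illustrative names):

```r
# Example data from the question, with strings quoted
data1 <- data.frame(
  var1 = c("x", "x", "x", "y", "y", "z", "z", "z", "z"),
  var2 = c("a", "b", "c", "a", "b", "a", "b", "c", "d"),
  var3 = c(2, 4, 1, 4, 1, 6, 2, 5, 8),
  stringsAsFactors = FALSE
)

# One table of "a" values and one of "b" values, each keyed by var1
a_rows <- subset(data1, var2 == "a", select = c(var1, var3))
b_rows <- subset(data1, var2 == "b", select = c(var1, var3))

# Join on var1; the suffixes distinguish the two var3 columns (var3_a, var3_b)
paired <- merge(a_rows, b_rows, by = "var1", suffixes = c("_a", "_b"))

# Keep only var1 where the "a" value is larger, as a one-column data frame
result <- paired[paired$var3_a > paired$var3_b, "var1", drop = FALSE]
rownames(result) <- NULL
result
```

Groups that lack either an "a" or a "b" row simply drop out of the inner join performed by merge().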
The most intuitive solution might be to use a for loop. There are probably shorter and more elegant ways to solve this problem, but this should work:
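library(dplyr)

selection <- c()
# Keep each var1 group whose var3 value for "a" exceeds its var3 value for "b"
for (i in unique(data1$var1)) {
  var_store <- data1 %>%
    filter(var1 == i, var2 == "a" | var2 == "b")
  a_val <- var_store %>% filter(var2 == "a") %>% pull(var3)
  b_val <- var_store %>% filter(var2 == "b") %>% pull(var3)
  # isTRUE() skips groups that lack an "a" or "b" row instead of raising an error
  if (isTRUE(a_val > b_val)) {
    selection <- c(selection, i)
  }
}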
data1 %>%
filter(var1 %in% selection)
# var1 var2 var3
# 1 y a 4
# 2 y b 1
# 3 z a 6
# 4 z b 2
# 5 z c 5
# 6 z d 8
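The loop returns every row of the selected groups. If only the var1 column is wanted, as in the expected output, a vectorised sketch with dplyr's group_by() and summarise() could look like this (assuming dplyr is installed; a_val and b_val are illustrative column names):

```r
library(dplyr)

# Example data from the question, with strings quoted
data1 <- data.frame(
  var1 = c("x", "x", "x", "y", "y", "z", "z", "z", "z"),
  var2 = c("a", "b", "c", "a", "b", "a", "b", "c", "d"),
  var3 = c(2, 4, 1, 4, 1, 6, 2, 5, 8),
  stringsAsFactors = FALSE
)

# One summary row per var1 group with its "a" and "b" values.
# first() gives NA for a group without such a row, and filter() drops NA comparisons.
result <- data1 %>%
  group_by(var1) %>%
  summarise(
    a_val = first(var3[var2 == "a"]),
    b_val = first(var3[var2 == "b"])
  ) %>%
  filter(a_val > b_val) %>%
  select(var1)

result
```

Because summarise() drops the single grouping level, result is an ungrouped tibble holding only var1.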
I found that reshaping the dataframe solves my problem: I transposed var2 into columns using dcast() to get the desired result.
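The reshaping idea can be sketched as follows. This swaps dcast() for base R's reshape() so that no extra package is needed; with reshape2 installed, dcast(data1, var1 ~ var2, value.var = "var3") would produce the analogous wide table with columns named a, b, c and d. The names wide and result are illustrative:

```r
# Example data from the question, with strings quoted
data1 <- data.frame(
  var1 = c("x", "x", "x", "y", "y", "z", "z", "z", "z"),
  var2 = c("a", "b", "c", "a", "b", "a", "b", "c", "d"),
  var3 = c(2, 4, 1, 4, 1, 6, 2, 5, 8),
  stringsAsFactors = FALSE
)

# Widen: one row per var1, one column per var2 value (var3.a, var3.b, var3.c, var3.d).
# Groups without a given var2 value get NA in that column.
wide <- reshape(data1, idvar = "var1", timevar = "var2", direction = "wide")

# subset() drops rows where the condition is NA, then keeps only var1
result <- subset(wide, var3.a > var3.b, select = var1)
rownames(result) <- NULL
result
```

Once var2 is spread into columns, the cross-row condition becomes an ordinary comparison between two columns of the same row.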