
How to subset a dataframe with a conditional statement based on multiple column values

I'm trying to subset a dataframe based on conditions that span multiple columns. Here is my dataframe.

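# Values must be quoted strings; bare x, a, etc. would be looked up as variables
var1 <- c("x","x","x","y","y","z","z","z","z")
var2 <- c("a","b","c","a","b","a","b","c","d")
var3 <- c(2,4,1,4,1,6,2,5,8)
data1 <- data.frame(var1, var2, var3)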
# -------------------------------------------------------------------------
#     var1 var2 var3
# 1    x    a    2
# 2    x    b    4
# 3    x    c    1
# 4    y    a    4
# 5    y    b    1
# 6    z    a    6
# 7    z    b    2
# 8    z    c    5
# 9    z    d    8

Output

The output I expect is:

#     var1
# 1    y
# 2    z

Conditions

The following conditions lead to the output:

  1. The output is a dataframe containing only values of var1.
  2. A value of var1 is selected when, within that var1 group, the value of var3 where var2 is equal to a is greater than the value of var3 where var2 is equal to b.

I'm unable to write code for this condition because it depends on multiple columns.

Thank you.

This gives you a factor (assuming var1 is stored as a factor, which was the data.frame() default before R 4.0):

subset(data1, (var2=="a"))[subset(data1, (var2=="a"))$var3 > subset(data1, (var2=="b"))$var3, "var1"]

# [1] y z
# Levels: x y z

You can wrap the result in data.frame() to get the output you want:

data.frame(var1 = subset(data1, (var2=="a"))[subset(data1, (var2=="a"))$var3 > subset(data1, (var2=="b"))$var3, "var1"])
#   var1
# 1    y
# 2    z
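
Note that the one-liner above compares the a rows and the b rows by position, so it assumes every var1 group has exactly one a row and one b row, in the same order. A minimal sketch of a variant that pairs the rows by var1 with merge() instead is shown below; the helper names a_rows, b_rows, paired, and result are illustrative and not part of the original answer.

```r
# Same data as in the question
data1 <- data.frame(
  var1 = c("x","x","x","y","y","z","z","z","z"),
  var2 = c("a","b","c","a","b","a","b","c","d"),
  var3 = c(2,4,1,4,1,6,2,5,8)
)

# Pull out the "a" rows and the "b" rows separately
a_rows <- data1[data1$var2 == "a", c("var1", "var3")]
b_rows <- data1[data1$var2 == "b", c("var1", "var3")]

# Pair them by var1; groups missing either row are dropped by the inner join
paired <- merge(a_rows, b_rows, by = "var1", suffixes = c("_a", "_b"))

# Keep the groups where the "a" value exceeds the "b" value
result <- data.frame(var1 = paired$var1[paired$var3_a > paired$var3_b])
result
```

Because the pairing is done on var1 rather than on row position, this version should not depend on the row order of data1.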

The most intuitive solution might be a for-loop. There are probably shorter and more elegant ways to solve this problem, but this should work:

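library(dplyr)

selection <- c()

for (i in unique(data1$var1)) {
  # Rows of the current var1 group where var2 is "a" or "b"
  var_store <- data1 %>%
    filter(var1 == i, var2 == "a" | var2 == "b")

  # var3 for the "a" row and for the "b" row of this group
  a_val <- var_store %>% filter(var2 == "a") %>% pull(var3)
  b_val <- var_store %>% filter(var2 == "b") %>% pull(var3)

  if (a_val > b_val) {
    selection <- c(selection, i)
  }
}

# Keep every row belonging to a selected var1 group
data1 %>%
  filter(var1 %in% selection)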


# # A tibble: 6 x 3
#   var1  var2   var3
#   <chr> <chr> <dbl>
# 1 y     a         4
# 2 y     b         1
# 3 z     a         6
# 4 z     b         2
# 5 z     c         5
# 6 z     d         8
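
One of the shorter ways hinted at above is a grouped summary. The following is a sketch, assuming dplyr is installed and each var1 group has at most one a row and one b row; the column names a_val and b_val are made up for illustration. Unlike the loop, it returns only the var1 column, which is what the question asks for.

```r
library(dplyr)

# Same data as in the question
data1 <- data.frame(
  var1 = c("x","x","x","y","y","z","z","z","z"),
  var2 = c("a","b","c","a","b","a","b","c","d"),
  var3 = c(2,4,1,4,1,6,2,5,8)
)

result <- data1 %>%
  group_by(var1) %>%
  # [1] yields NA when a group has no "a" or no "b" row
  summarise(a_val = var3[var2 == "a"][1],
            b_val = var3[var2 == "b"][1]) %>%
  # filter() drops rows where the comparison is FALSE or NA
  filter(a_val > b_val) %>%
  select(var1)

result
```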

I found that reshaping the dataframe solves my problem. I used dcast() to spread the values of var2 into columns and then got the desired result.
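
The post does not show the code or say which package's dcast() was used, so the following is only a sketch of what the reshaping approach might look like, assuming reshape2::dcast(); the names wide and result are illustrative.

```r
library(reshape2)

# Same data as in the question
data1 <- data.frame(
  var1 = c("x","x","x","y","y","z","z","z","z"),
  var2 = c("a","b","c","a","b","a","b","c","d"),
  var3 = c(2,4,1,4,1,6,2,5,8)
)

# One row per var1, one column per var2 value, filled with var3
# (combinations that do not occur become NA)
wide <- dcast(data1, var1 ~ var2, value.var = "var3")

# Keep var1 where column a exceeds column b; which() skips NA comparisons
result <- wide[which(wide$a > wide$b), "var1", drop = FALSE]
rownames(result) <- NULL
result
```

Once the data is in wide form, the condition becomes a plain comparison between two columns, which is why reshaping makes the problem easier.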
