
Efficiently filter rows by multi-column patterns

I have a big data frame (104029 x 142).

I want to keep the rows that have a value > 0 in any of several specific columns.

df
         word abrasive abrasives abrasivefree abrasion slurry solute solution ....
1 composition     -0.2       0.2         -0.3    -0.40    0.2      0.1         0.20 ....
2       ceria      0.1       0.2         -0.4    -0.20   -0.1     -0.2         0.20 ....
3     diamond      0.3      -0.5         -0.6    -0.10   -0.1     -0.2        -0.15 ....
4        acid     -0.1      -0.1         -0.2    -0.15    0.1      0.3         0.20 ....
....

I have tried using the filter() function for this, and it works.

But I don't think this approach is efficient for me, because I have to spell out every column name, which makes the process hard to maintain.

column_names <- c("agent", "agents", "liquid", "liquids", "slurry", 
                  "solute", "solutes", "solution", "solutions")

df_filter <- filter(df,  agent>0 | agents>0 | liquid>0 | liquids>0 | slurry>0 | solute>0 | 
                    solutes>0 | solution>0 | solutions>0)

df_filter
         word abrasive abrasives abrasivefree abrasion  slurry solute solution ....
1 composition     -0.2       0.2         -0.3    -0.40    0.2      0.1         0.20 ....
2       ceria      0.1       0.2         -0.4    -0.20   -0.1     -0.2         0.20 ....
4        acid     -0.1      -0.1         -0.2    -0.15    0.1      0.3         0.20 ....
....

Is there a more efficient way to do this?

This line returns a logical (TRUE/FALSE) vector with one element per row, telling you whether any of the selected columns is greater than zero:

filter_condition <- apply(df[ , column_names], 1, function(x){sum(x>0)} )>0

Then you can use it to subset the data frame:

df[filter_condition, ]
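
The same condition can probably be computed without a row-wise apply(), which tends to be slow on a data frame of this size. The following is a minimal sketch in base R, assuming all selected columns are numeric; the toy data frame only reuses a few values from the question's sample, and na.rm = TRUE is my own choice for treating missing values as "not greater than zero".

```r
# Toy data frame standing in for the real one (a few columns from the sample above).
df <- data.frame(
  word     = c("composition", "ceria", "diamond", "acid"),
  abrasive = c(-0.2, 0.1, 0.3, -0.1),
  slurry   = c(0.2, -0.1, -0.1, 0.1),
  solute   = c(0.1, -0.2, -0.2, 0.3),
  solution = c(0.2, 0.2, -0.15, 0.2)
)
column_names <- c("slurry", "solute", "solution")

# Comparing a data frame with 0 yields a logical matrix;
# rowSums() then counts the TRUE values in each row.
# Assumption: NA values should count as "not greater than zero".
filter_condition <- rowSums(df[, column_names] > 0, na.rm = TRUE) > 0

# Keep only the rows where at least one selected column is positive.
df_filter <- df[filter_condition, ]
```

Because the comparison and the row sums are vectorized over the whole matrix, this avoids calling an R function once per row.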

I'm sure there is something nicer in dplyr.

Use dplyr::filter_at(), which lets you use select()-style helpers to pick the columns to test:

library(dplyr)

df_filter <- df %>%
    filter_at(
        # select all the columns that are in your column_names vector
        vars(one_of(column_names))
        # if any of those variables are greater than zero, keep the row
        , any_vars( . > 0)
    )
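
In newer dplyr releases (1.0.4 and later), filter_at() and one_of() are superseded, and the same filter can be written with if_any(). This is a sketch under that version assumption; the toy data frame only reuses a few values from the question's sample.

```r
library(dplyr)

# Toy data frame standing in for the real one (a few columns from the sample above).
df <- data.frame(
  word     = c("composition", "ceria", "diamond", "acid"),
  abrasive = c(-0.2, 0.1, 0.3, -0.1),
  slurry   = c(0.2, -0.1, -0.1, 0.1),
  solute   = c(0.1, -0.2, -0.2, 0.3),
  solution = c(0.2, 0.2, -0.15, 0.2)
)
column_names <- c("slurry", "solute", "solution")

df_filter <- df %>%
  # all_of() selects the columns named in the character vector and
  # raises an error if any of them is missing from df.
  # if_any() keeps a row when the condition holds for at least one of them.
  filter(if_any(all_of(column_names), ~ .x > 0))
```

If some names in column_names may be absent from the data frame, any_of() can be used in place of all_of(); it silently skips missing columns instead of raising an error.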


 