Efficiently filter rows by a condition across multiple columns
I have a big data frame (104029 x 142).
I want to keep the rows where the value is > 0 in any of several specific columns, which I select by name.
df
word abrasive abrasives abrasivefree abrasion slurry solute solution ....
1 composition -0.2 0.2 -0.3 -0.40 0.2 0.1 0.20 ....
2 ceria 0.1 0.2 -0.4 -0.20 -0.1 -0.2 0.20 ....
3 diamond 0.3 -0.5 -0.6 -0.10 -0.1 -0.2 -0.15 ....
4 acid -0.1 -0.1 -0.2 -0.15 0.1 0.3 0.20 ....
....
I have tried the filter() function, and it works.
But this approach is not efficient for me: I have to write out every column name in the condition, which makes the code hard to maintain.
library(dplyr)

column_names <- c("agent", "agents", "liquid", "liquids", "slurry",
                  "solute", "solutes", "solution", "solutions")
# every column has to be listed by hand in the condition
df_filter <- filter(df, agent > 0 | agents > 0 | liquid > 0 | liquids > 0 | slurry > 0 |
                        solute > 0 | solutes > 0 | solution > 0 | solutions > 0)
df_filter
word abrasive abrasives abrasivefree abrasion slurry solute solution ....
1 composition -0.2 0.2 -0.3 -0.40 0.2 0.1 0.20 ....
2 ceria 0.1 0.2 -0.4 -0.20 -0.1 -0.2 0.20 ....
4 acid -0.1 -0.1 -0.2 -0.15 0.1 0.3 0.20 ....
....
Is there a more efficient way to do this?
This line returns a logical vector (TRUE/FALSE), one value per row, for the condition you are testing:
# TRUE for rows where at least one of the selected columns is > 0
filter_condition <- apply(df[ , column_names], 1, function(x) sum(x > 0)) > 0
Then you can subset with:
df[filter_condition, ]
I'm sure there is something nicer in dplyr.
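Staying in base R, the row-wise apply() can also be replaced by a vectorized comparison: compare all selected columns to zero at once and count the TRUE values per row with rowSums(). This is a minimal sketch assuming every column in column_names is numeric; the small data frame is a hypothetical stand-in built from the columns visible in the question, not the real 104029 x 142 data, and treating NA as "not positive" is an assumption about the desired behavior.

```r
# Hypothetical sample using only columns visible in the question
df <- data.frame(
  word     = c("composition", "ceria", "diamond", "acid"),
  abrasive = c(-0.2, 0.1, 0.3, -0.1),
  slurry   = c(0.2, -0.1, -0.1, 0.1),
  solute   = c(0.1, -0.2, -0.2, 0.3),
  solution = c(0.20, 0.20, -0.15, 0.20)
)
column_names <- c("slurry", "solute", "solution")

# df[column_names] keeps a data frame even for a single column;
# "> 0" yields a logical matrix, and rowSums() counts the TRUEs in each row.
# na.rm = TRUE treats NA comparisons as "not positive" (assumption).
keep <- rowSums(df[column_names] > 0, na.rm = TRUE) > 0
df_filter <- df[keep, ]
df_filter
```

Because the comparison runs column-wise over the whole block instead of calling a function once per row, this usually scales better than apply() on a large data frame.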
Use dplyr::filter_at(), which lets you pick the columns to test with select()-style helpers:
library(dplyr)
df_filter <- df %>%
  filter_at(
    # select all the columns that are in your column_names vector
    vars(one_of(column_names)),
    # if any of those variables is greater than zero, keep the row
    any_vars(. > 0)
  )
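In dplyr 1.0.4 and later, filter_at() and any_vars() are superseded by if_any() (and if_all()), and tidyselect recommends all_of()/any_of() over one_of(). A minimal sketch of the same filter in that style, assuming a recent dplyr and the same hypothetical sample built from the columns shown in the question:

```r
library(dplyr)

# Hypothetical sample using only columns visible in the question
df <- data.frame(
  word     = c("composition", "ceria", "diamond", "acid"),
  abrasive = c(-0.2, 0.1, 0.3, -0.1),
  slurry   = c(0.2, -0.1, -0.1, 0.1),
  solute   = c(0.1, -0.2, -0.2, 0.3),
  solution = c(0.20, 0.20, -0.15, 0.20)
)
column_names <- c("slurry", "solute", "solution")

df_filter <- df %>%
  # keep a row if any of the listed columns is > 0;
  # all_of() errors on names missing from df, any_of() would skip them silently
  filter(if_any(all_of(column_names), ~ .x > 0))
df_filter
```

Use if_all() instead of if_any() if every listed column must be > 0.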
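Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.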