
Selecting variables that are in a vector with a substitution string in R

I have this dataset:

df <- data.frame(kgs_chicken = c(0,1,2,1,2,3,0,1,2,8),
                 kgs_total = c(2,4,8,2,3,4,2,4,6,20),
                 price = c(0.81, 1.42, 2.85, 0.73, 1.07, 
                           1.52, 0.72, 1.42, 1.94, 7.44))

And I applied some transformations:

library(dplyr)

df_trans <- df %>%
  mutate(ratio = kgs_chicken / kgs_total,
         kgs_chicken_ln = log(kgs_chicken - min(kgs_chicken) + 1),
         kgs_total_ln = log(kgs_total - min(kgs_total) + 1),
         ratio_price_kgs_total = price / kgs_total)

Then, after running an algorithm, I am recommended to pick some variables. This algorithm returns just a vector with the names of the variables (which are hardcoded here):

filter_vector <- c("kgs_chicken_ln", "kgs_total")

I want to select only the variables named in that vector, but if an element of the vector ends with the string "_ln", I want the variable without the "_ln". I have tried this:

df %>%
  select(across(ends_with("_ln"), .fns = function (x) gsub("_ln","",names(x))))

But I get an error:

Error: `across()` must only be used inside dplyr verbs.

The expected result is:

   kgs_chicken kgs_total
1            0         2
2            1         4
3            2         8
4            1         2
5            2         3
6            3         4
7            0         2
8            1         4
9            2         6
10           8        20

Note that I have a dataset with hundreds of variables, so I need a solution that automates this selection. Any help would be greatly appreciated.

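We may use trimws() with a custom whitespace pattern to strip everything from the first underscore onward, then select the matching columns with starts_with():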

library(dplyr)
df %>% 
   select(starts_with(trimws(filter_vector, whitespace = "_.*")))
   kgs_chicken kgs_total
1            0         2
2            1         4
3            2         8
4            1         2
5            2         3
6            3         4
7            0         2
8            1         4
9            2         6
10           8        20
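One caveat with the pattern above: whitespace = "_.*" removes everything from the first underscore onward, so both entries of filter_vector reduce to the prefix "kgs", and starts_with() then matches every column that begins with it. That happens to give the expected result on the three-column df, but with hundreds of variables it may pull in extra columns. Below is a minimal sketch of a stricter variant, assuming only a trailing "_ln" should be dropped. The column kgs_beef and the name base_names are hypothetical, added only for illustration.

```r
library(dplyr)

# Small example data; kgs_beef is a hypothetical extra column
df <- data.frame(kgs_chicken = c(0, 1, 2),
                 kgs_total   = c(2, 4, 8),
                 kgs_beef    = c(1, 1, 1),
                 price       = c(0.81, 1.42, 2.85))

filter_vector <- c("kgs_chicken_ln", "kgs_total")

# The prefix approach: every element is cut at its first underscore
prefixes <- trimws(filter_vector, whitespace = "_.*")
loose <- df %>% select(starts_with(prefixes))

# Stricter: strip only a trailing "_ln", then require exact name matches
base_names <- sub("_ln$", "", filter_vector)
strict <- df %>% select(all_of(base_names))

names(loose)   # includes kgs_beef as well
names(strict)  # only the requested columns
```

all_of() raises an error if a requested name is missing from the data, which can be useful for catching typos in a long variable list.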

Will this work?

library(dplyr)
library(stringr)

# all_of() makes it explicit that filter_vector is an external character vector
df_trans %>% 
  select(all_of(filter_vector)) %>% 
  rename_with(~ str_remove(.x, '_ln$'), ends_with('_ln'))
   kgs_chicken kgs_total
1    0.0000000         2
2    0.6931472         4
3    1.0986123         8
4    0.6931472         2
5    1.0986123         3
6    1.3862944         4
7    0.0000000         2
8    0.6931472         4
9    1.0986123         6
10   2.1972246        20
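Note that the output above keeps the log-transformed values and only renames the column, whereas the expected result in the question shows the raw kgs_chicken values. The following sketch compares the two approaches side by side, assuming df and df_trans are built as in the question (only one transformed column is recreated here, and the names renamed and original are introduced for illustration).

```r
library(dplyr)

df <- data.frame(kgs_chicken = c(0, 1, 2, 1, 2, 3, 0, 1, 2, 8),
                 kgs_total   = c(2, 4, 8, 2, 3, 4, 2, 4, 6, 20))

df_trans <- df %>%
  mutate(kgs_chicken_ln = log(kgs_chicken - min(kgs_chicken) + 1))

filter_vector <- c("kgs_chicken_ln", "kgs_total")

# Approach 1: select the "_ln" column, then rename it (values stay logged)
renamed <- df_trans %>%
  select(all_of(filter_vector)) %>%
  rename_with(~ sub("_ln$", "", .x), ends_with("_ln"))

# Approach 2: strip the suffix first, then select the untransformed column
original <- df_trans %>%
  select(all_of(sub("_ln$", "", filter_vector)))

# Same column names in both results
same_names <- identical(names(renamed), names(original))

# But the kgs_chicken values are not the same
same_values <- isTRUE(all.equal(renamed$kgs_chicken, original$kgs_chicken))
```

Which of the two is wanted depends on whether the downstream step needs the transformed or the raw variable.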

You can remove the _ln suffix from the names in the vector and then select those columns.

df[sub('_ln$', '', filter_vector)]

#   kgs_chicken kgs_total
#1            0         2
#2            1         4
#3            2         8
#4            1         2
#5            2         3
#6            3         4
#7            0         2
#8            1         4
#9            2         6
#10           8        20

In dplyr, you can use the same expression inside select():

library(dplyr)
df %>% select(sub('_ln$', '', filter_vector))
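With hundreds of variables, the recommended vector may contain both a name and its "_ln" variant, or a name that has no counterpart in the data. A minimal base R sketch of a more defensive version follows, assuming such entries should simply be collapsed or skipped. The extra entries in filter_vector and the names base_names and keep are hypothetical.

```r
df <- data.frame(kgs_chicken = c(0, 1, 2, 1, 2, 3, 0, 1, 2, 8),
                 kgs_total   = c(2, 4, 8, 2, 3, 4, 2, 4, 6, 20),
                 price       = c(0.81, 1.42, 2.85, 0.73, 1.07,
                                 1.52, 0.72, 1.42, 1.94, 7.44))

# Hypothetical vector with a duplicate base name and an unknown variable
filter_vector <- c("kgs_chicken_ln", "kgs_total_ln", "kgs_total", "not_a_column_ln")

# Strip only a trailing "_ln" from each name
base_names <- sub("_ln$", "", filter_vector)

# intersect() drops duplicates and names that are not columns of df,
# keeping the order in which the names first appear in base_names
keep <- intersect(base_names, names(df))

result <- df[keep]
```

Silently skipping unknown names is a design choice; if a missing variable should be an error instead, indexing with df[unique(base_names)] would stop with an "undefined columns selected" message.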


 