Select variables from a vector using a replacement string in R

Question

I have this dataset:

df <- data.frame(kgs_chicken = c(0,1,2,1,2,3,0,1,2,8),
                 kgs_total = c(2,4,8,2,3,4,2,4,6,20),
                 price = c(0.81, 1.42, 2.85, 0.73, 1.07, 
                           1.52, 0.72, 1.42, 1.94, 7.44))

And I applied some transformations:

library(dplyr)

df_trans <- df %>%
  mutate(ratio = kgs_chicken / kgs_total,
         kgs_chicken_ln = log(kgs_chicken - min(kgs_chicken) + 1),
         kgs_total_ln = log(kgs_total - min(kgs_total) + 1),
         ratio_price_kgs_total = price / kgs_total)

Then, after running an algorithm, I am advised to pick certain variables. The algorithm returns just a vector with the names of the variables (hardcoded here):

filter_vector <- c("kgs_chicken_ln", "kgs_total")

OK, I want to select only the variables named in that vector, but if an element of the vector ends in "_ln", I want the variable without the "_ln" suffix. I have tried this:

df %>%
  select(across(ends_with("_ln"), .fns = function (x) gsub("_ln","",names(x))))

But I get an error:

Error: `across()` must only be used inside dplyr verbs.

The expected result is:

   kgs_chicken kgs_total
1            0         2
2            1         4
3            2         8
4            1         2
5            2         3
6            3         4
7            0         2
8            1         4
9            2         6
10           8        20

Bear in mind that I have a dataset with hundreds of variables, so I need a solution that automates this selection. Any help would be greatly appreciated.

Answer 1

We may use trimws() together with starts_with():

library(dplyr)
df %>% 
   select(starts_with(trimws(filter_vector, whitespace = "_.*")))
   kgs_chicken kgs_total
1            0         2
2            1         4
3            2         8
4            1         2
5            2         3
6            3         4
7            0         2
8            1         4
9            2         6
10           8        20
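
A quick check of what the trimws() call produces may help when adapting this answer. With whitespace = "_.*" (an argument available from R 3.6.0), everything from the first underscore onward is stripped, so both names reduce to the prefix "kgs", and starts_with() then matches every column beginning with it. That gives the expected result on df, but on df_trans, where the "_ln" columns exist, it would most likely select those as well. A minimal sketch, assuming the filter_vector from the question and with the column names of df_trans written out by hand:

```r
filter_vector <- c("kgs_chicken_ln", "kgs_total")

# Everything from the first underscore to the end of each name is removed
prefixes <- trimws(filter_vector, whitespace = "_.*")
prefixes

# Column names of df_trans from the question (typed in here, not computed)
trans_names <- c("kgs_chicken", "kgs_total", "price", "ratio",
                 "kgs_chicken_ln", "kgs_total_ln", "ratio_price_kgs_total")

# Base R stand-in for starts_with(): TRUE where a name begins with any prefix
hits <- Reduce(`|`, lapply(unique(prefixes),
                           function(p) startsWith(trans_names, p)))
matched <- trans_names[hits]
matched
```

If the prefixes are not unique enough for the data at hand, stripping only the trailing "_ln" is the safer route.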

Answer 2

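Will this work? Note that it selects from df_trans, so kgs_chicken holds the log-transformed values under the shortened name: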

library(dplyr)
library(stringr)

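# all_of() makes it explicit that filter_vector is an external character vector
df_trans %>% select(all_of(filter_vector)) %>% 
       # rename_with() supersedes rename_at(); strip only a trailing "_ln"
       rename_with(~ str_remove(.x, '_ln$'), ends_with('_ln'))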
   kgs_chicken kgs_total
1    0.0000000         2
2    0.6931472         4
3    1.0986123         8
4    0.6931472         2
5    1.0986123         3
6    1.3862944         4
7    0.0000000         2
8    0.6931472         4
9    1.0986123         6
10   2.1972246        20

Answer 3

You may remove the "_ln" suffix from the vector and then select the columns.

df[sub('_ln$', '', filter_vector)]

#   kgs_chicken kgs_total
#1            0         2
#2            1         4
#3            2         8
#4            1         2
#5            2         3
#6            3         4
#7            0         2
#8            1         4
#9            2         6
#10           8        20

In dplyr, you can use it within select:

library(dplyr)
df %>% select(sub('_ln$', '', filter_vector))
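
For a dataset with hundreds of variables, the same idea can be wrapped in a small helper so that duplicates and missing names are handled explicitly. The original attempt fails because across() is meant for data-masking verbs such as mutate() and summarise(), whereas select() takes tidyselect expressions directly. What follows is only a sketch: the helper name strip_suffix is hypothetical, the extra "kgs_total_ln" entry is added purely to show de-duplication, and it assumes every transformed column is marked by a trailing "_ln".

```r
library(dplyr)

df <- data.frame(kgs_chicken = c(0, 1, 2, 1, 2, 3, 0, 1, 2, 8),
                 kgs_total = c(2, 4, 8, 2, 3, 4, 2, 4, 6, 20),
                 price = c(0.81, 1.42, 2.85, 0.73, 1.07,
                           1.52, 0.72, 1.42, 1.94, 7.44))

# Hypothetical helper: drop a trailing suffix and remove duplicate names.
# The suffix is pasted into a regular expression, so it is assumed to
# contain no regex metacharacters.
strip_suffix <- function(x, suffix = "_ln") {
  unique(sub(paste0(suffix, "$"), "", x))
}

# "kgs_total" and "kgs_total_ln" collapse to the same base name
filter_vector <- c("kgs_chicken_ln", "kgs_total", "kgs_total_ln")
cols <- strip_suffix(filter_vector)

# all_of() errors if a name is missing from df; any_of() would skip it
result <- df %>% select(all_of(cols))
result
```

Using all_of() here is a deliberate choice: when the algorithm recommends a variable that does not exist in the data, an error is usually easier to notice than a silently shorter result.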

3 solutions

Solution 1
1 2021-10-22 15:49:35

Solution 2
0 2021-10-22 04:29:00

Solution 3
0 2021-10-22 04:35:48