dplyr::select(num_range()) when the number is in the middle of the column name

Question

In the data below, there are columns for units 1-8. Each unit has one column for the score and one for the percent. Is there a way to use dplyr::select() with the num_range() helper to select, say, only the score columns for units 1-3? I can get it to work if I drop the suffix (so the column is just unit_1 instead of unit_1_score), but otherwise my attempts have been unsuccessful. I've tried dplyr::select(d, num_range("unit_", 1:3, "_score")), but that doesn't seem to work. Any help would be appreciated.

d <- readr::read_csv("https://data.jacksonms.gov/api/views/97iy-g8hk/rows.csv")
d <- janitor::clean_names(d)
names(d)

 [1] "test_year"             "test_type"             "test_site"             "student_id"           
 [5] "pre_test_score"        "pre_test_percent"      "post_test_score"       "post_test_percent"    
 [9] "percentage_change"     "unit_1_score"          "unit_1_percent"        "unit_2_score"         
 [13] "unit_2_percent"        "unit_3_score"          "unit_3_percent"        "unit_4_score"         
 [17] "unit_4_percent"        "unit_5_6_score"        "unit_5_6_percent"      "unit_7_score"         
 [21] "unit_7_percent"        "unit_8_score"          "unit_8_percent"        "total_score"          
 [25] "total_percent_correct"

Answer 1

We can use dplyr::matches() to select columns whose names match a regular expression containing a character range:

select(d, matches("unit_[1-3]_score"))
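To see how the pattern behaves without downloading the data, here is a minimal sketch on a hypothetical toy data frame (`toy` and its values are made up; only the column names mirror the question). Anchoring the pattern with `^` and `$` is an optional extra, not part of the answer above; it only guards against names that merely contain the pattern.

```r
library(dplyr)

# Hypothetical toy data; only the column names mirror the question.
toy <- data.frame(
  unit_1_score = 3L, unit_1_percent = 50,
  unit_2_score = 4L, unit_2_percent = 60,
  unit_3_score = 6L, unit_3_percent = 70,
  unit_4_score = 5L, unit_5_6_score = 9L
)

# [1-3] matches exactly one of the digits 1, 2 or 3.
picked <- select(toy, matches("^unit_[1-3]_score$"))
names(picked)
```

Note that a character class such as `[1-3]` only covers single digits, so a range like units 8-12 would need a different pattern, for example `(8|9|10|11|12)`.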

Answer 2

I hope this answer is not perceived as off-topic; I am assuming you would be happy with a valid response even if it does not use dplyr.

You can easily select certain columns in a data.frame using regular expressions. To select only units 1-3, for example, try:

d[, grep(x = colnames(d), pattern = "^unit_[1-3]{1}_.*$")]

This selects the columns in d whose names start with "unit_", followed by 1, 2, or 3 (exactly once), then an underscore, and then zero or more of any character. Note that this keeps both the score and the percent columns for those units.
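The base R approach can be sketched on a hypothetical toy data frame (`toy` and its values are made up; only the column names mirror the question). The second pattern, which ends in `_score$`, is an assumption about what the asker wants, namely the score columns only.

```r
# Hypothetical toy data; only the column names mirror the question.
toy <- data.frame(
  unit_1_score = 3L, unit_1_percent = 50,
  unit_2_score = 4L, unit_2_percent = 60,
  unit_3_score = 6L, unit_3_percent = 70,
  unit_4_score = 5L, unit_5_6_score = 9L
)
cols <- colnames(toy)

# value = TRUE returns the matching names instead of their positions.
both <- grep("^unit_[1-3]_", cols, value = TRUE)             # score and percent
scores_only <- grep("^unit_[1-3]_score$", cols, value = TRUE) # score only

# drop = FALSE keeps a data.frame even if only one column matches.
toy[, scores_only, drop = FALSE]
```

Using `drop = FALSE` matters for plain data frames, where a single-column selection would otherwise collapse to a vector.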

Answer 3

Notwithstanding that the 5_6 column is going to be tricky (who thought that was a good idea!?), you might find the new tidyeval concepts useful for this. The syms() function in the rlang package and the new !!! expansion operator work together to solve this kind of problem:

dplyr::select(d, !!!rlang::syms(paste0("unit_", 1:3, "_score")))
#> # A tibble: 48 x 3
#>    unit_1_score unit_2_score unit_3_score
#>           <int>        <int>        <int>
#>  1            3            4            6
#>  2            5            5            6
#>  3            4            4            6
#>  4            4            4            6
#>  5            2            5            6
#>  6            5            5            7
#>  7            5            5            6
#>  8            4            5            5
#>  9            6            4            5
#> 10            4            5            5
#> # ... with 38 more rows

Explaining exactly what this does is somewhat tricky (try reading vignette("tidy-evaluation")), but it works, so there's that :)

Though actually, just using strings works now, so maybe you don't need to bother?

dplyr::select(d, paste0("unit_", 1:3, "_score"))
#> # A tibble: 48 x 3
#>    unit_1_score unit_2_score unit_3_score
#>           <int>        <int>        <int>
#>  1            3            4            6
#>  2            5            5            6
#>  3            4            4            6
#>  4            4            4            6
#>  5            2            5            6
#>  6            5            5            7
#>  7            5            5            6
#>  8            4            5            5
#>  9            6            4            5
#> 10            4            5            5
#> # ... with 38 more rows
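For readers on more recent package versions, two further options may be worth a look. This is a sketch under stated assumptions, not part of the original answer: it assumes dplyr 1.0.0 or later for all_of(), and it assumes tidyselect 1.2.0 or later for the `suffix` argument of num_range(), which is why that call is guarded by a version check. The data frame `toy` is hypothetical; only its column names mirror the question.

```r
library(dplyr)

# Hypothetical toy data; only the column names mirror the question.
toy <- data.frame(
  unit_1_score = 3L, unit_1_percent = 50,
  unit_2_score = 4L, unit_2_percent = 60,
  unit_3_score = 6L, unit_3_percent = 70,
  unit_4_score = 5L, unit_5_6_score = 9L
)

# all_of() selects by a character vector and errors if a name is missing.
wanted <- paste0("unit_", 1:3, "_score")
by_name <- select(toy, all_of(wanted))

# Assumption: num_range() accepts `suffix` only in tidyselect >= 1.2.0.
if (utils::packageVersion("tidyselect") >= "1.2.0") {
  by_range <- select(toy, num_range("unit_", 1:3, suffix = "_score"))
  stopifnot(identical(names(by_range), names(by_name)))
}

names(by_name)
```

In older versions the third positional argument of num_range() is `width`, which would explain why the attempt in the question did not work.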
