`dplyr::select(num_range())` when the number is in the middle of the column name
In the data below, there are column names for units (1-8). Each unit has a column for the score and one for the percent. Is there a way to use `dplyr::select()` with the `num_range()` helper to select, say, only the score columns for units 1-3? I can get it to work if I drop the suffix (so the name is just `unit_1` instead of `unit_1_score`), but otherwise my attempts have been unsuccessful. I've tried
dplyr::select(d, num_range("unit_", 1:3, "_score"))
but that doesn't seem to work. Any help would be appreciated.
d <- readr::read_csv("https://data.jacksonms.gov/api/views/97iy-g8hk/rows.csv")
d <- janitor::clean_names(d)
names(d)
[1] "test_year" "test_type" "test_site" "student_id"
[5] "pre_test_score" "pre_test_percent" "post_test_score" "post_test_percent"
[9] "percentage_change" "unit_1_score" "unit_1_percent" "unit_2_score"
[13] "unit_2_percent" "unit_3_score" "unit_3_percent" "unit_4_score"
[17] "unit_4_percent" "unit_5_6_score" "unit_5_6_percent" "unit_7_score"
[21] "unit_7_percent" "unit_8_score" "unit_8_percent" "total_score"
[25] "total_percent_correct"
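To illustrate the "works without the suffix" case from the question, here is a minimal sketch on a hypothetical data frame `toy` (made up, not the real dataset) whose names end in the unit number. It assumes dplyr is installed. `num_range()` pastes the prefix onto each number in the range and selects the columns with those names, so it only lines up with names where the number comes last.

```r
library(dplyr)

# Hypothetical stand-in: here the unit number is at the very end of each name
toy <- data.frame(unit_1 = 1, unit_2 = 2, unit_3 = 3, unit_4 = 4)

# num_range() generates "unit_1", "unit_2", "unit_3" and selects those columns
picked <- select(toy, num_range("unit_", 1:3))
names(picked)
#> [1] "unit_1" "unit_2" "unit_3"
```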
We can use `dplyr::matches()` to select the columns with a regular-expression range:
select(d, matches("unit_[1-3]_score"))
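A quick self-contained check of the `matches()` approach, assuming dplyr is installed. `toy` is a hypothetical data frame that only mimics the column names above (the values are made up), including the awkward `unit_5_6_score`. The `^` and `$` anchors are an addition to the answer's pattern; they make the regular expression match whole column names only.

```r
library(dplyr)

# Hypothetical stand-in for `d`: same style of column names, made-up values
toy <- data.frame(
  student_id = 1:2,
  unit_1_score = c(3, 5), unit_1_percent = c(50, 83),
  unit_2_score = c(4, 5), unit_2_percent = c(67, 83),
  unit_3_score = c(6, 6), unit_3_percent = c(100, 100),
  unit_5_6_score = c(7, 8), unit_5_6_percent = c(70, 80)
)

# [1-3] is a character class covering units 1, 2 and 3;
# ^ and $ anchor the pattern to the start and end of the name
scores_1_3 <- select(toy, matches("^unit_[1-3]_score$"))
names(scores_1_3)
#> [1] "unit_1_score" "unit_2_score" "unit_3_score"
```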
I hope this answer is not perceived as off-topic; I am assuming you would be happy with a valid response even if it does not use `dplyr`.
You can easily select certain columns in a `data.frame` using regular expressions. To select only units 1-3, for example, try:
d[, grep(x = colnames(d), pattern = "^unit_[1-3]{1}_.*$")]
This will select the columns in `d` whose names start with "unit_", followed by exactly one of 1, 2, or 3, then an underscore, and then zero or more of any characters. That keeps both the score and the percent columns for those units; to keep only the scores, use the pattern "^unit_[1-3]_score$" instead.
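A self-contained base-R sketch contrasting the answer's pattern (every column for units 1-3) with a tighter one that keeps only the scores. `toy` is a hypothetical stand-in with made-up values. `grepl()` returns a logical vector, which works as a column index just like the integer positions returned by `grep()`.

```r
# Hypothetical stand-in for `d` (column names mimic the question, values made up)
toy <- data.frame(
  unit_1_score = c(3, 5), unit_1_percent = c(50, 83),
  unit_2_score = c(4, 5), unit_2_percent = c(67, 83),
  unit_3_score = c(6, 6), unit_3_percent = c(100, 100),
  unit_4_score = c(5, 4), unit_4_percent = c(83, 67)
)

# The answer's pattern: all columns (score and percent) for units 1-3
all_1_3 <- toy[, grep("^unit_[1-3]{1}_.*$", colnames(toy))]

# Tighter pattern: only the score columns for units 1-3 (logical index)
scores_1_3 <- toy[, grepl("^unit_[1-3]_score$", colnames(toy))]

names(all_1_3)
names(scores_1_3)
```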
Notwithstanding that the `5_6` column is going to be tricky (who thought that was a good idea!?), you might find the new tidyeval concepts useful for this. The `syms()` function in the `rlang` package and the new `!!!` splicing operator work together to solve this kind of problem:
dplyr::select(d, !!!rlang::syms(paste0("unit_", 1:3, "_score")))
#> # A tibble: 48 x 3
#> unit_1_score unit_2_score unit_3_score
#> <int> <int> <int>
#> 1 3 4 6
#> 2 5 5 6
#> 3 4 4 6
#> 4 4 4 6
#> 5 2 5 6
#> 6 5 5 7
#> 7 5 5 6
#> 8 4 5 5
#> 9 6 4 5
#> 10 4 5 5
#> # ... with 38 more rows
Explaining exactly what this does is somewhat tricky (try reading `vignette("tidy-evaluation")`), but it works, so there's that :)
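One way to see what `!!!` does is to build the call without running it. This sketch assumes rlang is installed (dplyr depends on it); `cols` and `spliced` are illustrative names. `rlang::expr()` captures code unevaluated, so the result shows the `select()` call that the splicing produces: each symbol in the list becomes a separate argument, exactly as if the column names had been typed by hand.

```r
# syms() turns each string into a symbol (an unevaluated column name)
cols <- rlang::syms(paste0("unit_", 1:3, "_score"))

# expr() captures the code instead of running it; !!! splices the list
# so that each symbol becomes its own argument to select()
spliced <- rlang::expr(dplyr::select(d, !!!cols))

# the call now lists the three column symbols as separate arguments
spliced
```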
Though actually, just using strings works now, so maybe you don't need to bother?
dplyr::select(d, paste0("unit_", 1:3, "_score"))
#> # A tibble: 48 x 3
#> unit_1_score unit_2_score unit_3_score
#> <int> <int> <int>
#> 1 3 4 6
#> 2 5 5 6
#> 3 4 4 6
#> 4 4 4 6
#> 5 2 5 6
#> 6 5 5 7
#> 7 5 5 6
#> 8 4 5 5
#> 9 6 4 5
#> 10 4 5 5
#> # ... with 38 more rows
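When the column names are computed first and stored in a variable, the dplyr/tidyselect documentation recommends wrapping the character vector in `all_of()` (strict: errors if a name is missing) or `any_of()` (lenient: skips missing names) instead of passing the bare vector, which newer tidyselect versions flag with a message or warning. A sketch assuming dplyr 1.0 or later; `toy`, `wanted`, and `lenient` are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical stand-in with made-up values
toy <- data.frame(
  unit_1_score = c(3, 5), unit_2_score = c(4, 5),
  unit_3_score = c(6, 6), unit_4_score = c(5, 4)
)

# Build the wanted names as a character vector first...
wanted <- paste0("unit_", 1:3, "_score")

# ...then wrap it in all_of(), which errors if any name is absent
scores_1_3 <- select(toy, all_of(wanted))

# any_of() silently skips names that do not exist (unit_9_score here)
lenient <- select(toy, any_of(c(wanted, "unit_9_score")))

names(scores_1_3)
#> [1] "unit_1_score" "unit_2_score" "unit_3_score"
```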