
`dplyr::select(num_range())` when the number is in the middle of the column name

In the data below, there are column names for units (1-8). Each unit has a column for the score and one for the percent. Is there a way to use `dplyr::select()` with the `num_range()` helper to select, say, only the score columns for units 1-3? I can get it to work if I drop the suffix (so the column is just `unit_1` instead of `unit_1_score`), but otherwise my attempts have been unsuccessful. I've tried `dplyr::select(d, num_range("unit_", 1:3, "_score"))`, but that doesn't work. Any help would be appreciated.

d <- readr::read_csv("https://data.jacksonms.gov/api/views/97iy-g8hk/rows.csv")
d <- janitor::clean_names(d)
names(d)

 [1] "test_year"             "test_type"             "test_site"             "student_id"           
 [5] "pre_test_score"        "pre_test_percent"      "post_test_score"       "post_test_percent"    
 [9] "percentage_change"     "unit_1_score"          "unit_1_percent"        "unit_2_score"         
 [13] "unit_2_percent"        "unit_3_score"          "unit_3_percent"        "unit_4_score"         
 [17] "unit_4_percent"        "unit_5_6_score"        "unit_5_6_percent"      "unit_7_score"         
 [21] "unit_7_percent"        "unit_8_score"          "unit_8_percent"        "total_score"          
 [25] "total_percent_correct"

We can use `dplyr::matches()` to select columns with a regular expression that contains a character range:

select(d, matches("unit_[1-3]_score"))
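To check the pattern without downloading the data, here is a minimal sketch on a toy tibble. The object `d_toy` and its values are invented for illustration; only the column names come from the question. The `^` and `$` anchors are an optional tightening so the pattern has to match the whole column name.

```r
library(dplyr)

# Toy stand-in for the real data: column names from the question, values invented
d_toy <- tibble(
  student_id     = 1:2,
  unit_1_score   = c(3L, 5L),
  unit_1_percent = c(30, 50),
  unit_2_score   = c(4L, 5L),
  unit_3_score   = c(6L, 6L),
  unit_4_score   = c(2L, 7L),
  unit_5_6_score = c(8L, 9L)
)

# [1-3] matches a single digit 1, 2 or 3; the anchors force a full-name match
scores_1_3 <- select(d_toy, matches("^unit_[1-3]_score$"))
names(scores_1_3)
```

Because `[1-3]` matches exactly one character, the combined `unit_5_6_score` column is not picked up by accident.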

I hope this answer is not perceived as off-topic; I am assuming you would be happy with a valid response even if it does not use dplyr.

You can easily select certain columns in a data.frame using regular expressions. To select only units 1-3, for example, try: `d[, grep(x = colnames(d), pattern = "^unit_[1-3]{1}_.*$")]`. This selects the columns in `d` whose names start with "unit_", followed by exactly one of 1, 2, or 3, then an underscore, and then zero or more of any character. Note that this keeps both the score and the percent columns for those units; to keep only the scores, end the pattern with `_score$` instead of `_.*$`.
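The difference between the two patterns can be seen on the column names alone. This is a small base R sketch; the vector `cols` is a hand-copied subset of the `names(d)` output in the question, and `both` and `scores` are illustrative names.

```r
# A subset of the column names shown in the question
cols <- c("student_id", "unit_1_score", "unit_1_percent", "unit_2_score",
          "unit_2_percent", "unit_3_score", "unit_3_percent",
          "unit_4_score", "unit_5_6_score", "total_score")

# Broad pattern: score AND percent columns for units 1-3
both <- grep("^unit_[1-3]_.*$", cols, value = TRUE)

# Tighter pattern: score columns only
scores <- grep("^unit_[1-3]_score$", cols, value = TRUE)

both
scores
```

With `value = TRUE`, `grep()` returns the matching names rather than their positions; either form can be used inside `d[, ...]`.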

Notwithstanding that the 5_6 column is going to be tricky (who thought that was a good idea!?), you might find the new tidyeval concepts useful for this. The `syms()` function in the rlang package and the new `!!!` expansion operator work together to solve this kind of problem:

dplyr::select(d, !!!rlang::syms(paste0("unit_", 1:3, "_score")))
#> # A tibble: 48 x 3
#>    unit_1_score unit_2_score unit_3_score
#>           <int>        <int>        <int>
#>  1            3            4            6
#>  2            5            5            6
#>  3            4            4            6
#>  4            4            4            6
#>  5            2            5            6
#>  6            5            5            7
#>  7            5            5            6
#>  8            4            5            5
#>  9            6            4            5
#> 10            4            5            5
#> # ... with 38 more rows

Explaining exactly what this does is somewhat tricky (try reading `vignette("tidy-evaluation")`), but it works, so there's that :)
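As a rough illustration of the idea: `rlang::syms()` turns each string into a symbol (a bare column name), and `!!!` splices that list into the call as if the names had been typed out one by one. The sketch below uses a toy tibble; `d_toy`, `col_syms`, `spliced` and `typed_out` are made-up names for illustration.

```r
library(dplyr)

# Toy data with invented values; column names follow the question's pattern
d_toy <- tibble(
  unit_1_score   = 1:2,
  unit_1_percent = c(10, 20),
  unit_2_score   = 3:4,
  unit_3_score   = 5:6
)

# A list of three symbols: unit_1_score, unit_2_score, unit_3_score
col_syms <- rlang::syms(paste0("unit_", 1:3, "_score"))

# Splicing the symbols in ...
spliced <- select(d_toy, !!!col_syms)

# ... selects the same columns as writing the names by hand
typed_out <- select(d_toy, unit_1_score, unit_2_score, unit_3_score)

identical(names(spliced), names(typed_out))
```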

Though actually, just using strings works now, so maybe you don't need to bother?

dplyr::select(d, paste0("unit_", 1:3, "_score"))
#> # A tibble: 48 x 3
#>    unit_1_score unit_2_score unit_3_score
#>           <int>        <int>        <int>
#>  1            3            4            6
#>  2            5            5            6
#>  3            4            4            6
#>  4            4            4            6
#>  5            2            5            6
#>  6            5            5            7
#>  7            5            5            6
#>  8            4            5            5
#>  9            6            4            5
#> 10            4            5            5
#> # ... with 38 more rows
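Two further variations may be worth trying on more recent versions of dplyr/tidyselect. Both are sketches under assumptions rather than part of the original answers: `all_of()` should be available from tidyselect 1.0.0 onwards, and the `suffix` argument of `num_range()` is, to the best of my knowledge, only present from tidyselect 1.2.0, so please check `?num_range` on your installation. The tibble `d_toy` is invented for illustration.

```r
library(dplyr)

# Toy data with invented values; column names follow the question's pattern
d_toy <- tibble(
  unit_1_score   = 1:2,
  unit_1_percent = c(10, 20),
  unit_2_score   = 3:4,
  unit_3_score   = 5:6,
  unit_5_6_score = 7:8
)

# Names stored in a variable: all_of() makes the intent explicit
# and errors if any of the names is missing from the data
wanted <- paste0("unit_", 1:3, "_score")
via_all_of <- select(d_toy, all_of(wanted))

# Assumption: this tidyselect version supports `suffix` in num_range()
via_num_range <- select(d_toy, num_range("unit_", 1:3, suffix = "_score"))

names(via_all_of)
names(via_num_range)
```

If `suffix` is not available in your version, the third positional argument of `num_range()` is `width`, which would explain why the attempt in the question did not work.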

