
Extracting numerical values from the column names of a data.frame

I have data as follows:

library(magrittr)
dat_I <- structure(list(`[0,25)` = c(0L, 2L, 252L, 3L, 34L, 0L, 2L, 65L, 
23L, 9L, 84L, 24L, 52L, 5L, 1L, 91L, 5L, 4L, 7L, 5L, 40L, 116L, 
12L), `[1000,1500)` = c(0L, 12L, 16L, 0L, 34L, 1L, 0L, 7L, 0L, 
0L, 2L, 0L, 4L, 11L, 1L, 0L, 0L, 6L, 8L, 0L, 2L, 8L, 0L), `[1500,1000000)` = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0), `[1500,3000)` = c(8L, 5L, 8L, 0L, 16L, 2L, 10L, 4L, 5L, 0L, 
4L, 3L, 0L, 6L, 4L, 0L, 49L, 7L, 6L, 0L, 1L, 2L, 0L), `[25,1000)` = c(0L, 
22L, 48L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 25L, 27L, 0L, 0L, 28L, 0L), `[25,1500)` = c(15L, 0L, 0L, 
0L, 0L, 0L, 23L, 0L, 23L, 0L, 0L, 25L, 0L, 0L, 0L, 0L, 5L, 0L, 
0L, 0L, 0L, 0L, 0L), `[25,250)` = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 42L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L), `[25,3000)` = c(0L, 0L, 0L, 33L, 0L, 0L, 0L, 0L, 0L, 63L, 
0L, 0L, 0L, 0L, 0L, 29L, 0L, 0L, 0L, 34L, 0L, 0L, 83L), `[25,500)` = c(0L, 
0L, 0L, 0L, 213L, 24L, 0L, 23L, 0L, 0L, 25L, 0L, 21L, 107L, 0L, 
0L, 0L, 0L, 0L, 0L, 23L, 0L, 0L), `[250,500)` = c(0L, 0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 2L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L), `[3000,1000000)` = c(2L, 1L, 1L, 7L, 1L, 0L, 
2L, 1L, 5L, 25L, 5L, 1L, 0L, 3L, 0L, 4L, 7L, 2L, 5L, 17L, 0L, 
5L, 19L), `[500,1000)` = c(0L, 0L, 0L, 0L, 122L, 9L, 0L, 11L, 
0L, 0L, 7L, 0L, 6L, 44L, 3L, 0L, 0L, 0L, 0L, 0L, 7L, 0L, 0L)), class = "data.frame", row.names = c("A", 
"B", "C", "D", 
"E", "F", "G", 
"H", "I", "J", "K", 
"L", "M", "N", 
"O", "P", "Q", 
"R", "S", "T", "U", 
"V", "W"))

dat_II <- structure(list(`[0,25)` = 5L, `[100,250)` = 43L, `[100,500)` = 0L, 
    `[1000,1000000]` = 20L, `[1000,1500)` = 0L, `[1500,3000)` = 0L, 
    `[25,100)` = 38L, `[25,50)` = 0L, `[250,500)` = 27L, `[3000,1000000]` = 0L, 
    `[50,100)` = 0L, `[500,1000)` = 44L, `[500,1000000]` = 0L), row.names = "Type_A", class = "data.frame")

I would like to apply the following code:

s_ordered_II <- stringi::stri_extract_all_regex(colnames(dat_II), "[[:alpha:]]+") %>%
  unlist() %>% 
  unique() %>% 
  sort()

s_ordered_I <- stringi::stri_extract_all_regex(colnames(dat_I), "[[:alpha:]]+") %>%
  unlist() %>% 
  unique() %>% 
  sort()

For some reason it does not work, although similar code worked before. I do not understand why.

Could someone comment?

You're using "[[:alpha:]]+", which matches only alphabetic characters (the combination of [:lower:] and [:upper:]). Your column names contain no letters, so nothing is matched. If you want numbers, you should be using "[[:digit:]]+" (or "[[:alnum:]]+") instead. See ?regex for the full list of character classes; these are the two relevant here:

     '[:alpha:]' Alphabetic characters: '[:lower:]' and '[:upper:]'.
     '[:digit:]' Digits: '0 1 2 3 4 5 6 7 8 9'.
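
To see the difference directly, here is a minimal sketch on a few names of the same shape as the column names above. It assumes stringi is installed; `nms` is a hypothetical shortened stand-in for `colnames(dat_II)`, not part of the original data.

```r
# Hypothetical stand-in for a few of the column names in the question
nms <- c("[0,25)", "[25,100)", "[1000,1000000]")

# The names contain no letters, so every element comes back as NA
alpha <- stringi::stri_extract_all_regex(nms, "[[:alpha:]]+")
unlist(alpha)

# sort() drops NA by default, so the original pipeline ends up empty
sort(unique(unlist(alpha)))

# The digit class picks up both bounds of each interval
digits <- stringi::stri_extract_all_regex(nms, "[[:digit:]]+")
unlist(digits)
```

This is why the original code appears to "not work": it runs without error but has nothing left to return.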

With that,

stringi::stri_extract_all_regex(colnames(dat_II), "[[:digit:]]+") %>%
  unlist() %>% 
  unique() %>% 
  sort()
#  [1] "0"       "100"     "1000"    "1000000" "1500"    "25"      "250"     "3000"    "50"      "500"    

stringi::stri_extract_all_regex(colnames(dat_I), "[[:digit:]]+") %>%
  unlist() %>% 
  unique() %>% 
  sort()
# [1] "0"       "1000"    "1000000" "1500"    "25"      "250"     "3000"    "500"    
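
Note that the results above are character strings, so sort() orders them alphabetically ("100" comes before "25"). If numeric order is what you need, one option is to convert before sorting. This is a sketch only; `nms_II` simply repeats the column names of `dat_II` so the snippet runs on its own, and `breaks` is a name chosen for illustration.

```r
# Column names of dat_II, repeated here so the snippet is self-contained
nms_II <- c("[0,25)", "[100,250)", "[100,500)", "[1000,1000000]",
            "[1000,1500)", "[1500,3000)", "[25,100)", "[25,50)",
            "[250,500)", "[3000,1000000]", "[50,100)", "[500,1000)",
            "[500,1000000]")

# Extract every run of digits, then convert to numeric before sorting
bounds <- stringi::stri_extract_all_regex(nms_II, "[[:digit:]]+")
breaks <- sort(unique(as.numeric(unlist(bounds))))

# breaks now holds the values 0, 25, 50, 100, 250, 500, 1000, 1500,
# 3000 and 1000000 in increasing numeric order
breaks
```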

Though that does lose the pairing within each interval, e.g. the 0 and 25 of [0,25) ...
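
If the pairing matters, one possible approach (not something the question asked for, so treat it as a sketch) is to capture the lower and upper bound of each name separately with stringi::stri_match_first_regex() and keep them side by side. The vector `nms` and the data frame `intervals` are hypothetical names used for illustration, and the pattern assumes every name looks like `[lower,upper)` or `[lower,upper]`.

```r
# Hypothetical subset of column names, deliberately out of order
nms <- c("[1000,1500)", "[0,25)", "[25,100)", "[100,250)", "[3000,1000000]")

# Column 1 of the result is the full match; columns 2 and 3 are the
# two capture groups (lower and upper bound)
m <- stringi::stri_match_first_regex(nms, "^\\[([0-9]+),([0-9]+)[\\)\\]]$")

intervals <- data.frame(
  name  = nms,
  lower = as.numeric(m[, 2]),
  upper = as.numeric(m[, 3]),
  stringsAsFactors = FALSE
)

# Order the intervals numerically by lower bound, then by upper bound
intervals <- intervals[order(intervals$lower, intervals$upper), ]
intervals
```

The `name` column can then be used to reorder the original data, for example `dat_II[, intervals$name]` when `nms` is `colnames(dat_II)`.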
