Select columns from a data frame whose names start with a number

I have a data frame in which some column names start with a number and others start with letters. I want to subset the columns whose names start with a number followed by a dot.

This code works on the sample below, but in my actual data frame the column AA ID also gets selected, and I don't know why.

library(dplyr)  # provides %>%, rename(), select(), summarize(), across()

df <- data.frame(`AA ID`=c(1,2,3,4,5,6,7,8,9,10),
                 "BB"=c("AMK","KAMl","HAJ","NHS","KUL","GAF","BGA","NHU","VGY","NHU"),
                 "CC"=c("TAMAN","GHUSI","KELVIN","DEREK","LOKU","MNDHUL","JASMIN","BINNY","BURTAM","DAVID"),
                 "DD"=c(62,41,37,41,32,74,52,75,59,36),
                 "EE"=c("CA","NY","GA","DE","MN","LA","GA","VA","TM","BA"),
                 "FF"=c("ENGLISH","FRENCH","ENGLISH","FRENCH","ENGLISH","ENGLISH","SPANISH","ENGLISH","SPANISH","RUSSIAN"),
                 "GG"=c(33,44,51,51,37,58,24,67,41,75),
                 `1A`=c("","D","","NA","","D","","","D",""),
                 `2B`=c("","A","","","A","A","A","A","",""),
                 `3C`=c("","","","","","","","","",""),
                 `4D`=c("","G","G","G","G","G","G","G","",""),
                 "Concatenate" = c("","DAG","G","NAG","AG","DAG","AG","AG","D",""))

df <- df %>% rename(`1. A`="X1A",`1. B`="X2B",`1. C`="X3C",`1. D`="X4D")
Error_summary <- select(df,matches("^[0-9]*\\."))

I am also trying to count the non-empty values in each of those columns, like this:

df_row = 
  df %>% 
  summarize(across(c(matches("^[0-9]*\\."), Concatenate), ~ sum(!is.na(.) & . != "" & . != "NA")))

But this also selects the column AA ID, which I don't want.

By default, data.frame() converts column names into syntactically valid ones (check.names = TRUE), so names that start with a number end up starting with X. Taking that into account, you could do:

library(tidyverse)
df %>%
  select(matches("^X[0-9]"))

which gives:

   X1..A X2..B X3..C X4..D
1                         
2      D     A           G
3                        G
4     NA                 G
5            A           G
6      D     A           G
7            A           G
8            A           G
9      D                  
10                        
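
To see where the X prefix comes from, here is a minimal base R sketch. It assumes the original names look like "1. A" (which would explain the X1..A columns above); the `toy` data frame and the `num_cols` and `picked` names are made up for illustration. If you would rather keep the original names, check.names = FALSE is one option, and the pattern can then require at least one digit with + instead of *:

```r
# How data.frame() rewrites names when check.names = TRUE (the default)
make.names(c("AA ID", "1. A", "BB"))
#> [1] "AA.ID" "X1..A" "BB"

# Assumption: a small toy data frame that keeps its original names
toy <- data.frame(`AA ID` = 1:3,
                  `1. A`  = c("", "D", "NA"),
                  `2. B`  = c("A", "", "A"),
                  check.names = FALSE)

# "+" requires at least one digit before the dot;
# "*" would also accept a name that starts with a dot and has no digit
num_cols <- grepl("^[0-9]+\\.", names(toy))
picked <- toy[, num_cols, drop = FALSE]
names(picked)
#> [1] "1. A" "2. B"
```

The same pattern should also work with dplyr, e.g. select(toy, matches("^[0-9]+\\.")). Whether this explains why AA ID is selected in your real data is hard to say without seeing its actual column names.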

With the same logic you can do your counts:

df %>% 
  summarize(across(c(matches("^X[0-9]"), Concatenate), ~ sum(!is.na(.) & . != "" & . != "NA")))

which gives:

  X1..A X2..B X3..C X4..D Concatenate
1     3     5     0     7           8

Note that the "NAG" value in the Concatenate column is counted here, because the condition only drops values that are exactly "NA". I'm not sure whether you want to exclude it.
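
If you do want to leave out values such as "NAG", one possible variant is sketched below. It assumes that any value containing the substring "NA" should be skipped, which may or may not be what you need; `count_filled` is a hypothetical helper name, and `concat` simply repeats the Concatenate values from the question:

```r
# Hypothetical helper: count entries that are non-missing, non-empty,
# and do not contain the substring "NA" anywhere
count_filled <- function(x) {
  sum(!is.na(x) & x != "" & !grepl("NA", x, fixed = TRUE))
}

# The Concatenate column from the question
concat <- c("", "DAG", "G", "NAG", "AG", "DAG", "AG", "AG", "D", "")
count_filled(concat)
#> [1] 7
```

Inside the pipeline it could be passed to across() in place of the anonymous function, e.g. across(c(matches("^X[0-9]"), Concatenate), count_filled).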
