Select columns from a data frame whose names start with a number

Question

I have a data frame in which some column names start with numbers and others are plain strings. I want to select only the columns whose names start with a number followed by a dot.

This code works for the sample below, but in my actual data frame the column AA ID also gets selected, and I don't know why.

library(dplyr)

df <- data.frame(`AA ID`=c(1,2,3,4,5,6,7,8,9,10),
                 "BB"=c("AMK","KAMl","HAJ","NHS","KUL","GAF","BGA","NHU","VGY","NHU"),
                 "CC"=c("TAMAN","GHUSI","KELVIN","DEREK","LOKU","MNDHUL","JASMIN","BINNY","BURTAM","DAVID"),
                 "DD"=c(62,41,37,41,32,74,52,75,59,36),
                 "EE"=c("CA","NY","GA","DE","MN","LA","GA","VA","TM","BA"),
                 "FF"=c("ENGLISH","FRENCH","ENGLISH","FRENCH","ENGLISH","ENGLISH","SPANISH","ENGLISH","SPANISH","RUSSIAN"),
                 "GG"=c(33,44,51,51,37,58,24,67,41,75),
                 `1A`=c("","D","","NA","","D","","","D",""),
                 `2B`=c("","A","","","A","A","A","A","",""),
                 `3C`=c("","","","","","","","","",""),
                 `4D`=c("","G","G","G","G","G","G","G","",""),
                 "Concatenate" = c("","DAG","G","NAG","AG","DAG","AG","AG","D",""))

df <- df %>% rename(`1. A`="X1A",`1. B`="X2B",`1. C`="X3C",`1. D`="X4D")
Error_summary <- select(df,matches("^[0-9]*\\."))

I am also trying to count the non-empty values in those columns, like this:

df_row = 
  df %>% 
  summarize(across(c(matches("^[0-9]*\\."), Concatenate), ~ sum(!is.na(.) & . != "" & . != "NA")))

But this also selects the column AA ID, which I don't want.

Answer 1

Keep in mind that data.frame() converts column names that start with a number into names starting with X (for example, "1. A" becomes X1..A). With that in mind, you could do:

library(tidyverse)
df %>%
  select(matches("^X[0-9]"))

which gives:

   X1..A X2..B X3..C X4..D
1                         
2      D     A           G
3                        G
4     NA                 G
5            A           G
6      D     A           G
7            A           G
8            A           G
9      D                  
10
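To see where the X prefix comes from, here is a minimal base R sketch. The toy data frame and the stricter pattern "^[0-9]+\\." are my own assumptions for illustration, not something taken from your real data. It shows how names are rewritten by default, and how check.names = FALSE keeps them as typed so that a digits-then-dot pattern can be used directly:

```r
# How data.frame() rewrites non-syntactic names by default (check.names = TRUE)
fixed <- make.names(c("AA ID", "1A", "1. A"))
print(fixed)  # "AA.ID" "X1A" "X1..A"

# Hypothetical toy data; check.names = FALSE keeps the names exactly as typed
toy <- data.frame(`AA ID` = 1:3,
                  `1. A`  = c("", "D", "NA"),
                  `2. B`  = c("A", "", ""),
                  check.names = FALSE)

# "+" requires at least one digit before the dot;
# the original "*" would also accept a name that starts with a bare dot
num_cols <- grepl("^[0-9]+\\.", names(toy))
print(toy[, num_cols, drop = FALSE])
```

If the names are kept this way, the same pattern should also work inside select(matches(...)).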

With the same logic you can do your counts:

df %>% 
  summarize(across(c(matches("^X[0-9]"), Concatenate), ~ sum(!is.na(.) & . != "" & . != "NA")))

which gives

  X1..A X2..B X3..C X4..D Concatenate
1     3     5     0     7           8
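To make clear what the condition inside sum() counts, here is a small sketch on a made-up vector (the values in x are hypothetical and only chosen to cover each case):

```r
# Hypothetical values: empty string, real codes, the literal string "NA", a true missing value
x <- c("", "DAG", "NAG", "NA", NA, "G")

# Same condition as in the across() call above
keep <- !is.na(x) & x != "" & x != "NA"

print(x[keep])    # "DAG" "NAG" "G"
print(sum(keep))  # 3
```

Only exact matches of "" and "NA" are dropped, so a longer string that merely contains "NA" still counts.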

Note that the "NAG" value in the Concatenate column is counted here; I'm not sure whether you want to exclude it.
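As for why AA ID shows up in your real data but not in the sample: I can't tell from the question, but one possible cause (an assumption on my part) is that the real data frame is grouped by that column. dplyr keeps grouping columns in select() and summarize() even when the pattern does not match them. A minimal sketch with made-up data:

```r
library(dplyr)

# Hypothetical toy data, names kept as typed
toy <- data.frame(`AA ID` = c(1, 1, 2),
                  `1. A`  = c("", "D", "D"),
                  BB      = c("x", "y", "z"),
                  check.names = FALSE)

grouped <- group_by(toy, `AA ID`)

# On a grouped data frame the grouping column is added back automatically
with_group <- names(select(grouped, matches("^[0-9]+\\.")))
print(with_group)     # "AA ID" "1. A"

# After ungroup() only the matching column remains
without_group <- names(select(ungroup(grouped), matches("^[0-9]+\\.")))
print(without_group)  # "1. A"
```

If that is what is happening, calling ungroup() before select() or summarize() should drop the extra column.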

solution1
0 ACCEPTED 2020-12-28 11:41:47
