I'm trying to remove specific numbers and characters from the column names of a data frame in R. So far I have only managed to remove the numbers; I have tried several approaches, but the other characters at the end remain.
Each column name consists of letters followed by a number in parentheses, e.g. ASE (232).
Current data:
Subject ASE (232) ASD (121) AFD (313)
1 1.1. 1.2 1.3
Desired output:
Subject ASE ASD AFD
1 1.1 1.2 1.3
My attempt:
colnames(data)<-gsub("[A-Z] ([0-9]+)","",colnames(data))
We may change the code to match one or more spaces (\\s+) followed by the opening parenthesis (\\(), one or more digits (\\d+), and any other characters (.*), and replace the match with an empty string ("").
colnames(data) <- sub("\\s+\\(\\d+.*", "", colnames(data))
colnames(data)
[1] "Subject" "ASE" "ASD" "AFD"
Another option is trimws from base R:
trimws(colnames(data), whitespace = "\\s+\\(.*")
[1] "Subject" "ASE" "ASD" "AFD"
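Both approaches can be checked on a plain character vector, without a data frame. This is a minimal sketch: nms is a hypothetical stand-in for colnames(data), and which = "right" is an optional addition that restricts trimming to the end of each name. Note that the whitespace argument of trimws requires R 3.6.0 or later.

```r
# Hypothetical stand-in for colnames(data)
nms <- c("Subject", "ASE (232)", "ASD (121)", "AFD (313)")

# Remove the space(s), the opening parenthesis, the digits and everything after
via_sub <- sub("\\s+\\(\\d+.*", "", nms)

# Same idea with trimws; which = "right" only trims at the end of each name
via_trimws <- trimws(nms, which = "right", whitespace = "\\s+\\(.*")

# Compare the two results
identical(via_sub, via_trimws)
```

Names without a parenthesised suffix, such as Subject, contain no match and pass through unchanged.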
In the OP's code, the pattern matches an uppercase letter followed by a space and then one or more digits. The ( is a regex metacharacter and is not escaped, so ([0-9]+) is treated as a capture group around the digits rather than as literal parentheses. This does not match the column names, because there the space is followed by a literal (, which the pattern never matches, so gsub returns the strings unchanged:
gsub("[A-Z] ([0-9]+)","",colnames(data))
[1] "Subject" "ASE (232)" "ASD (121)" "AFD (313)"
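For comparison, the OP's own pattern can be repaired by escaping the parentheses. The sketch below is one possible fix, not the only one; nms is a hypothetical stand-in for colnames(data), and the leading [A-Z] is dropped so that the last letter of each name is not removed along with the suffix.

```r
# Hypothetical stand-in for colnames(data)
nms <- c("Subject", "ASE (232)", "ASD (121)", "AFD (313)")

# \\( and \\) match literal parentheses; the space before them is removed too
fixed <- gsub(" \\([0-9]+\\)", "", nms)
fixed
```

Keeping [A-Z] in front of the pattern would delete the final letter of every matching name (for example, ASE would become AS), which is why it is left out here.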
Data used in the examples above:
data <- structure(list(Subject = 1L, `ASE (232)` = "1.1.", `ASD (121)` = 1.2,
    `AFD (313)` = 1.3), class = "data.frame", row.names = c(NA, -1L))
You can do this:
sub("(\\w+).*", "\\1", colnames(data))
This uses a capture group, (\\w+), to "remember" the first run of alphanumeric characters, and the backreference \\1 in sub's replacement argument to replace the whole string with just that remembered bit.
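One caveat worth illustrating: \\w+ stops at the first non-word character, so a name made of several words is cut down to its first word. The sketch below uses a hypothetical vector nms with an invented multi-word name to show the difference, alongside an alternative pattern that only strips the parenthesised suffix.

```r
# Hypothetical names, including a made-up multi-word one
nms <- c("Subject", "ASE (232)", "Total score (12)")

# Keeps only the first run of word characters
first_word <- sub("(\\w+).*", "\\1", nms)

# Strips optional spaces plus everything from the opening parenthesis onward
no_suffix <- sub("\\s*\\(.*", "", nms)

first_word
no_suffix
```

For the single-word names in the question both patterns give the same result; they differ only when a name contains spaces before the parenthesis.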
We could use word from the stringr package along with rename_with from dplyr:
library(stringr)
library(dplyr)
data %>%
rename_with(~word(., 1))
Subject ASE ASD AFD
1 1 1.1. 1.2 1.3