My data is structured as:
df <- data.frame(Athlete = c('02 Paul Jones', '02 Paul Jones', '02 Paul Jones', '02 Paul Jones',
'02 Paul Jones', '02 Paul Jones', '02 Paul Jones', '02 Paul Jones',
'01 Joe Smith', '01 Joe Smith', '01 Joe Smith', '01 Joe Smith',
'01 Joe Smith', '01 Joe Smith', '01 Joe Smith', '01 Joe Smith'),
Period = c('P1', 'P1', 'P1', 'P1',
'P2', 'P2', 'P2', 'P2',
'P1', 'P1', 'P1', 'P1',
'P2', 'P2', 'P2', 'P2'))
# Make `Athlete` column a character
df$Athlete <- as.character(df$Athlete)
How do I extract the first and last names of each athlete while keeping the space between them? I do not want the leading space either. For example, "Paul Jones", not " Paul Jones".
Remove everything except alphabetic characters ([:alpha:]) and whitespace characters ([:space:]), using POSIX character classes in the regular expression pattern, then strip the leading whitespace that is left behind.
df$Athlete <- as.character(df$Athlete) # convert factor to character
df$Athlete <- gsub("[^[:alpha:][:space:]]", '', df$Athlete)
df$Athlete <- gsub("^[[:space:]]+", '', df$Athlete ) # removing leading spaces
head(df)
#      Athlete Period
# 1 Paul Jones     P1
# 2 Paul Jones     P1
# 3 Paul Jones     P1
# 4 Paul Jones     P1
# 5 Paul Jones     P2
# 6 Paul Jones     P2
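One caveat with this approach: the negated class drops every character that is not a letter or whitespace, so punctuation inside a name is removed along with the numeric prefix. A minimal sketch on a small vector, where the second name is made up for illustration and is not part of the question's data:

```r
# Hypothetical input: the second name is an assumption added to show the caveat
athletes <- c("02 Paul Jones", "03 Mary O'Neil-Smith")

# Step 1: keep only letters and whitespace
cleaned <- gsub("[^[:alpha:][:space:]]", "", athletes)

# Step 2: strip the leading whitespace left where the digits used to be
cleaned <- gsub("^[[:space:]]+", "", cleaned)

print(cleaned)
# [1] "Paul Jones"      "Mary ONeilSmith"
```

For the data in the question this gives the desired result, but names containing apostrophes or hyphens would be altered.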
We can use sub to match one or more digits ([0-9]+) followed by one or more whitespace characters (\\s+) from the start (^) of the string and replace the match with "":
df$Athlete <- sub("^[0-9]+\\s+", "", df$Athlete)
df
#      Athlete Period
#1  Paul Jones     P1
#2  Paul Jones     P1
#3  Paul Jones     P1
#4  Paul Jones     P1
#5  Paul Jones     P2
#6  Paul Jones     P2
#7  Paul Jones     P2
#8  Paul Jones     P2
#9   Joe Smith     P1
#10  Joe Smith     P1
#11  Joe Smith     P1
#12  Joe Smith     P1
#13  Joe Smith     P2
#14  Joe Smith     P2
#15  Joe Smith     P2
#16  Joe Smith     P2
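Because the pattern is anchored at the start of the string, only the numeric prefix and the whitespace after it are removed; the rest of the name is left untouched, and strings without such a prefix are returned unchanged. A small sketch, assuming a helper named strip_prefix and two extra test names that are hypothetical and not from the question:

```r
# Hypothetical helper: remove a leading run of digits plus the whitespace after it
strip_prefix <- function(x) {
  sub("^[0-9]+\\s+", "", x)
}

# The last two values are made up to show the edge cases
athletes <- c("02 Paul Jones", "01 Joe Smith", "03 Mary O'Neil-Smith", "Joe Smith")

result <- strip_prefix(athletes)
print(result)
# [1] "Paul Jones"        "Joe Smith"         "Mary O'Neil-Smith" "Joe Smith"
```

sub is sufficient here rather than gsub, since an anchored pattern can match at most once per string.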