I have data which looks like this:
*first* *last*
M a rk Twain
Hun ter Stockt on Thompson
The data continues like this for n rows. I want it to look like this:
*first* *last*
Mark Twain
Hunter Stockton Thompson
I know I can use gsub to remove all spaces like this:
gsub(" ", "", x, fixed = TRUE)
And I can identify the pattern with a regex like this:
( [A-Z])
But how can I combine the two, so that gsub removes every space except the ones where the regex matches?
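Simplest way: match a whitespace character followed by a lowercase letter, and replace the pair with just the letter (the \\1 backreference):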
txt <- c("M a rk", "Twain", "Hun ter", "Stockt on Thompson")
gsub("\\s([a-z])", "\\1", txt)
## [1] "Mark" "Twain" "Hunter" "Stockton Thompson"
If you want to apply this to more than one variable in a data.frame, you can run lapply over the columns you want and assign the result back to those same columns. (Note: You really should not use asterisks in the names of data.frame columns.)
df <- data.frame("*first*" = c("M a rk", "Hun ter"),
                 "*last*" = c("Twain", "Stockt on Thompson"),
                 check.names = FALSE, stringsAsFactors = FALSE)
# names of the text columns you want to clean up
varsToModify <- c("*first*", "*last*")
df[varsToModify] <- lapply(df[varsToModify],
                           function(x) gsub("\\s([a-z])", "\\1", x))
df
## *first* *last*
## 1 Mark Twain
## 2 Hunter Stockton Thompson
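If you would rather not list the columns by name, one possible variation is to select every character column automatically. This is only a sketch: the helper name squash_spaces, the plain column names, and the extra id column are made up for illustration and are not part of the original data.

```r
# Hypothetical helper: drop whitespace that precedes a lowercase letter.
squash_spaces <- function(x) gsub("\\s([a-z])", "\\1", x)

# Illustrative data with syntactic column names and one non-text column.
df <- data.frame(id    = 1:2,
                 first = c("M a rk", "Hun ter"),
                 last  = c("Twain", "Stockt on Thompson"),
                 stringsAsFactors = FALSE)

# Logical index of the character columns; the integer id column is skipped.
is_text <- vapply(df, is.character, logical(1))

# Clean only the text columns and write them back in place.
df[is_text] <- lapply(df[is_text], squash_spaces)
df
```

Selecting by type keeps numeric columns from being coerced to character, which is what gsub would do if it were applied to them.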
df <- data.frame(`*first*` = c('M a rk', 'Hun ter'),
                 `*last*` = c('Twain', 'Stockt on Thompson'),
                 check.names = FALSE, stringsAsFactors = FALSE)
df
## *first* *last*
## 1 M a rk Twain
## 2 Hun ter Stockt on Thompson
I would use a Perl negative lookahead assertion:
for (ci in seq_along(df)) {
  df[[ci]] <- gsub(' (?![A-Z])', '', df[[ci]], perl = TRUE)
}
df
## *first* *last*
## 1 Mark Twain
## 2 Hunter Stockton Thompson
See "Regular Expressions as used in R" (the ?regex help page). The discussion of Perl assertions is given near the bottom of the page.
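The capture-group pattern and the lookahead pattern agree on the sample names, but they are not equivalent in general. The sketch below uses made-up strings (not from the question) to show where they differ: the first removes whitespace only before a lowercase letter, while the second removes any space that is not followed by an uppercase letter, including spaces before digits and trailing spaces.

```r
# Illustrative inputs: a name, a string with a digit, and a trailing space.
x <- c("Hun ter Stockt on Thompson", "Apt 4 B", "Mark ")

# Capture-group version: whitespace is dropped only before a lowercase letter.
by_capture <- gsub("\\s([a-z])", "\\1", x)

# Lookahead version: a space is dropped unless an uppercase letter follows.
by_lookahead <- gsub(" (?![A-Z])", "", x, perl = TRUE)

by_capture
## [1] "Hunter Stockton Thompson" "Apt 4 B"                  "Mark "
by_lookahead
## [1] "Hunter Stockton Thompson" "Apt4 B"                   "Mark"
```

Which behavior is preferable depends on the data. Note also that \\s matches tabs and other whitespace, whereas the literal space in the lookahead pattern matches only the space character.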