
How to convert the characters after the last dot to lower case

This is a simple question, but I cannot solve it. I have a string like this email address:

ma <- "something@somewhere.COM"

My goal is to get:

"something@somewhere.com"

That is, I want to convert the part after the last dot to lower case. I've read this and this, so I tried:

gsub(".*\\.","\\L\\1", ma, perl = T) 
[1] "COM" # nope

Also something like:

library(gsubfn)
options(gsubfn.engine = "R")
gsubfn(".*\\.", ~ tolower(x), ma)
[1] "something@somewhere.COM" # nope

I'm quite confused because it seems I can fetch the part I want to replace:

gsub(".*\\.","", ma)
[1] "COM"

But I cannot replace it properly. I would really appreciate an explanation along with the solution; regex is not my strong suit.

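\\L and \\U change the case of the replacement text that follows them, so they are used together with a back-reference to a capturing group. Your first attempt is close, but its pattern has no capturing group: \\1 is empty, so the whole match "something@somewhere." is replaced by nothing and only "COM" is left. You need to say which group to apply the conversion to: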

ma <-"something@somewhere.COM"
gsub('(.*\\.)(.*)$', '\\1\\L\\2', ma, perl = TRUE)
# [1] "something@somewhere.com"

Note that we capture two groups: the part up to and including the last ., which we leave alone, and the part after it, which we convert to lower case.
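Strictly speaking, in R's perl = TRUE replacement strings \\L (or \\U) converts the rest of the replacement text, not just one group, until \\E ends the conversion. A small sketch on a made-up input that upper-cases only the first group:

```r
x <- "abc@def"
# \\U starts upper-casing, \\E stops it, so only \\1 is affected
res <- gsub("(\\w+)@(\\w+)", "\\U\\1\\E@\\2", x, perl = TRUE)
res
# [1] "ABC@def"
```

In the answer's pattern \\L is followed only by \\2, so no \\E is needed there.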

Also note that it may be safer to restrict the second group to non-dot characters, so that it can only ever match what follows the last . regardless of how the greedy first group behaves:

gsub('(.*\\.)([^.]*)$', '\\1\\L\\2', ma, perl = TRUE)
# [1] "something@somewhere.com"
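The same call works on a whole character vector. A quick check with hypothetical addresses shows that only the part after the last dot changes, and that a string without any dot is returned unchanged because the pattern does not match it:

```r
# Hypothetical inputs for illustration
x <- c("something@somewhere.COM", "John.Doe@Mail.Example.ORG", "no_dot_HERE")
# Group 1: everything up to and including the last dot; group 2: the rest
res <- gsub('(.*\\.)([^.]*)$', '\\1\\L\\2', x, perl = TRUE)
res
# res is c("something@somewhere.com", "John.Doe@Mail.Example.org", "no_dot_HERE")
```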

To apply this specifically to an email address, you could use an email-like pattern with two capturing groups and use \\L on the second group:

([^\s@]+@[^\s@]+\.)([^\s@]+)

For example

gsub("([^\\s@]+@[^\\s@]+\\.)([^\\s@]+)","\\1\\L\\2", "something@somewhere.COM", perl = T) 

Output

[1] "something@somewhere.com"
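Because this pattern looks for something shaped like an address instead of anchoring at the end of the string, it also works when the address is embedded in longer text. A sketch with a made-up sentence:

```r
txt <- "Contact: someone@Example.COM today"
# Only the part after the last dot of the address-like token is lower-cased
res <- gsub("([^\\s@]+@[^\\s@]+\\.)([^\\s@]+)", "\\1\\L\\2", txt, perl = TRUE)
res
# [1] "Contact: someone@Example.com today"
```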


We can use sub to capture a group and then use \\L to convert it to lower case:

sub("\\.(.*)$", ".\\L\\1", ma, perl = TRUE)
#[1] "something@somewhere.com"
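One caveat: \\.(.*)$ starts at the first dot, not the last. With a single dot the result is the same, but for a hypothetical address with a dot in the local part the two patterns differ:

```r
addr <- "John.Doe@Example.COM"
# Everything after the FIRST dot is lower-cased
first_dot <- sub("\\.(.*)$", ".\\L\\1", addr, perl = TRUE)
# Only the part after the LAST dot is lower-cased
last_dot <- sub("(.*\\.)([^.]*)$", "\\1\\L\\2", addr, perl = TRUE)
first_dot  # "John.doe@example.com"
last_dot   # "John.Doe@Example.com"
```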

The part after the @ in an email address (the domain) is case-insensitive, so you could convert everything after the @ to lower case without any problems.
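The same \\L trick handles that in base R: capture everything after the @ and lower-case it. A minimal sketch, assuming each string contains at most one @; strings without an @ do not match and come back unchanged:

```r
addr <- c("John.Doe@Example.COM", "not-an-address")
# Keep the local part as-is; lower-case the domain captured in group 1
res <- sub("@(.*)$", "@\\L\\1", addr, perl = TRUE)
res
# res is c("John.Doe@example.com", "not-an-address")
```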

We consider both cases.

Convert everything after the last dot to lower case

To use gsubfn, make sure that the regular expression matches the extension itself.
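To see why the question's gsubfn call left the string unchanged, base R's regmatches can show what each pattern actually matches. The question's .*\\. matches the part before the extension, which is already lower case, so tolower has nothing to change:

```r
ma <- "something@somewhere.COM"
# The question's pattern: greedy .* runs up to and including the last dot
m1 <- regmatches(ma, regexpr(".*\\.", ma))
m1  # "something@somewhere."
# A pattern anchored at the end matches the extension instead
m2 <- regmatches(ma, regexpr("\\.[^.]*$", ma))
m2  # ".COM"
```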

Alternatively, we can use file_ext from the tools package (tools ships with R, so no installation is required).

A third approach is to use file_path_sans_ext together with file_ext (both from tools) and remove any trailing dot in case there was no extension. If we knew there was always an extension, the sub call could be omitted.

(Of course, if we know that the part before the extension has no upper-case characters, or if we don't mind it being converted to lower case, we could just apply tolower to the entire input.)

s <- "something@somewhere.COM"

# 1
library(gsubfn)
gsubfn("\\.[^.]*$", tolower, s)
## [1] "something@somewhere.com"

# 2
library(tools)
ext <- file_ext(s)
sub(paste0(ext, "$"), tolower(ext), s)
## [1] "something@somewhere.com"

# 3
library(tools)
sub("\\.$", "", paste(file_path_sans_ext(s), tolower(file_ext(s)), sep = "."))
## [1] "something@somewhere.com"
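Approach #3 is vectorised, so it can be wrapped in a small helper for many strings at once. A sketch in which lower_ext is a hypothetical name; note that file_ext only recognises alphanumeric extensions, and a string without an extension comes back unchanged thanks to the final sub:

```r
library(tools)

# Hypothetical helper wrapping approach #3
lower_ext <- function(x) {
  # Rejoin the stem and the lower-cased extension, then drop the dot
  # that paste() leaves behind when there was no extension
  sub("\\.$", "", paste(file_path_sans_ext(x), tolower(file_ext(x)), sep = "."))
}

res <- lower_ext(c("something@somewhere.COM", "archive.tar.GZ", "README"))
res
# res is c("something@somewhere.com", "archive.tar.gz", "README")
```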

Convert everything after @ to lower case

As mentioned earlier, the domain, i.e. the portion of the string after the @ character, is case-insensitive, so we can convert that entire part to lower case. This is a simpler problem because we are guaranteed that there is only one instance of @. We use gsubfn in (4), extract the parts before and after the @ in (5), and use read.table without any regular expressions in (6).

s <- "something@somewhere.COM"

# 4
library(gsubfn)
gsubfn("@.*", tolower, s)
## [1] "something@somewhere.com"

# 5
paste(sub("@.*", "", s), tolower(sub(".*@", "", s)), sep = "@")
## [1] "something@somewhere.com"

# 6
with(read.table(text = s, sep = "@", as.is = TRUE), paste(V1, tolower(V2), sep = "@"))
## [1] "something@somewhere.com"
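Approach (4) also has a base R counterpart: regexpr locates the match and the replacement form regmatches<- swaps in the lower-cased text. A sketch with hypothetical inputs; elements without an @ have no match and are left untouched:

```r
x <- c("something@somewhere.COM", "John.Doe@Example.ORG", "no-at-sign")
# Position of "@..." in each string (-1 where there is no @)
m <- regexpr("@.*", x)
# Replace only the matched parts with their lower-case versions
regmatches(x, m) <- tolower(regmatches(x, m))
x
# x is c("something@somewhere.com", "John.Doe@example.org", "no-at-sign")
```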
