
gsub command to substitute a word starting with a specific letter in R

What gsub command substitutes a whole word that starts with a specific string? My main goal is to remove all URLs from a given text.

For example, given the text "refer http://www.google.com for further details", I need to transform it to "refer for further details". For this, I essentially need a gsub command something like the one below:

text <- "refer http://www.google.com for further details"

gsub("http", "", text)

However, this removes only the 'http' part from the text. I need to remove the complete word starting with 'http'.

Some other commands that I tried:

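gsub('http..', "", text)   # removes 'http' plus the next two characters (each dot matches one character)
gsub('^http', "", text)    # '^' anchors to the start of the string, so nothing matches here
gsub('/http', "", text)    # looks for a literal '/' followed by 'http', which does not occur
gsub('\\\http', "", text)  # does not parse: '\h' is not a recognized escape in an R string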

None of these gave the desired result.

Any help in this regard will be greatly appreciated.

This is only a halfway answer:

gsub("https?://.*?\\s", "", text)
# [1] "refer for further details"

Why is it a "halfway answer"? It only handles the case where the URL is followed by whitespace. A URL at the very end of the string is not matched at all, and a URL followed immediately by a punctuation mark is removed together with that punctuation, because the pattern consumes everything up to and including the next whitespace character.

Detecting URLs is a fairly common task. You should be able to find more detailed patterns by searching for something like "regex identify URL". Most likely, though, you'd need to modify them somewhat to work with R, for example by doubling the backslashes inside R string literals.
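As a rough sketch of how the punctuation and end-of-string cases might be covered, the pattern below removes "http://" or "https://" together with the non-whitespace characters that follow, but stops before trailing punctuation. The function name strip_urls, the extra test strings, and the set of punctuation characters treated as "not part of the URL" are assumptions made for illustration. It is not a complete URL matcher.

```r
text <- c("refer http://www.google.com for further details",
          "see https://example.com/path?q=1, then continue",
          "the link is http://www.google.com")

# Hypothetical helper: drop anything that looks like an http(s) URL.
# \\s*            whitespace immediately before the URL (so no double space is left)
# https?://       the scheme
# \\S*            greedy run of non-whitespace characters
# [^\\s.,;:!?)]   the URL must end on a character that is not whitespace or
#                 one of these punctuation marks (an assumed, incomplete set)
strip_urls <- function(x) {
  out <- gsub("\\s*https?://\\S*[^\\s.,;:!?)]", "", x, perl = TRUE)
  trimws(out)  # tidy up if the URL was at the start of the string
}

strip_urls(text)
```

perl = TRUE is used so that \\s can appear inside a bracket expression. Because the final character class excludes common sentence punctuation, a comma or full stop after the URL stays in the text.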
