
gsub command to substitute a word starting with a specific letter in R

My question is: what gsub command substitutes a word starting with a specific letter? My main goal is to remove all URLs from a given text.

For example, I have the text "refer http://www.google.com for further details". What I need to do is transform it into "refer for further details". To do this, I essentially need a gsub command along the lines of:

text <- "refer http://www.google.com for further details"

gsub("http", "", text)

However, this removes only the 'http' part of the text. I need to remove the complete word starting with 'http'.
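gsub matches the pattern literally, so "http" removes exactly those four letters and nothing else. One way to cover the rest of the word, sketched here under the assumption that a "word" ends at the next whitespace character, is to follow the prefix with `\\S*` (any run of non-whitespace characters), anchor it to the start of a word with `\\b`, and absorb the preceding space with `\\s*`. Replacing http with another letter or prefix works the same way, as long as it contains no regex metacharacters:

```r
text <- "refer http://www.google.com for further details"

# Literal pattern: only the four letters "http" are removed
gsub("http", "", text)
# [1] "refer ://www.google.com for further details"

# Optional leading whitespace, then a word that starts with "http",
# then every non-whitespace character after it
gsub("\\s*\\bhttp\\S*", "", text, perl = TRUE)
# [1] "refer for further details"
```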

Some other commands I tried:

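gsub('http..', "", text) # also removes the two characters after 'http' (each dot matches exactly one character)
gsub('^http', "", text) # no change: '^' anchors the match to the start of the string, which begins with "refer"
gsub('/http', "", text) # no change: the text never contains "/http"
# gsub('\\\http', "", text) # does not parse: '\h' is not a valid escape in an R string literal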

None of these gave the result I wanted.

Any help in this regard will be greatly appreciated.

This is only a halfway answer:

gsub("https?://.*?\\s", "", text)
# [1] "refer for further details"

Why is it a "halfway answer"? It only handles a limited set of scenarios: those where a URL is always followed by whitespace. If a URL is immediately followed by a punctuation mark, the punctuation is removed along with it, and a URL at the very end of the text (with nothing after it) is not removed at all.
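Both failure modes can be reproduced directly. In this sketch the sample sentence is made up for illustration: the lazy `.*?` runs up to the first whitespace character, so the comma right after the first URL is deleted with it, and the final URL, which has no whitespace after it, is never matched:

```r
text2 <- "see http://example.com, or https://r-project.org"

# The "halfway" pattern from the answer
gsub("https?://.*?\\s", "", text2)
# [1] "see or https://r-project.org"
```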

Detecting URLs is a fairly common task, so you should be able to find more thorough patterns by searching for something like "regex identify URL". Most likely, though, you will need to adapt such a pattern somewhat to work in R (for example, by doubling every backslash inside an R string).
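As one example of that kind of adaptation, here is a sketch (not a complete URL grammar) that treats a URL as `http://` or `https://` followed by non-whitespace characters, but stops before trailing sentence punctuation. The function name `remove_urls` is hypothetical, and the set of characters treated as trailing punctuation (`.,;:!?)`) is an assumption. The backslashes are doubled for R, and the lookahead `(?=...)` requires `perl = TRUE`:

```r
# Hypothetical helper: strips http/https URLs plus the whitespace before them
remove_urls <- function(x) {
  # \\s*        also drop the whitespace in front of the URL
  # https?://   the scheme
  # \\S+?       as few non-whitespace characters as possible...
  # (?=...)     ...up to optional trailing punctuation followed by
  #             whitespace or the end of the string
  gsub("\\s*https?://\\S+?(?=[.,;:!?)]*(?:\\s|$))", "", x, perl = TRUE)
}

remove_urls("refer http://www.google.com for further details")
# [1] "refer for further details"
remove_urls("see http://example.com, or https://r-project.org")
# [1] "see, or"
```

A URL that legitimately ends in one of those characters (for example a closing parenthesis) would leave that character behind in the text, so adjust the character class to suit your data.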
