Replace the initial @ of a word with ! in R

Question

I am trying to do a string substitution on a data frame column in R. I need to find every word preceded by '@' with no space in between (e.g. @word) and change the '@' to '!' (e.g. from @word to !word), while leaving other instances of '@' intact (e.g. @ or @@ or @[@]). For example, this is my original data frame (to change: @def, @jkl, @stu):

> df = data.frame(number = 1:4, text = c('abc @def ghi', '@jkl @ mno', '@[@] pqr @stu', 'vwx @@@ yz'))
> df
  number          text
1      1  abc @def ghi
2      2    @jkl @ mno
3      3 @[@] pqr @stu
4      4    vwx @@@ yz

And this is what I need it to look like:

> df_result = data.frame(number = 1:4, text = c('abc !def ghi', '!jkl @ mno', '@[@] pqr !stu', 'vwx @@@ yz'))
> df_result
  number          text
1      1  abc !def ghi
2      2    !jkl @ mno
3      3 @[@] pqr !stu
4      4    vwx @@@ yz

I have tried:

> gsub('@.+[a-z] ', '!', df$text)
[1] "abc !ghi"   "!@ mno"     "!@stu"      "vwx @@@ yz"

But the result is not the desired one. Any help is much appreciated.

Thank you.

Answer 1

How about:

gsub("(^| )@(\\w)", "\\1!\\2", df$text)
# [1] "abc !def ghi"  "!jkl @ mno"    "@[@] pqr !stu" "vwx @@@ yz"

This matches an @ symbol at the beginning of the string or after a space. We then capture the word character that follows the @ and replace the @ with !.

Explanation courtesy of regex101.com:

(^| ) is the 1st capturing group: ^ asserts position at the start of the string, | denotes "or", and the blank space matches a space character literally.
@ matches the character @ literally.
(\\w) is the 2nd capturing group; \\w matches a single word character (a letter, digit, or underscore). The backslash is doubled because the pattern is written inside an R string.

The replacement string \\1!\\2 replaces each match with the contents of the first capturing group (\\1), followed by !, followed by the contents of the second capturing group (\\2).
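As a minimal sketch of how the two backreferences behave, the same pattern can be applied to the question's data and stored in a new column. The names `pattern` and `text_fixed` are introduced here purely for illustration, and `stringsAsFactors = FALSE` is an assumption added so that the column is a character vector in any R version.

```r
# Sample data from the question; keep text as character rather than factor
df <- data.frame(
  number = 1:4,
  text = c("abc @def ghi", "@jkl @ mno", "@[@] pqr @stu", "vwx @@@ yz"),
  stringsAsFactors = FALSE
)

# Group 1: start of string or a space; group 2: the word character after @
pattern <- "(^| )@(\\w)"

# \\1 puts back the leading space (or nothing at the start of the string),
# \\2 puts back the captured word character, so only the @ itself changes
df$text_fixed <- gsub(pattern, "\\1!\\2", df$text)
print(df$text_fixed)
```

Both groups have to be put back in the replacement because they are part of the match: dropping \\1 would remove the space before the word, and dropping \\2 would remove the word's first character.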

Answer 2

You can use a positive lookahead (?=...):

gsub("@(?=[A-Za-z])", "!", df$text, perl = TRUE)
[1] "abc !def ghi"  "!jkl @ mno"    "@[@] pqr !stu" "vwx @@@ yz"

From the "Regular Expressions as used in R" documentation page:

Patterns (?=...) and (?!...) are zero-width positive and negative lookahead assertions: they match if an attempt to match the ... forward from the current position would succeed (or not), but use up no characters in the string being processed.
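To illustrate what "use up no characters" means in practice, here is a small sketch comparing the lookahead with a plain character class on the question's strings. The variable names `x`, `consumed`, `kept`, and `mid_word` are hypothetical and chosen only for this example.

```r
x <- c("abc @def ghi", "@jkl @ mno", "@[@] pqr @stu", "vwx @@@ yz")

# Without a lookahead the letter is part of the match, so it is replaced too
consumed <- gsub("@[A-Za-z]", "!", x)

# With a lookahead the letter is only checked, so just the @ is replaced
kept <- gsub("@(?=[A-Za-z])", "!", x, perl = TRUE)

# The lookahead does not inspect what comes before the @,
# so an @ in the middle of a word is replaced as well
mid_word <- gsub("@(?=[A-Za-z])", "!", "a@b", perl = TRUE)

print(consumed)
print(kept)
print(mid_word)
```

The last call shows one way this answer differs from Answer 1: the lookahead pattern also changes an @ that sits inside a word, whereas the pattern anchored on the start of the string or a space leaves it alone. Which behavior is preferable depends on the data.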
