
Substitute word with same word without initial @ in R

I am trying to do a string substitution on a data frame column in R. I need to find all the words preceded by '@' with no space in between (e.g. @word) and change the '@' to '!' (e.g. from @word to !word). At the same time, the other instances of '@' (e.g. @ or @@ or @[@]) must be left intact. For example, this is my original data frame (to change: @def, @jkl, @stu):

> df = data.frame(number = 1:4, text = c('abc @def ghi', '@jkl @ mno', '@[@] pqr @stu', 'vwx @@@ yz'))
> df
  number          text
1      1  abc @def ghi
2      2    @jkl @ mno
3      3 @[@] pqr @stu
4      4    vwx @@@ yz

And this is what I need it to look like:

> df_result = data.frame(number = 1:4, text = c('abc !def ghi', '!jkl @ mno', '@[@] pqr !stu', 'vwx @@@ yz'))
> df_result
  number          text
1      1  abc !def ghi
2      2    !jkl @ mno
3      3 @[@] pqr !stu
4      4    vwx @@@ yz

I have tried:

> gsub('@.+[a-z] ', '!', df$text)
[1] "abc !ghi"   "!@ mno"     "!@stu"      "vwx @@@ yz"

But the result is not the desired one. Any help is much appreciated.

Thank you.

How about:

gsub("(^| )@(\\w)", "\\1!\\2", df$text)
# [1] "abc !def ghi"  "!jkl @ mno"    "@[@] pqr !stu" "vwx @@@ yz"  

This matches an @ symbol at the beginning of a string or after a space. We then capture the word character that follows the @ symbol and replace @ with !.

Explanation courtesy of regex101.com:

  • (^| ) is the 1st capturing group: ^ asserts the position at the start of the string, | denotes "or", and the blank space matches the space character literally
  • @ matches the character @ literally
  • (\\w) is the 2nd capturing group; it matches a single word character (a letter, digit, or underscore)

The replacement string \\1!\\2 replaces each match with the first capturing group (\\1), followed by !, followed by the second capturing group (\\2).
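To keep the change in the data frame rather than just printing it, the result can be assigned back to the column. This is a minimal sketch that rebuilds the data frame from the question; `stringsAsFactors = FALSE` is added here as an assumption so that `text` is a character column on older R versions too.

```r
# Rebuild the example data frame from the question.
df <- data.frame(
  number = 1:4,
  text = c("abc @def ghi", "@jkl @ mno", "@[@] pqr @stu", "vwx @@@ yz"),
  stringsAsFactors = FALSE  # assumption: keep text as character, not factor
)

# Replace "@" with "!" only when it starts the string or follows a space
# and is immediately followed by a word character. The captured prefix
# (\\1) and the captured word character (\\2) are written back unchanged.
df$text <- gsub("(^| )@(\\w)", "\\1!\\2", df$text)

df$text
# [1] "abc !def ghi"  "!jkl @ mno"    "@[@] pqr !stu" "vwx @@@ yz"
```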

You can use a positive lookahead (?=...):

gsub("@(?=[A-Za-z])", "!", df$text, perl = TRUE)
[1] "abc !def ghi"  "!jkl @ mno"    "@[@] pqr !stu" "vwx @@@ yz"  

From the "Regular Expressions as used in R" documentation page:

Patterns (?=...) and (?!...) are zero-width positive and negative lookahead assertions: they match if an attempt to match the ... forward from the current position would succeed (or not), but use up no characters in the string being processed.
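The two patterns agree on the sample data but can differ elsewhere: the lookahead version does not check what comes before the @. The sketch below compares them on a made-up input containing an e-mail-like token (the vector `x` and the third pattern are illustrative assumptions, not part of the original answers). The third variant adds a negative lookbehind, (?<!\\S), so that the @ must also sit at the start of the string or after whitespace.

```r
# Hypothetical input: the second element has "@" in the middle of a token.
x <- c("abc @def ghi", "mail user@example.com", "@[@] pqr @stu")

# Capture-group version: "@" must start the string or follow a space.
by_groups <- gsub("(^| )@(\\w)", "\\1!\\2", x)
# [1] "abc !def ghi"          "mail user@example.com" "@[@] pqr !stu"

# Lookahead version: any "@" directly followed by a letter is replaced.
by_lookahead <- gsub("@(?=[A-Za-z])", "!", x, perl = TRUE)
# [1] "abc !def ghi"          "mail user!example.com" "@[@] pqr !stu"

# Assumed variant: also require that "@" is not preceded by a
# non-whitespace character (zero-width negative lookbehind).
by_lookaround <- gsub("(?<!\\S)@(?=[A-Za-z])", "!", x, perl = TRUE)
# [1] "abc !def ghi"          "mail user@example.com" "@[@] pqr !stu"
```

Note also that \\w accepts digits and underscores while [A-Za-z] accepts only ASCII letters, so a token such as @1 would be changed by the first pattern but not by the lookahead ones.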
