Substitute word with same word without initial @ in R
I am trying to do a string substitution on a data frame in R. I need to find all the words preceded by '@' (without a space, e.g. @word) and change the '@' to '!' (e.g. from @word to !word), while leaving the other instances of '@' intact (e.g. @ or @@ or @[@]). For example, this is my original data frame (to change: @def, @jkl, @stu):
> df = data.frame(number = 1:4, text = c('abc @def ghi', '@jkl @ mno', '@[@] pqr @stu', 'vwx @@@ yz'))
> df
number text
1 1 abc @def ghi
2 2 @jkl @ mno
3 3 @[@] pqr @stu
4 4 vwx @@@ yz
And this is what I need it to look like:
> df_result = data.frame(number = 1:4, text = c('abc !def ghi', '!jkl @ mno', '@[@] pqr !stu', 'vwx @@@ yz'))
> df_result
number text
1 1 abc !def ghi
2 2 !jkl @ mno
3 3 @[@] pqr !stu
4 4 vwx @@@ yz
I have tried with
> gsub('@.+[a-z] ', '!', df$text)
[1] "abc !ghi" "!@ mno" "!@stu" "vwx @@@ yz"
But the result is not the desired one. Any help is much appreciated.
Thank you.
How about
gsub("(^| )@(\\w)", "\\1!\\2", df$text)
# [1] "abc !def ghi" "!jkl @ mno" "@[@] pqr !stu" "vwx @@@ yz"
This matches an @ symbol at the beginning of a string, or after a space. Then, we capture the word character after the @ symbol, and replace @ with !.
Explanation courtesy of regex101.com:
(^| ) is the 1st capturing group; ^ asserts the position at the start of the string; | denotes "or"; the blank space matches the space character literally
@ matches the character @ literally (case sensitive)
(\\w) is the 2nd capturing group; it matches a single word character
The replacement string \\1!\\2 replaces the regular expression match with the first capturing group (\\1), followed by !, followed by the second capturing group (\\2).
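To make the substitution stick, the result has to be assigned back to the column. A minimal sketch, assuming the data frame from the question (stringsAsFactors = FALSE is added here so the column stays character in older R versions; it is not part of the original code):

```r
# Rebuild the example data frame from the question
df <- data.frame(
  number = 1:4,
  text = c("abc @def ghi", "@jkl @ mno", "@[@] pqr @stu", "vwx @@@ yz"),
  stringsAsFactors = FALSE
)

# Group 1 keeps the start-of-string or the space before the @;
# group 2 keeps the first word character after it.
df$text <- gsub("(^| )@(\\w)", "\\1!\\2", df$text)

df$text
# [1] "abc !def ghi"  "!jkl @ mno"    "@[@] pqr !stu" "vwx @@@ yz"
```

Note that \\w also matches digits and the underscore, so @123 or @_x would be changed as well; use [A-Za-z] in place of \\w if only letters should count.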
You can use a positive lookahead (?=...):
gsub("@(?=[A-Za-z])", "!", df$text, perl = TRUE)
[1] "abc !def ghi" "!jkl @ mno" "@[@] pqr !stu" "vwx @@@ yz"
From the "Regular Expressions as used in R" documentation page:
Patterns (?=...) and (?!...) are zero-width positive and negative lookahead assertions: they match if an attempt to match the ... forward from the current position would succeed (or not), but use up no characters in the string being processed.
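Because the lookahead only checks what follows the @, it would also change an @ that sits inside a word. If that matters, one possible variation (a sketch, not part of the original answer) is to add a negative lookbehind so the @ must not be preceded by a non-space character. The e-mail-like string below is a made-up input for illustration:

```r
# Hypothetical input: the second element has an @ in the middle of a word
x <- c("abc @def ghi", "mail me@example.com", "@jkl @ mno")

# Lookahead only: any @ followed by a letter is replaced
lookahead_only <- gsub("@(?=[A-Za-z])", "!", x, perl = TRUE)
lookahead_only
# [1] "abc !def ghi"        "mail me!example.com" "!jkl @ mno"

# Lookbehind + lookahead: the @ must also be at the start of the string
# or come right after whitespace
with_lookbehind <- gsub("(?<!\\S)@(?=[A-Za-z])", "!", x, perl = TRUE)
with_lookbehind
# [1] "abc !def ghi"        "mail me@example.com" "!jkl @ mno"
```

Both assertions are zero-width, so the replacement is just "!" and no capture groups are needed. Lookarounds require perl = TRUE in gsub.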