Can anyone explain these results from basic regular expressions in R?
1.
test1
# [1] "abcd" "abc1" "ab2b" "a3cd" "4bcd"
test1[grep('[^abc1-3]', test1)]
# [1] "abcd" "a3cd" "4bcd"
test1[grep('[^a-d1-3]', test1)]
# [1] "4bcd"
test1[grep('[^4]', test1)]
# [1] "abcd" "abc1" "ab2b" "a3cd" "4bcd"
2.
test5 <- c('lo', 'lol', 'lolo', 'olo', 'lool')
test5[grep('loll*', test5)]
# [1] "lol" "lolo"
3.
test5
# [1] "lo" "lol" "lolo" "olo" "lool"
test5[grep('lolo+', test5)]
# [1] "lolo"
test5[grep('lol+', test5)]
# [1] "lol" "lolo"
I'm studying the basics of regular expressions in R, but I can't understand why the three examples above return those results.
For example, I learned that ^ inside [] matches characters other than the ones listed after the ^. But the results don't look like that.
I'm not good at English, so it is hard for me to explain everything I don't understand, but I would really appreciate it if anyone could explain why this R code returns those results.
:(
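You are correct that "[^abc1-3]" will match any single character that is not in {a, b, c, 1, 2, 3}. The key point is that the pattern only has to match somewhere in the string, not the whole string. grep returns the indices of the elements that contain at least one match (grepl is the variant that returns TRUE or FALSE for each element), so test1[grep(...)] keeps every string that has at least one character outside the set.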
test1
# [1] "abcd" "abc1" "ab2b" "a3cd" "4bcd"
test1[grep('[^abc1-3]', test1)]
# [1] "abcd" "a3cd" "4bcd"
All three results contain d, which is not in {a, b, c, 1, 2, 3}, so d is matched (the last one also contains 4, which is matched too). The two test items that are not in the results, "abc1" and "ab2b", contain only characters in {a, b, c, 1, 2, 3}, so there is no match.
The same reasoning covers your other two patterns. With '[^a-d1-3]', d is now inside the set, so only "4bcd" still has a character outside it (the 4). With '[^4]', every string contains at least one character that is not 4, so all five are returned.
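To make this concrete, here is a small sketch that shows what grep returns and which characters the negated class actually matches in each string. It assumes test1 was created as the vector printed in the question (the question never shows its definition), and hits is just an illustrative name.

```r
# Assumption: test1 is the vector printed in the question.
test1 <- c("abcd", "abc1", "ab2b", "a3cd", "4bcd")

# grep() returns the indices of elements containing at least one match.
idx <- grep("[^abc1-3]", test1)
idx
# [1] 1 4 5

# grepl() returns one logical value per element.
grepl("[^abc1-3]", test1)
# [1]  TRUE FALSE FALSE  TRUE  TRUE

# value = TRUE returns the matching elements instead of their indices.
grep("[^abc1-3]", test1, value = TRUE)
# [1] "abcd" "a3cd" "4bcd"

# Extract every character that the negated class matches, per element.
hits <- regmatches(test1, gregexpr("[^abc1-3]", test1))
names(hits) <- test1
hits
```

The last call gives "d" for "abcd" and "a3cd", both "4" and "d" for "4bcd", and an empty result for "abc1" and "ab2b", which is exactly why those two are dropped.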
Regex101 is a good site for testing regular expressions and seeing how they work. Here is this example: https://regex101.com/r/CaxfCI/1
For your other examples, a quantifier applies to the item immediately before it. * means 0 or more, so 'loll*' matches lol followed by 0 or more l. + means 1 or more, so 'lol+' matches lo followed by 1 or more l, and 'lolo+' matches lol followed by 1 or more o. In each case the pattern can match anywhere inside the string, which is why "lolo" is returned for 'lol+'.
Note that for grep there is no point in 'loll*': the result will be the same as for 'lol'. But in other regex operations, if you are replacing (substituting) or extracting the matches, the difference can matter.
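Here is a short sketch of where the difference shows up. The string x is a made-up example, not something from the question; sub and regmatches are standard base R functions.

```r
test5 <- c("lo", "lol", "lolo", "olo", "lool")

# For grep, 'loll*' and 'lol' select exactly the same elements.
identical(grep("loll*", test5), grep("lol", test5))
# [1] TRUE

# Hypothetical input with several trailing l characters.
x <- "lollll!"

# 'lol' consumes only the first three characters.
sub("lol", "X", x)
# [1] "Xlll!"

# 'loll*' is greedy and also consumes the extra l characters.
sub("loll*", "X", x)
# [1] "X!"

# 'lol+' (lo followed by one or more l) consumes the same run here.
sub("lol+", "X", x)
# [1] "X!"

# Extracting the matched text shows the same difference.
regmatches(x, regexpr("lol", x))
# [1] "lol"
regmatches(x, regexpr("loll*", x))
# [1] "lollll"
```

So the patterns agree on whether a string matches, but not on how much of the string the match covers.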