Can anyone explain the results of these basic regular expressions in R?

Question

1.

test1 <- c("abcd", "abc1", "ab2b", "a3cd", "4bcd")
test1
# [1] "abcd" "abc1" "ab2b" "a3cd" "4bcd"
test1[grep('[^abc1-3]', test1)]
# [1] "abcd" "a3cd" "4bcd"
test1[grep('[^a-d1-3]', test1)]
# [1] "4bcd"
test1[grep('[^4]', test1)]
# [1] "abcd" "abc1" "ab2b" "a3cd" "4bcd"

2.

test5 <- c('lo', 'lol', 'lolo', 'olo', 'lool')
test5[grep('loll*', test5)]
# [1] "lol"  "lolo"

3.

test5
# [1] "lo"   "lol"  "lolo" "olo"  "lool"
test5[grep('lolo+', test5)]
# [1] "lolo"
test5[grep('lol+', test5)]
# [1] "lol"  "lolo"

I'm studying the basics of regular expressions in R, but I can't understand why the three examples above return those results.

For example, I learned that when ^ is used inside [], it matches characters that are not among the characters listed after ^. But the results don't seem to work that way.

I'm not good at English, so it's hard for me to explain everything I don't understand, but I would really appreciate it if anyone could explain why these R commands return these results.

:(

Answer 1

You are correct that "[^abc1-3]" will match any character that is not in {a, b, c, 1, 2, 3}.

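grep keeps an element if the pattern matches anywhere in it, even at a single character, and drops it only if there is no match at all. Strictly speaking, grep returns the indices of the matching elements, which is why test1[grep(...)] gives the elements themselves; grepl is the variant that returns TRUE or FALSE for each element.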
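A minimal sketch, reusing the question's test1 vector, of what grep(), grepl() and grep(value = TRUE) each return for the same pattern. All three are base R functions; the variable names idx and hits are just for illustration.

```r
# The same vector used in the question
test1 <- c("abcd", "abc1", "ab2b", "a3cd", "4bcd")

# grep() returns the positions (indices) of the matching elements
idx <- grep("[^abc1-3]", test1)
print(idx)
# [1] 1 4 5

# grepl() returns one TRUE/FALSE per element
hits <- grepl("[^abc1-3]", test1)
print(hits)
# [1]  TRUE FALSE FALSE  TRUE  TRUE

# grep(value = TRUE) returns the matching elements themselves,
# the same result as test1[grep(...)]
print(grep("[^abc1-3]", test1, value = TRUE))
# [1] "abcd" "a3cd" "4bcd"
```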

test1 <- c("abcd", "abc1", "ab2b", "a3cd", "4bcd")
test1
# [1] "abcd" "abc1" "ab2b" "a3cd" "4bcd"
test1[grep('[^abc1-3]', test1)]
# [1] "abcd" "a3cd" "4bcd"

The three results all contain d, which is not in {a, b, c, 1, 2, 3}, so d is matched (the last one also contains 4, which is matched as well). The two items missing from the results, "abc1" and "ab2b", contain only characters from {a, b, c, 1, 2, 3}, so there is no match.
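To see exactly which characters make each element match, the sketch below extracts every match with gregexpr() and regmatches(). The helper show_matches is hypothetical, written only for this illustration. It covers all three patterns from the question, including '[^4]', which still keeps "4bcd" because b, c and d are not 4.

```r
test1 <- c("abcd", "abc1", "ab2b", "a3cd", "4bcd")

# Hypothetical helper (not part of base R): for each string, return every
# character that the pattern matches, using gregexpr() + regmatches()
show_matches <- function(pattern, x) {
  m <- regmatches(x, gregexpr(pattern, x))
  names(m) <- x
  m
}

# grep() keeps an element whenever its list entry is non-empty
m1 <- show_matches("[^abc1-3]", test1)  # "d" in abcd and a3cd; "4" and "d" in 4bcd
m2 <- show_matches("[^a-d1-3]", test1)  # only "4" in 4bcd
m3 <- show_matches("[^4]", test1)       # every string has some character other than 4

# Number of matched characters per string
print(lengths(m1))
print(lengths(m2))
print(lengths(m3))
```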

Regex101 is a good site for testing regular expressions and seeing how they work. Here is this example: https://regex101.com/r/CaxfCI/1

For your other examples:

* means 0 or more. So 'loll*' matches lol followed by 0 or more l.
+ means 1 or more. So 'lol+' matches lo followed by 1 or more l, and 'lolo+' matches lol followed by 1 or more o, which only "lolo" contains.

In both cases the quantifier applies only to the single character immediately before it.
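A short sketch on the question's test5 vector contrasting * and +. The pattern 'lol*' is an extra illustration that is not in the question: because zero l's are allowed after "lo", it only requires "lo", so every element matches.

```r
test5 <- c("lo", "lol", "lolo", "olo", "lool")

# 'lol*' (extra example): "lo" followed by zero or more "l",
# so any string containing "lo" anywhere matches
star <- grep("lol*", test5, value = TRUE)

# 'lol+': "lo" followed by one or more "l", i.e. the string must contain "lol"
plus <- grep("lol+", test5, value = TRUE)

# 'lolo+': "lol" followed by one or more "o"
lolo <- grep("lolo+", test5, value = TRUE)

print(star)
# [1] "lo"   "lol"  "lolo" "olo"  "lool"
print(plus)
# [1] "lol"  "lolo"
print(lolo)
# [1] "lolo"
```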

Note that for grep there is no point in writing 'loll*': the result will be the same as for 'lol'. But in other regex operations, such as replacing (substituting) or extracting the matches, the difference can matter.
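A minimal sketch of where the difference shows up. The vector x is made-up example data; regexpr() with regmatches() extracts the first match in each string, and sub() replaces it.

```r
# Made-up example data (not from the question)
x <- c("lol", "lolll")

# For detection, 'lol' and 'loll*' select exactly the same elements
same <- identical(grepl("lol", x), grepl("loll*", x))

# For extraction, 'loll*' also captures the trailing l's (* is greedy)
ext_plain <- regmatches(x, regexpr("lol", x))    # "lol" and "lol"
ext_star  <- regmatches(x, regexpr("loll*", x))  # "lol" and "lolll"

# For substitution, the extra l's survive with 'lol' but not with 'loll*'
sub_plain <- sub("lol", "X", x)    # "X" and "Xll"
sub_star  <- sub("loll*", "X", x)  # "X" and "X"

print(same)
print(ext_star)
print(sub_plain)
print(sub_star)
```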

Solution 1
0 · Accepted 2019-12-04 17:17:34