
Can anyone explain these results from basic regular expressions in R?

1.

test1 <- c('abcd', 'abc1', 'ab2b', 'a3cd', '4bcd')
test1
# [1] "abcd" "abc1" "ab2b" "a3cd" "4bcd"
test1[grep('[^abc1-3]', test1)]
# [1] "abcd" "a3cd" "4bcd"
test1[grep('[^a-d1-3]', test1)]
# [1] "4bcd"
test1[grep('[^4]', test1)]
# [1] "abcd" "abc1" "ab2b" "a3cd" "4bcd"

2.

test5 <- c('lo', 'lol', 'lolo', 'olo', 'lool')
test5[grep('loll*', test5)]
# [1] "lol"  "lolo"

3.

test5
# [1] "lo"   "lol"  "lolo" "olo"  "lool"
test5[grep('lolo+', test5)]
# [1] "lolo"
test5[grep('lol+', test5)]
# [1] "lol"  "lolo"

I'm studying the basics of regular expressions in R, but I can't understand why the three examples above return those results.

For example, I learned that when ^ is used inside [], the pattern matches characters other than the ones listed after the ^. But the results don't look like that.

I'm not good at English, so I have difficulty explaining everything I don't understand, but I'd really appreciate it if anyone could teach me why this R code returns those results.

:(

You are correct that "[^abc1-3]" will match any character that is not in {a, b, c, 1, 2, 3}.

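grep reports a match for an element if any single character in it matches, not only when every character does. It returns the indices of the elements that contain at least one match (grepl returns the corresponding TRUE or FALSE for each element), so test1[grep(...)] keeps exactly those elements.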

test1
# [1] "abcd" "abc1" "ab2b" "a3cd" "4bcd"    
test1[grep('[^abc1-3]', test1)]
# [1] "abcd" "a3cd" "4bcd"

All three results contain d, which is not in {a, b, c, 1, 2, 3}, so d is matched (the last one also contains 4, which is matched too). The two test items missing from the results, "abc1" and "ab2b", contain only characters in {a, b, c, 1, 2, 3}, so there is no match.
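To see which characters are responsible for each match, here is a minimal sketch using base R only. The variable names (idx, hit, matched, any_non4, no_4) are just illustrative, and the anchored pattern at the end is my own addition to show how "contains no 4 at all" could be written, assuming that is what was expected from '[^4]'.

```r
test1 <- c("abcd", "abc1", "ab2b", "a3cd", "4bcd")

# grep() returns the positions of the elements that contain a match
idx <- grep("[^abc1-3]", test1)
idx
# [1] 1 4 5

# grepl() returns one TRUE/FALSE per element
hit <- grepl("[^abc1-3]", test1)
hit
# [1]  TRUE FALSE FALSE  TRUE  TRUE

# gregexpr() + regmatches() list the characters that actually matched:
# "d" for "abcd" and "a3cd", "4" and "d" for "4bcd", nothing for the others
matched <- regmatches(test1, gregexpr("[^abc1-3]", test1))
matched

# '[^4]' only needs ONE character that is not 4 somewhere in the string,
# so every element matches, including "4bcd" (via b, c or d)
any_non4 <- grepl("[^4]", test1)

# Anchoring the pattern requires the WHOLE string to consist of non-4 characters
no_4 <- grepl("^[^4]+$", test1)
no_4
# [1]  TRUE  TRUE  TRUE  TRUE FALSE
```

The same reasoning explains the '[^a-d1-3]' example: only "4bcd" contains a character (the 4) outside a-d and 1-3.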

Regex101 is a good site for testing regular expressions and seeing how they work. Here is this example: https://regex101.com/r/CaxfCI/1

For your other examples,

  • * means 0 or more of the preceding item. So 'loll*' matches lol followed by 0 or more l.
  • + means 1 or more of the preceding item. So 'lol+' matches lo followed by 1 or more l.
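The two quantifiers can be checked against the vector from the question. This is a small sketch in base R; the result names (star, plus, plus_o) are only illustrative.

```r
test5 <- c("lo", "lol", "lolo", "olo", "lool")

# 'loll*' = "lol" followed by zero or more "l"
star <- grepl("loll*", test5)
star
# [1] FALSE  TRUE  TRUE FALSE FALSE

# 'lol+' = "lo" followed by one or more "l": the same elements match
plus <- grepl("lol+", test5)
plus
# [1] FALSE  TRUE  TRUE FALSE FALSE

# 'lolo+' = "lol" followed by one or more "o": only "lolo" qualifies
plus_o <- grepl("lolo+", test5)
plus_o
# [1] FALSE FALSE  TRUE FALSE FALSE
```

"lo", "olo" and "lool" never match because none of them contains the consecutive characters l, o, l.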

Note that for grep there is no point in writing 'loll*': the result will be the same as for 'lol'. But in other regex operations, where you replace or extract the matches, the difference can matter.
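As a rough illustration of where the difference shows up, the sketch below uses sub() and regmatches() on a made-up input string ("lollll" is my own example, not from the question).

```r
x <- "lollll"

# Replacing: 'lol' consumes only the first three characters
short_sub <- sub("lol", "X", x)
short_sub
# [1] "Xlll"

# 'loll*' is greedy and also consumes the trailing l's
long_sub <- sub("loll*", "X", x)
long_sub
# [1] "X"

# Extracting shows the same difference in what was matched
short_match <- regmatches(x, regexpr("lol", x))
long_match <- regmatches(x, regexpr("loll*", x))
short_match
# [1] "lol"
long_match
# [1] "lollll"
```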
