
REGEX pattern match in R for Course number

I need to identify course numbers that match the pattern xx.3xxxxxx. These are some examples of the course numbers:

26.3730004   
27.0210000    
26.3730009   
26.7114001   
23.9610071  
26.0A34430    
23.3670005    
26.0B05430    

I tried many patterns. One example is the pattern below; it did not return any match.

"[^0-9]{2}\\Q.\\E3[^0-9]+$"

I tried using grep and grepl. I actually need the code to return indexes.

This code shows my attempt to tag the rows that have matches:

Teacher$virtual[
            which(
                 grepl("[^0-9]{2}\\Q.\\E3[^0-9]+$",Teacher$CourseNumber))]
               <- "1"

I need to remove every row from my data frame whose course number matches that pattern (XX.3XXXXXX).

But my code did not find any match. Can you please help me?

This simple expression would likely cover that:

^[0-9]{2}\.[3].+$

which requires a 3 (written as the character class [3]) right after the literal dot. It would probably also work without the start and end anchors:

[0-9]{2}\.[3].+


We can add or remove boundaries if necessary.
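As a minimal sketch of how the two expressions above behave in R, the snippet below runs them against the sample course numbers from the question. The vector name `course` is hypothetical, and the backslash is doubled because the pattern is written inside an R string literal.

```r
# Sample course numbers copied from the question.
course <- c("26.3730004", "27.0210000", "26.3730009", "26.7114001",
            "23.9610071", "26.0A34430", "23.3670005", "26.0B05430")

# Anchored form: two digits, a literal dot, a 3, then at least one more character.
anchored <- grepl("^[0-9]{2}\\.[3].+$", course)

# Unanchored form: the same sequence may start anywhere in the string.
unanchored <- grepl("[0-9]{2}\\.[3].+", course)

# Course numbers matched by the anchored form.
course[anchored]
```

On this sample both forms flag the same elements. The anchored form is the stricter choice if longer strings that merely contain such a sequence should not match.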

You should use

grepl("^[0-9]{2}\\.3", Teacher$CourseNumber)


Details:

  • ^ - start of a string
  • [0-9]{2} - two digits
  • \\. - a dot (a regex escape needs a literal backslash, but inside a string literal, "...", a single backslash starts a string escape sequence, so the backslash must be doubled to produce the literal backslash char that the regex escape requires)
  • 3 - a 3 char.
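The question also asks for indexes and for removing the matching rows. The following is a sketch under stated assumptions: the `Teacher` data frame is rebuilt here from the sample values in the question, and the default tag `"0"` and the name `Teacher_kept` are hypothetical choices, not part of the original code.

```r
# Hypothetical stand-in for the asker's data frame, built from the sample values.
Teacher <- data.frame(
  CourseNumber = c("26.3730004", "27.0210000", "26.3730009", "26.7114001",
                   "23.9610071", "26.0A34430", "23.3670005", "26.0B05430"),
  stringsAsFactors = FALSE
)

pattern <- "^[0-9]{2}\\.3"

# grep() returns the integer indexes of the matching elements.
idx <- grep(pattern, Teacher$CourseNumber)

# grepl() returns one TRUE/FALSE per element, which can index rows directly,
# so wrapping it in which() is not required.
is_match <- grepl(pattern, Teacher$CourseNumber)

# Tag matching rows; "0" as the default value is an assumption.
Teacher$virtual <- "0"
Teacher$virtual[is_match] <- "1"

# Keep only the rows whose course number does not match the pattern.
Teacher_kept <- Teacher[!is_match, , drop = FALSE]
```

Filtering with the logical vector `!is_match` is used here instead of `Teacher[-idx, ]` because a negative index built from an empty `idx` selects no rows at all, which would silently empty the data frame when nothing matches.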

NOTE: If you want to use in-pattern quoting with \\Q and \\E (all chars between them are treated literally), you need to use a PCRE regex: add perl=TRUE and use

grepl("^[0-9]{2}\\Q.\\E3", Teacher$CourseNumber, perl=TRUE)

Now, the dot is treated as a literal dot, not as the . metacharacter that matches any char but a line break char (in a PCRE regex, . does not match line break chars by default).
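A short sketch comparing the quoted and escaped forms may help here. The variable names and the test string "26x3730004" are made up for illustration. The last call also suggests why the pattern from the question finds nothing even with perl=TRUE: [^0-9] is a negated class, so it matches any char that is not a digit, the opposite of what the course numbers need.

```r
# Sample course numbers copied from the question.
course <- c("26.3730004", "27.0210000", "26.3730009", "26.7114001",
            "23.9610071", "26.0A34430", "23.3670005", "26.0B05430")

# Quoting the dot with \Q...\E (PCRE) and escaping it with a backslash
# select the same elements.
quoted  <- grepl("^[0-9]{2}\\Q.\\E3", course, perl = TRUE)
escaped <- grepl("^[0-9]{2}\\.3", course)

# An unescaped dot is a metacharacter, so it also accepts "x" in that position.
bare_dot   <- grepl("^[0-9]{2}.3", "26x3730004")
quoted_dot <- grepl("^[0-9]{2}\\Q.\\E3", "26x3730004", perl = TRUE)

# The pattern from the question: the negated classes demand non-digits
# where the course numbers have digits, so nothing matches.
original <- grepl("[^0-9]{2}\\Q.\\E3[^0-9]+$", course, perl = TRUE)
```

With these inputs, `bare_dot` is TRUE and `quoted_dot` is FALSE, while `original` is FALSE for every element.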
