Search using Regex in Java

I have an array of strings (or an ArrayList), something like:

strMain = "S1R2G3M1D1N3";

strMain consists of several letters, each followed by a digit as a suffix.

I also have a string like:

str1 = "S1,,--R2,,,,D3-N3";

I need to see whether each of S1, R2, D3 and N3 in str1 is part of strMain. I could not figure out how to do this. I guess I need to split str1 so that I get only the "letter followed by a digit" pieces into an array. Then I could check for the presence of these strings in strMain. Can anyone suggest the regex to split on? Is there any other way to check for presence without splitting (that is, using the regex to search directly)?

Can you tell me the regex for splitting this?

This regex could work: [A-Z][0-9]

Example code:

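import java.util.regex.Matcher;
import java.util.regex.Pattern;

String strMain = "S1R2G3M1D1N3";
String str = "S1,,--R2,,,,D3-N3";
// One capital letter followed by one digit
Pattern pattern = Pattern.compile( "[A-Z][0-9]" );
Matcher matcher = pattern.matcher( str );
// find() moves to the next match in str on every call
while ( matcher.find() ) {
    // Print the token only if it also occurs in strMain
    if ( strMain.contains( matcher.group() ) ) {
        System.out.println( matcher.group() );
    }
}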

gives this output:

S1
R2
N3
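
The question also asks about splitting first. A minimal sketch of that route, assuming the separators are anything other than capital letters and digits; the class and method names (SplitSearch, tokensFoundIn) are made up for illustration and are not part of the original answer:

```java
import java.util.ArrayList;
import java.util.List;

class SplitSearch {
    // Splits input on runs of characters that are neither capital letters nor digits,
    // then keeps the tokens that look like "letter + digit" and occur in main.
    static List<String> tokensFoundIn(String main, String input) {
        List<String> found = new ArrayList<>();
        for (String token : input.split("[^A-Z0-9]+")) {
            // A leading separator yields an empty first token; matches() filters it out.
            if (token.matches("[A-Z][0-9]") && main.contains(token)) {
                found.add(token);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Prints [S1, R2, N3]
        System.out.println(tokensFoundIn("S1R2G3M1D1N3", "S1,,--R2,,,,D3-N3"));
    }
}
```

Both routes give the same result for this input. The Matcher loop is usually simpler because it describes what to keep rather than what to throw away.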

EDIT

In response to your comment...

Sometimes the digit may not be present. What is the expression then? Ex: str="S,,--R2,,,,-N3" shall print "SR2N3". Also, sometimes I may have to include a single dot, two dots, a single quote or two single quotes. Ex: str="S.,,--R2..,,,D3-N3',N3''" shall print S., R2.., N3', N3''. Here only the letter is mandatory; the digit, single dot, two dots, single quote and two single quotes are all optional.

String strMain = "S1R2G3M1D1N3";
String str = "S.,,--R2...o,,,D3-N3',N3''";
// Group 1: capital letter plus optional digit; then an optional, non-captured suffix
Pattern pattern = Pattern.compile( "([A-Z][0-9]?)(?:\\.{1,2}|'{1,2})?" );
Matcher matcher = pattern.matcher( str );
while ( matcher.find() ) {
    // Check only the alphanumeric part, but print the whole match
    if ( strMain.contains( matcher.group( 1 ) ) ) {
        System.out.println( matcher.group( 0 ) );
    }
}

gives this output:

S.
R2..
N3'
N3''
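
The comment's first case (digit missing, result printed as one string) can be sketched in the same way. This is only an illustration: the class and method names (OptionalDigitJoin, joinMatches) are hypothetical, and it keeps the substring check from the code above, so a bare "S" counts as found because strMain contains "S1".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalDigitJoin {
    // One capital letter, optionally followed by one digit.
    private static final Pattern TOKEN = Pattern.compile("[A-Z][0-9]?");

    // Concatenates every token of input that occurs as a substring of main.
    static String joinMatches(String main, String input) {
        StringBuilder joined = new StringBuilder();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            if (main.contains(matcher.group())) {
                joined.append(matcher.group());
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        // Prints SR2N3
        System.out.println(joinMatches("S1R2G3M1D1N3", "S,,--R2,,,,-N3"));
    }
}
```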

[A-Z] is one capital letter.
[0-9] is one digit.
X? is X, one or zero times. So then...
[0-9]? is one digit, one or zero times.

Parentheses create a capturing group, meaning we can later grab what was matched between the parentheses...

([A-Z][0-9]?) is going to capture one capital letter and the optional digit.

Then to match the dots and single quotes...

X{Y,Z} means match X, between Y and Z times, so...
X{1,2} means match X, between 1 and 2 times.
X|Y means to match either X or Y. I surround this in parentheses, otherwise the whole expression would be OR'ed.
\\. means to match a period. You can't just use . because that has a special meaning: it matches any one character. Therefore you must escape it with \ , which itself also has to be escaped for the Java compiler by using another one.
(\\.{1,2}|'{1,2}) means to match one or two periods, OR one or two single quotes, and capture the group.
(?:X) means to not capture the group. I don't care about capturing this group, so putting everything together...
(?:\\.{1,2}|'{1,2})? matches one or two periods, OR one or two single quotes, and does this whole match either one or zero times.
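
The pieces above can be tried one at a time. A small sketch, assuming nothing beyond the standard java.util.regex classes; the class SuffixPieces and its helper groupCount are hypothetical names used only here:

```java
import java.util.regex.Pattern;

class SuffixPieces {
    // Returns how many capturing groups the given regex defines.
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        // An unescaped '.' matches any single character; "\\." matches only a literal period.
        System.out.println("aXb".matches("a.b"));    // true
        System.out.println("aXb".matches("a\\.b"));  // false
        System.out.println("a.b".matches("a\\.b"));  // true

        // {1,2} accepts one or two repetitions, but not three.
        System.out.println("..".matches("\\.{1,2}"));   // true
        System.out.println("...".matches("\\.{1,2}"));  // false

        // (?:...) groups without capturing, so only the first pair of parentheses counts.
        System.out.println(groupCount("([A-Z][0-9]?)(\\.{1,2}|'{1,2})?"));    // 2
        System.out.println(groupCount("([A-Z][0-9]?)(?:\\.{1,2}|'{1,2})?"));  // 1
    }
}
```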

Then later you can call matcher.group(...) to get the captured groups, starting at 1; 0 means the entire match. So the group(1) call gives me just the alphanumeric part, which I use for checking whether it exists in strMain.
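
To see the difference between group(0) and group(1) for each match, here is a short sketch; the class and method names (GroupDemo, describe) are invented for this illustration:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    private static final Pattern TOKEN =
            Pattern.compile("([A-Z][0-9]?)(?:\\.{1,2}|'{1,2})?");

    // For every match, records "whole match -> captured alphanumeric part".
    static List<String> describe(String input) {
        List<String> lines = new ArrayList<>();
        Matcher matcher = TOKEN.matcher(input);
        while (matcher.find()) {
            lines.add(matcher.group(0) + " -> " + matcher.group(1));
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : describe("S.,,--R2...o,,,D3-N3',N3''")) {
            System.out.println(line);
        }
    }
}
```

Here D3 is matched as well; it is only the later strMain.contains check that filters it out.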

Take a look at the Javadoc here: http://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html
