Java String.matches regex

I am trying to see if a given host name appears in a list of hosts given as a comma-separated string, like the following:

String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
String host1 = "aa.com"; // should be a match
String host2 = "a.com";  // shouldn't be a match
String host3 = "ff.net"; // should be a match

// here is a test for host1     
if (list.matches(".*[,^]" + host1 + "[$,].*")) {
    System.out.println(host1 + " matched");
}
else {
    System.out.println(host1 + " not matched");
}

But I got "not matched" for host1 (aa.com). I am not very familiar with regex, so please correct me!

BTW, I don't want a solution that splits the host list into an array and then matches against it. That was too slow, because the host list can be quite long. The regex approach may be even worse, but I wanted to get it working first.
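For reference, a note on why the pattern above fails: inside a character class, `^` (when it is not the first character) and `$` are literal characters rather than anchors. So `[,^]` demands a literal comma or caret before the host name, and `[$,]` a literal dollar sign or comma after it, which the first and last entries do not have. Below is a minimal sketch of one possible correction, assuming the list contains no spaces. The class and method names are illustrative and not from the original post; `Pattern.quote` keeps the dots in the host name from matching arbitrary characters.

```java
import java.util.regex.Pattern;

final class HostMatchesDemo {
    // Hypothetical helper: true if host is one whole entry of the comma-separated list.
    static boolean matchesHost(String list, String host) {
        // Optional prefix ending in a comma, then the literal host,
        // then an optional suffix starting with a comma.
        String regex = "(.*,)?" + Pattern.quote(host) + "(,.*)?";
        return list.matches(regex);
    }

    public static void main(String[] args) {
        String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
        System.out.println(matchesHost(list, "aa.com")); // true
        System.out.println(matchesHost(list, "a.com"));  // false
        System.out.println(matchesHost(list, "ff.net")); // true
    }
}
```

This keeps `String.matches`, which always tests the whole input, so the optional groups stand in for the start and end anchors.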

I also think regexes are too slow if you are looking for an exact match, so I tried to write a method that looks for occurrences of the host name in the list and checks, for each occurrence, that it is not part of a longer host name (as "a.com" is part of "aa.com"). If it is not, the result is true: the host is in the list. Here's the code:

boolean containsHost(String list, String host) {
    boolean result = false;
    int i = -1;
    while((i = list.indexOf(host, i + 1)) >= 0) { // while there is next match
        if ((i == 0 || list.charAt(i - 1) == ',') // beginning of the list or has a comma right before it
                && (i == (list.length() - host.length()) // end of the list 
                || list.charAt(i + host.length()) == ',')) { // or has a comma right after it
            result = true;
            break;
        }
    }
    return result;
}

But then I thought it would be even faster to check just 3 cases: a match at the beginning, in the middle, or at the end of the list, which can be done with the startsWith, contains and endsWith methods respectively. Here's the second option, which I would prefer in your case:

boolean containsHostShort(String list, String host) {
    return list.contains("," + host + ",") || list.startsWith(host + ",") || list.endsWith("," + host);     
}
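One case the three checks do not cover is a list that consists of exactly one host (for example, a list equal to "aa.com"), because each check expects at least one comma. A small sketch of a variant that folds all cases into one follows. The name `containsHostPadded` is made up here, and the concatenation copies the list once, so it is worth measuring before relying on it for very long lists.

```java
final class HostListCheck {
    // Hypothetical variant: pad both ends with commas so that first, middle,
    // last and single entries all look like ",host,".
    static boolean containsHostPadded(String list, String host) {
        return ("," + list + ",").contains("," + host + ",");
    }

    public static void main(String[] args) {
        String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
        System.out.println(containsHostPadded(list, "aa.com"));     // true
        System.out.println(containsHostPadded(list, "a.com"));      // false
        System.out.println(containsHostPadded("aa.com", "aa.com")); // true
    }
}
```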

UPD: ZouZou's comment on your post also seems good. I would recommend comparing the speed of each approach on a list similar in size to the one you have in the real situation, and choosing the fastest one.

This works perfectly, without regex:

String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
String host1 = "aa.com";
String host2 = "a.com";
String host3 = "ff.net";
boolean checkingFlag = false;
String[] arrayList = list.split(",");
System.out.println(arrayList.length);

for (int i = 0; i < arrayList.length; i++)
{
    // here is a test for host1
    if (arrayList[i].equalsIgnoreCase(host1))
        checkingFlag = true;
}

if (checkingFlag)
    System.out.println("Matched");
else
    System.out.println("Not matched");

It takes barely 20-30 milliseconds to execute a loop over 1 million records. I have just edited the answer in response to your comment; you can check it with this:

long startingTime = System.currentTimeMillis();

for (int i = 0; i < 1000000; i++)
{
    if (i == 999999)
        checkingFlag = true;
}

long endingTime = System.currentTimeMillis();
System.out.println("total time in millisecond:" + (endingTime - startingTime));

As mentioned in the comments, you shouldn't be using matches, because it tries to match the regex pattern against the entire comma-delimited string. That is not what you are trying to do. You are trying to detect whether a given substring occurs in the comma-separated source string.

To do that, you would search for the hostname with a find-style method (Matcher.find() in Java) rather than matches. However, you can simply use a plain substring search, which avoids the overhead of regex compilation.
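The difference between the two calls can be sketched as follows. The class and helper names are assumptions for illustration only. Note the last line: a bare substring search also reports "a.com", because it occurs inside "aa.com", so some delimiter check is still needed on top of it.

```java
import java.util.regex.Pattern;

final class FindVersusMatches {
    // True only if the whole input equals the literal text.
    static boolean wholeMatch(String input, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).matches();
    }

    // True if the literal text occurs anywhere in the input.
    static boolean found(String input, String literal) {
        return Pattern.compile(Pattern.quote(literal)).matcher(input).find();
    }

    public static void main(String[] args) {
        String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
        System.out.println(wholeMatch(list, "bb.com")); // false: the whole list is not "bb.com"
        System.out.println(found(list, "bb.com"));      // true: found as a substring
        System.out.println(list.contains("bb.com"));    // true: same answer without regex
        System.out.println(list.contains("a.com"));     // true, although "a.com" is not an entry
    }
}
```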

Regexes are used to match strings that could have variations in the pattern matched. Never use a regex when you want to do exact string matching.

You can use a lambda to stream the array and return a boolean for the match.

String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
String host1 = "aa.com"; // should be a match
String host2 = "a.com";  // shouldn't be a match
String host3 = "ff.net";  // should be a match

ArrayList<String> alist = new ArrayList<String>();

for(String item : list.split("\\,"))
{
    alist.add(item);
}

boolean contains_host1 = alist.stream().anyMatch(b -> b.equals(host1));
boolean contains_host2 = alist.stream().anyMatch(b -> b.equals(host2));
boolean contains_host3 = alist.stream().anyMatch(b -> b.equals(host3));

System.out.println(contains_host1);
System.out.println(contains_host2);
System.out.println(contains_host3);

Console output:

true
false
true
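The intermediate ArrayList is not strictly required. A shorter sketch of the same idea, assuming Java 8 or later, streams the split array directly; the class and method names here are illustrative. Since `split` still allocates an array with one string per host, the performance concern raised in the question applies to this version as well.

```java
import java.util.Arrays;

final class StreamHostCheck {
    // Hypothetical helper: split on commas and test each entry for equality.
    static boolean containsHost(String list, String host) {
        return Arrays.stream(list.split(",")).anyMatch(host::equals);
    }

    public static void main(String[] args) {
        String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
        System.out.println(containsHost(list, "aa.com")); // true
        System.out.println(containsHost(list, "a.com"));  // false
        System.out.println(containsHost(list, "ff.net")); // true
    }
}
```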

Try this:

// Requires java.util.regex.Pattern and java.util.regex.Matcher
String list = "aa.com,bb.com,cc.com,dd.net,ee.com,ff.net";
String host1 = "aa.com"; // should be a match
String host2 = "a.com";  // shouldn't be a match
String host3 = "ff.net"; // should be a match

// For host1: two letters followed by a literal ".com"
Pattern p1 = Pattern.compile("\\b[A-Za-z]{2}\\.com");
Matcher m1 = p1.matcher(list);

if (m1.find()) {
    System.out.println(host1 + " matched");
} else {
    System.out.println(host1 + " not matched");
}

// For host2: one letter followed by a literal ".com"
p1 = Pattern.compile("\\b[A-Za-z]{1}\\.com");
m1 = p1.matcher(list);

if (m1.find()) {
    System.out.println(host2 + " matched");
} else {
    System.out.println(host2 + " not matched");
}

//and so on...

//and so on...

The \\b means a word boundary (the start of a word in this case). [A-Za-z]{n} means a letter in the range A-Z or a-z repeated n times, and it is followed by .com. In a regex the dot must be escaped as \\. to match a literal dot; an unescaped . matches any character.
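Patterns built from letter counts match any host of that shape rather than one specific host, and \b also treats '.' and '-' as boundaries, so it can match inside a longer host name. As a hedged sketch, the pattern can instead be built from the host itself with `Pattern.quote`, using lookarounds that accept only a comma or the end of the input as a delimiter. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

final class BoundaryDemo {
    // Word boundaries: '.' and '-' count as boundaries too.
    static boolean findWithWordBoundary(String list, String host) {
        return Pattern.compile("\\b" + Pattern.quote(host) + "\\b").matcher(list).find();
    }

    // Comma boundaries: the previous and next characters, if present, must be commas.
    static boolean findWithCommaBoundary(String list, String host) {
        String regex = "(?<![^,])" + Pattern.quote(host) + "(?![^,])";
        return Pattern.compile(regex).matcher(list).find();
    }

    public static void main(String[] args) {
        String list = "www.a.com,bb.com";
        System.out.println(findWithWordBoundary(list, "a.com"));   // true: matches inside www.a.com
        System.out.println(findWithCommaBoundary(list, "a.com"));  // false
        System.out.println(findWithCommaBoundary(list, "bb.com")); // true
    }
}
```

If the same host is checked many times, compiling the Pattern once and reusing it avoids repeated compilation.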
