
Java Find word in a String

I need to find a word in HTML source code and count its occurrences. I am trying to use a regular expression, but it finds 0 matches.

I am using a regular expression because I thought it was the best way. If there is a better way, please let me know.

I need to count the occurrences of the word "hsw.ads" in the HTML source code.

Here is what I have tried:

int count = 0;
{
    Pattern p = Pattern.compile(".*(hsw.ads).*");
    Matcher m = p.matcher(SourceCode);
    while(m.find())count++;
}

But the count is 0.

Please let me know how to fix this.

Thank you. Help Seeker

You are not matching any "expression", just a literal word, so a simple string search is probably better. Apache commons-lang has StringUtils.countMatches(source, "yourword").

If you don't want to depend on commons-lang, you can write it manually: call source.indexOf("yourword", x) repeatedly, each time passing a greater offset x (just past the previous match), until it returns -1.
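A minimal sketch of that manual indexOf approach, using only java.lang.String. The class and method names (CountDemo, countOccurrences) are hypothetical. The search resumes just past each match, so overlapping occurrences are not counted, which is also how StringUtils.countMatches behaves.

```java
public class CountDemo {

    // Counts non-overlapping occurrences of word in source using String.indexOf.
    // Hypothetical helper; an empty word is treated as "no matches".
    static int countOccurrences(String source, String word) {
        if (word.isEmpty()) {
            return 0; // "" is found at every offset, so bail out early
        }
        int count = 0;
        int offset = 0;
        // indexOf(str, fromIndex) returns -1 when no further match exists
        while ((offset = source.indexOf(word, offset)) != -1) {
            count++;
            offset += word.length(); // resume searching just after this match
        }
        return count;
    }

    public static void main(String[] args) {
        String html = "<script src=\"hsw.ads.js\"></script><div class=\"hsw.ads\"></div>";
        System.out.println(countOccurrences(html, "hsw.ads")); // prints 2
    }
}
```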

You should try this.

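import java.util.regex.Matcher;
import java.util.regex.Pattern;

private int getWordCount(String word, String source) {
    int count = 0;
    // Pattern.quote makes the word a literal, so '.' is not a wildcard
    Pattern p = Pattern.compile(Pattern.quote(word));
    Matcher m = p.matcher(source);
    while (m.find()) count++;
    return count;
}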

Pass the word (not a regex pattern) you want to search for in the string.
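Because Pattern.compile interprets its argument as a regular expression, a word like "hsw.ads" has to be quoted to be matched literally. A self-contained sketch of the helper above using Pattern.quote; the class name WordCounter is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordCounter {

    // Same idea as getWordCount, but Pattern.quote turns the word into a literal,
    // so regex metacharacters such as '.' or '+' are not interpreted.
    static int getWordCount(String word, String source) {
        Matcher m = Pattern.compile(Pattern.quote(word)).matcher(source);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String source = "hsw.ads hswXads hsw.ads";
        // "hswXads" is not counted because the '.' is matched literally
        System.out.println(getWordCount("hsw.ads", source)); // prints 2
    }
}
```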

To find a string in Java you can use the String method indexOf, which returns the index of the first character of the match (or -1 if there is none). To find and count all occurrences you can do this (there may be a faster way, but this works). I would still recommend the StringUtils.countMatches method.

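String temp = string; // copy of the source string
int count = 0;
String a = "hsw.ads";
int i = 0;

// indexOf returns -1 once there are no more matches
while ((i = temp.indexOf(a, i)) != -1) {
    count++;
    i += a.length(); // resume just past this match; an extra +1 would skip adjacent matches
}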

StringUtils.countMatches(SourceCode, "hsw.ads") ought to work. However, if you stick with your current approach (which is valid), I'd recommend a few things:

1. As John Haager mentioned, remove the leading and trailing .*, because you're looking for that exact substring.
2. Escape the '.' ("hsw\\.ads" in a Java string literal), because you're searching for a literal '.' and not a wildcard.
3. Make the Pattern a constant and reuse it rather than re-creating it each time.

That said, I'd still suggest the approaches above; I just wanted to point out that your current approach isn't conceptually flawed, it is only missing a few implementation details.
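A minimal sketch applying those three suggestions; the class, constant, and method names are hypothetical. Note why the surrounding .* matters: '.' does not match line terminators by default and .* is greedy, so with the original pattern each find() consumes the rest of the line, and several occurrences on one line collapse into a single match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AdCounter {

    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    // "\\." escapes the dot so it matches only a literal '.', and there is no .* around it.
    private static final Pattern HSW_ADS = Pattern.compile("hsw\\.ads");

    static int countHswAds(String sourceCode) {
        Matcher m = HSW_ADS.matcher(sourceCode); // Matcher is not thread-safe, so create one per call
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String sourceCode = "<div id=\"hsw.ads\">\n<script>load('hsw.ads');</script>\n</div>";
        System.out.println(countHswAds(sourceCode)); // prints 2
    }
}
```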

Your code and regular expression are valid. You don't need the .* at the beginning and end of your regex. For example:

String t = "hsw.ads hsw.ads hsw.ads";
int count = 0;
Matcher m  = Pattern.compile("hsw\\.ads").matcher(t);
while (m.find()){ count++; }

In this case, count is 3. One more thing: if you use a regex and you really want to match a literal '.' between hsw and ads, you need to escape it, as in "hsw\\.ads" above. An unescaped '.' matches any character.
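To make the escaping point concrete, a small sketch comparing the unescaped and escaped patterns on input containing near misses; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EscapeDemo {

    // Counts how many times the given regex finds a match in text.
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String t = "hsw.ads hsw-ads hsw_ads";
        // Unescaped '.' matches any character except a line terminator
        System.out.println(countMatches("hsw.ads", t));   // prints 3
        // Escaped "\\." matches only a literal period
        System.out.println(countMatches("hsw\\.ads", t)); // prints 1
    }
}
```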


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM