What's wrong with this regex?

I need to match Twitter hashtags in an Android app, but my code doesn't do what it's supposed to. What I came up with is:

ArrayList<String> tags = new ArrayList<String>(0);
Pattern p = Pattern.compile("\b#[a-z]+", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(tweet); // tweet contains the tweet as a String
while(m.find()){
    tags.add(m.group());
}

The variable tweet contains a regular tweet including hashtags - but find() doesn't trigger. So I guess my regular expression is wrong.

Your regex fails because of the \b word boundary anchor. This anchor only matches between a word character (a letter, digit, or underscore) and a non-word character, or at the start or end of the string next to a word character. Since # is itself a non-word character, putting \b directly in front of it means the regex can only match when a word character immediately precedes the #. Your regex would match a hashtag in foobarfoo#hashtag blahblahblah but not in foobarfoo #hashtag blahblahblah.
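The boundary behaviour described above can be checked with a small sketch. The class and method names (WordBoundaryDemo, findsTag) are made up for illustration, and the backslash is doubled so that the regex engine actually receives \b:

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Hypothetical helper: true if \b#[a-z]+ (case-insensitive) is found anywhere in the input.
    static boolean findsTag(String input) {
        Pattern p = Pattern.compile("\\b#[a-z]+", Pattern.CASE_INSENSITIVE);
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        // A word character ('o') sits right before '#', so \b matches there.
        System.out.println(findsTag("foobarfoo#hashtag blahblahblah"));  // true
        // A space precedes '#': two non-word characters in a row, so no boundary.
        System.out.println(findsTag("foobarfoo #hashtag blahblahblah")); // false
    }
}
```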

Use #\w+ instead, and remember that inside a Java string literal you need to double the backslashes (a single "\b" in a string literal is the backspace character, not a word boundary):

Pattern p = Pattern.compile("#\\w+");
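Put together with the loop from the question, the suggested pattern might be used roughly as follows. This is a minimal sketch: the class name HashtagExtractor, the method extract, and the sample tweet are assumptions for illustration, not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagExtractor {
    // "#\\w+" in Java source is the regex #\w+ : a '#' followed by one or more
    // word characters (letters, digits, underscore).
    private static final Pattern TAG = Pattern.compile("#\\w+");

    // Hypothetical helper: collects every hashtag in the tweet, '#' included.
    static List<String> extract(String tweet) {
        List<String> tags = new ArrayList<>();
        Matcher m = TAG.matcher(tweet);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(extract("Loving #Java and #regex_101 today"));
        // prints [#Java, #regex_101]
    }
}
```

Note that \w is broader than the [a-z] class in the question: it also accepts digits and underscores, so Pattern.CASE_INSENSITIVE is no longer needed.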

Your pattern should be "#(\\w+)" if you are trying to match just the hashtag. Using this with the tweet "retweet pizza to #pizzahut", m.group() would give "#pizzahut" and m.group(1) would give "pizzahut".

Edit: Note that the HTML display may mangle the backslash escapes. You need exactly two backslashes before the w in your Java string literal.
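The difference between m.group() and m.group(1) can be seen in a short sketch using the tweet from this answer. The class name HashtagGroups and the helper firstTag are hypothetical names chosen for the example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagGroups {
    // Hypothetical helper: returns {whole match, captured group 1} for the first
    // hashtag in the tweet, or null if the tweet contains none.
    static String[] firstTag(String tweet) {
        Matcher m = Pattern.compile("#(\\w+)").matcher(tweet);
        if (!m.find()) {
            return null;
        }
        // group() is the full match including '#'; group(1) is only the
        // part inside the parentheses.
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String[] result = firstTag("retweet pizza to #pizzahut");
        System.out.println(result[0]); // #pizzahut
        System.out.println(result[1]); // pizzahut
    }
}
```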
