What's wrong with this regex?

I need to match Twitter hashtags within an Android app, but my code doesn't seem to do what it's supposed to. What I came up with is:

ArrayList<String> tags = new ArrayList<String>(0);
Pattern p = Pattern.compile("\b#[a-z]+", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(tweet); // tweet contains the tweet as a String
while(m.find()){
    tags.add(m.group());
}

The variable tweet contains a regular tweet including hashtags, but find() never returns true. So I guess my regular expression is wrong.

Your regex fails because of the \b word boundary anchor. This anchor only matches at a position between a word character (a letter, digit, or underscore) and a non-word character. Putting it directly in front of the # therefore causes the regex to fail unless there is a word character immediately before the #. Your regex would match the hashtag in foobarfoo#hashtag blahblahblah but not in foobarfoo #hashtag blahblahblah.
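To make the boundary behaviour concrete, here is a small sketch. The class name WordBoundaryDemo and the helper finds are made up for illustration. It also assumes the question's literal is taken exactly as shown: in that case "\b" inside a Java string literal is a backspace character, so the regex engine never sees a word-boundary anchor at all.

```java
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // The question's regex with the backslash doubled, so the regex engine
    // really receives the \b word-boundary anchor.
    static final Pattern WITH_BOUNDARY =
            Pattern.compile("\\b#[a-z]+", Pattern.CASE_INSENSITIVE);

    // The literal exactly as written in the question: "\b" in a Java string
    // literal is the backspace character (U+0008), not a regex anchor.
    static final Pattern AS_WRITTEN =
            Pattern.compile("\b#[a-z]+", Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: true if the pattern matches anywhere in the input.
    static boolean finds(Pattern p, String input) {
        return p.matcher(input).find();
    }

    public static void main(String[] args) {
        // Word character directly before '#': the boundary exists.
        System.out.println(finds(WITH_BOUNDARY, "foobarfoo#hashtag blahblahblah"));  // true
        // Space before '#': two non-word characters, so no boundary.
        System.out.println(finds(WITH_BOUNDARY, "foobarfoo #hashtag blahblahblah")); // false
        // Backspace version: no backspace in the input, so nothing matches.
        System.out.println(finds(AS_WRITTEN, "foobarfoo#hashtag blahblahblah"));     // false
    }
}
```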

Use #\w+ instead, and remember that inside a Java string literal you need to double the backslashes (a single "\b" in a string literal is a backspace character, not a regex anchor):

Pattern p = Pattern.compile("#\\w+");
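Plugged back into the loop from the question, the corrected pattern could be used roughly as follows. This is a minimal sketch: the class name HashtagExtractor, the method extractTags, and the sample tweet are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagExtractor {
    // '#' followed by one or more word characters (letters, digits, underscore).
    private static final Pattern HASHTAG = Pattern.compile("#\\w+");

    // Collects every hashtag in the tweet, including the leading '#'.
    static List<String> extractTags(String tweet) {
        List<String> tags = new ArrayList<>();
        Matcher m = HASHTAG.matcher(tweet);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        // prints [#PizzaHut, #friends_2024]
        System.out.println(extractTags("Lunch at #PizzaHut with #friends_2024!"));
    }
}
```

Because \w already covers upper- and lower-case letters, the CASE_INSENSITIVE flag from the question is no longer needed.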

Your pattern should be "#(\\w+)" if you are trying to match just the hashtag. Using this and the tweet "retweet pizza to #pizzahut", m.group() would give "#pizzahut" and m.group(1) would give "pizzahut".

Edit: Note that the HTML display is messing with the backslash escapes; you need two backslashes before the w in your Java string literal.
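The difference between m.group() and m.group(1) can be checked with a short sketch like the one below. The class name HashtagGroups and the method tagNames are hypothetical; only the pattern and the sample tweet come from the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagGroups {
    // Group 1 captures the tag name without the leading '#'.
    static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    // Returns the tag names only, i.e. group(1) of every match.
    static List<String> tagNames(String tweet) {
        List<String> names = new ArrayList<>();
        Matcher m = HASHTAG.matcher(tweet);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        Matcher m = HASHTAG.matcher("retweet pizza to #pizzahut");
        if (m.find()) {
            System.out.println(m.group());  // #pizzahut (the whole match)
            System.out.println(m.group(1)); // pizzahut  (first capture group)
        }
    }
}
```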
