What's wrong with this regex?
I need to match Twitter hashtags within an Android app, but my code doesn't seem to do what it's supposed to. What I came up with is:
ArrayList<String> tags = new ArrayList<String>(0);
Pattern p = Pattern.compile("\b#[a-z]+", Pattern.CASE_INSENSITIVE);
Matcher m = p.matcher(tweet); // tweet contains the tweet as a String
while(m.find()){
tags.add(m.group());
}
The variable tweet contains a regular tweet including hashtags, but find() never returns true. So I guess my regular expression is wrong.
Your regex fails because of the \b word boundary anchor. This anchor only matches between a word character (a letter, digit, or underscore) and a non-word character, or at the start or end of the string next to a word character. Since # is itself a non-word character, putting \b directly in front of it causes the regex to fail unless there is a word character immediately before the #! Your regex would match a hashtag in
foobarfoo#hashtag blahblahblah
but not in
foobarfoo #hashtag blahblahblah
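The boundary behaviour described above can be checked with a small sketch. The class name WordBoundaryDemo and the helper firstMatch are made up for illustration; the pattern is the one from the question, written with a doubled backslash so that the regex engine sees \b rather than Java's backspace escape.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original question.
class WordBoundaryDemo {
    // The question's pattern: \b requires a word character on exactly one side.
    static final Pattern WITH_BOUNDARY =
            Pattern.compile("\\b#[a-z]+", Pattern.CASE_INSENSITIVE);

    // Returns the first match in the input, or null if there is none.
    static String firstMatch(String input) {
        Matcher m = WITH_BOUNDARY.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // 'o' (word char) followed by '#' (non-word char): a boundary exists here.
        System.out.println(firstMatch("foobarfoo#hashtag blahblahblah"));
        // ' ' followed by '#': both are non-word chars, so \b cannot match.
        System.out.println(firstMatch("foobarfoo #hashtag blahblahblah"));
    }
}
```

The first call finds #hashtag; the second finds nothing, which is exactly the situation in a normal tweet where a space precedes the hashtag.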
Use #\w+ instead, and remember that inside a Java string literal you need to double the backslashes:
Pattern p = Pattern.compile("#\\w+");
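Put together with the loop from the question, the corrected pattern might be used roughly as follows. This is a minimal sketch: the class HashtagExtractor and the method extractTags are hypothetical names, and the tweet text is just the sample string used earlier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class wrapping the loop from the question.
class HashtagExtractor {
    // "#\\w+" in Java source is the regex #\w+ : a '#' followed by word characters.
    private static final Pattern HASHTAG = Pattern.compile("#\\w+");

    // Collects every hashtag (including the leading '#') found in the tweet.
    static List<String> extractTags(String tweet) {
        List<String> tags = new ArrayList<>();
        Matcher m = HASHTAG.matcher(tweet);
        while (m.find()) {
            tags.add(m.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        System.out.println(extractTags("foobarfoo #hashtag blahblahblah"));
    }
}
```

Note that without any anchor, this pattern also matches a # in the middle of a word (as in foobarfoo#hashtag); whether that is acceptable depends on the input you expect.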
If you are trying to match just the hashtag text, your pattern should be "#(\\w+)". With this pattern and the tweet "retweet pizza to #pizzahut", m.group() gives "#pizzahut" and m.group(1) gives "pizzahut".
Edit: Note that the HTML display is messing with the backslash escapes; in your Java string literal you need two backslashes before the w.
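The difference between the whole match and the capture group can be seen in a short sketch. The class HashtagGroups and the method firstTag are illustrative names only; the pattern is written with two backslashes before the w, as a Java string literal requires.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing group() versus group(1).
class HashtagGroups {
    // The parentheses capture the tag text without the leading '#'.
    static final Pattern HASHTAG = Pattern.compile("#(\\w+)");

    // Returns {whole match, captured tag name} for the first hashtag, or null if none.
    static String[] firstTag(String tweet) {
        Matcher m = HASHTAG.matcher(tweet);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(), m.group(1) };
    }

    public static void main(String[] args) {
        String[] parts = firstTag("retweet pizza to #pizzahut");
        System.out.println(parts[0]); // whole match, with '#'
        System.out.println(parts[1]); // capture group 1, without '#'
    }
}
```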