Optional match for the beginning of the line

Question

I am trying to create a regular expression in Python that matches #hashtags. My definition of a hashtag is:

It is a word that starts with a #
It can contain all characters except [ ,\.]
It can be anywhere in the text

So in this text

#This string cont#ains #four, and #only four #hashtags.

The hashtags here are This, four, only and hashtags.

The problem I have is the optional check for the beginning of the line.

[ \.,]+ won't do, since it requires at least one separator character and so cannot match at the beginning of the line.
[ \.,]? won't do either, since it matches too much: it also accepts a # in the middle of a word.

Example with +

In []: re.findall(r'[ \.,]+#([^ \.,]+)', '#This string cont#ains #four, and #only four #hashtags.')
Out[]: ['four', 'only', 'hashtags']

Example with ?

In []: re.findall(r'[ \.,]?#([^ \.,]+)', '#This string cont#ains #four, and #only four #hashtags.')
Out[]: ['This', 'ains', 'four', 'only', 'hashtags']

How can I optionally match the beginning of the line?

Answer 1

This seems to work:

>>> re.findall(r'\B#([^,\W]+)', '#This string cont#ains #four, and #only four #hashtags.')
['This', 'four', 'only', 'hashtags']

\B: Matches the empty string, but only when it is not at the beginning or end of a word. This means that r'py\B' matches 'python', 'py3', 'py2', but not 'py', 'py.', or 'py!'. \B is just the opposite of \b, so it is also subject to the settings of LOCALE and UNICODE.
\W: When the LOCALE and UNICODE flags are not specified, matches any non-alphanumeric character; this is equivalent to the set [^a-zA-Z0-9_]. With LOCALE, it will match any character not in the set [0-9_], and not defined as alphanumeric for the current locale. If UNICODE is set, this will match anything other than [0-9_] plus characters classified as not alphanumeric in the Unicode character properties database.
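
A quick way to see what \B does is to run the documentation's own py example and then look at where \B# holds in the sample text. This is only a minimal sketch; the variable names are illustrative and not part of the answer. Note that in Python 3, str patterns are Unicode-aware by default, so \w and \W also take non-ASCII letters into account unless re.ASCII is given.

```python
import re

# \B succeeds only where there is no word boundary, so 'py' must be
# followed by another word character.
samples = ['python', 'py3', 'py2', 'py', 'py.', 'py!']
py_results = {s: bool(re.search(r'py\B', s)) for s in samples}
print(py_results)

text = '#This string cont#ains #four, and #only four #hashtags.'

# '#' is itself a non-word character, so \B in front of it holds only when
# the previous character is also a non-word character, or when '#' starts
# the string.
hash_positions = [m.start() for m in re.finditer(r'\B#', text)]
print(hash_positions)

# The '#' inside 'cont#ains' follows the word character 't', so it is skipped.
print(text.index('cont#') + 4 in hash_positions)  # False
```

This is why the pattern picks up the # at the very start of the string without any special handling: the start of the string next to a non-word character is not a word boundary.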

Answer 2

Before your regex you can simply state what you don't want.

(?<!\w)(#[^ \.,]+)

With a negative lookbehind you can do that.
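
To check the lookbehind approach against the sample text, here is a minimal sketch. The second pattern assumes the capture group is moved so that it excludes the #, which makes the output comparable with Answer 1; the pattern as posted returns the tags with the # included. The alternation pattern at the end is not from either answer; it is just one more possible way to accept the beginning of the line as a prefix.

```python
import re

text = '#This string cont#ains #four, and #only four #hashtags.'

# Pattern as posted in the answer: the group includes the leading '#'.
posted = re.findall(r'(?<!\w)(#[^ \.,]+)', text)
print(posted)  # ['#This', '#four', '#only', '#hashtags']

# Same lookbehind, with the group moved after '#'.
# (?<!\w) also succeeds at the start of the string, where nothing precedes '#'.
tags = re.findall(r'(?<!\w)#([^ .,]+)', text)
print(tags)  # ['This', 'four', 'only', 'hashtags']

# Assumed alternative (not from the answers): accept either the start of the
# string or one separator character before '#'.
alt_tags = re.findall(r'(?:^|[ .,])#([^ .,]+)', text)
print(alt_tags)  # ['This', 'four', 'only', 'hashtags']

# [^ .,]+ follows the question's definition, while Answer 1's [^,\W]+ keeps
# only word characters, so the two differ on a tag such as '#foo-bar'.
dash_lookbehind = re.findall(r'(?<!\w)#([^ .,]+)', '#foo-bar')
dash_answer1 = re.findall(r'\B#([^,\W]+)', '#foo-bar')
print(dash_lookbehind)  # ['foo-bar']
print(dash_answer1)     # ['foo']
```

The lookbehind consumes no characters, so it does not matter what precedes the #, only that it is not a word character.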


Answer 1: 3, accepted, 2012-09-26 20:54:37

Answer 2: 0, 2012-09-26 20:57:17
