Why does this regular expression return an empty list?

Question

New programmer here. I am trying to get all of the hashtags and links from a string. Each regular expression returns the desired result on its own; however, an empty list is returned when they are combined. How can I fix this?

import re

tweet = ('New PyBites article: Module of the Week - Requests-cache '
     'for Repeated API Calls - http://pybit.es/requests-cache.html '
     '#python #APIs')


# Get all hashtags and links from tweet
def get_hashtags_and_links(tweet=tweet):
    tweet_regex = re.compile(r'''(
                         \(#\w+\)
                         \(https://[^\s]+\)
                         )''', re.VERBOSE)

    tweet_object = tweet_regex.findall(tweet)
    print(tweet_object)

get_hashtags_and_links()

Answer 1

You are looking for #\w+ (enclosed in literal parentheses) immediately followed by https://[^\s]+ (also enclosed in literal parentheses), which appears nowhere in your text.

Instead, use | (the "or" bar) to match either one:

re.compile(r'''(
            \(#\w+\)|
            \(https://[^\s]+\)
                     )''', re.VERBOSE)

But, as pointed out, \( matches an actual parenthesis character (it does not start a group).

So you probably just want:

"(#\w+)|(https?://[^\s]+)"

You can also use non-capturing groups, (?:...), if you want:

"((?:#\w+)|(?:https?://[^\s]+))"
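The two patterns above behave differently with findall, which may matter for the result you want. The following is a minimal sketch, reusing the tweet from the question; the variable names two_groups and one_group are only illustrative.

```python
import re

tweet = ('New PyBites article: Module of the Week - Requests-cache '
         'for Repeated API Calls - http://pybit.es/requests-cache.html '
         '#python #APIs')

# Two capturing groups: findall returns one tuple per match,
# with '' for the group that did not take part in the match.
two_groups = re.findall(r"(#\w+)|(https?://[^\s]+)", tweet)

# One capturing group wrapping two non-capturing alternatives:
# findall returns a flat list of strings.
one_group = re.findall(r"((?:#\w+)|(?:https?://[^\s]+))", tweet)

print(two_groups)
print(one_group)
```

If you only need the matched text, the second form is usually the more convenient one.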

Answer 2

You can use the regex as follows:

    http_hash_search = re.compile(r"(\w+:\/\/\S+)|(#[A-Za-z0-9]+)")

#[A-Za-z0-9]+ --- This matches a # followed by one or more letters or digits, i.e. a hashtag.

(\w+://\S+) --- This matches URLs in the tweet: a scheme, then ://, then any run of non-whitespace characters.
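Because this pattern has two capturing groups, findall would return tuples. One way around that, sketched here with the tweet from the question, is to iterate with finditer and take group(0), the whole match; the name found is only illustrative.

```python
import re

http_hash_search = re.compile(r"(\w+:\/\/\S+)|(#[A-Za-z0-9]+)")

tweet = ('New PyBites article: Module of the Week - Requests-cache '
         'for Repeated API Calls - http://pybit.es/requests-cache.html '
         '#python #APIs')

# group(0) is the full text of each match, whichever alternative matched.
found = [m.group(0) for m in http_hash_search.finditer(tweet)]
print(found)
```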

Answer 3

Whatever you want to search for with your regex, you need to make sure you escape the # character, which is special when you compile the regex with the re.X / re.VERBOSE flag. This flag enables comments inside the regex pattern that start with an unescaped hash symbol and run to the end of the line.

When a line contains a # that is not in a character class and is not preceded by an unescaped backslash, all characters from the leftmost such # through the end of the line are ignored.

So, assuming you want to match either hashtags or specific URLs, you may use

tweet_regex = re.compile(r'''
                     \#\w+             # Hashtag pattern
                     |                 # or
                     https?://\S+      # URLs
                     ''', re.VERBOSE)

See the Python code demo. Output:

['http://pybit.es/requests-cache.html', '#python', '#APIs']
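To see the comment behavior in isolation, here is a small sketch with toy patterns (the patterns and the names unescaped, escaped and in_class are made up for illustration). It compares an unescaped #, an escaped \#, and a # inside a character class under re.VERBOSE.

```python
import re

# Unescaped '#': the rest of the line is a comment, so the
# effective pattern is just 'a'.
unescaped = re.compile(r'''a #\w+''', re.VERBOSE)

# An escaped '#' and a '#' inside a character class are both literal.
escaped = re.compile(r'''a \#\w+''', re.VERBOSE)
in_class = re.compile(r'''a [#]\w+''', re.VERBOSE)

print(unescaped.fullmatch('a#tag'))             # None
print(unescaped.fullmatch('a') is not None)     # True
print(escaped.fullmatch('a#tag') is not None)   # True
print(in_class.fullmatch('a#tag') is not None)  # True
```

This is why the pattern in the question loses everything after the first # on each line.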

3 answers

Answer 1
2 2020-08-05 19:21:56

Answer 2
0 2020-08-05 19:30:50

Answer 3
0 2020-08-05 21:35:11
