Matching a word in Python using regular expressions

Question

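I am making a reddit bot with PRAW that takes the authors of comments containing "alot" and stores their usernames in a list. I am having trouble with the regular expression and with getting the string to work. Here is my code.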

#importing praw for reddit api and time to make intervals

import praw
import time
import re


username = "LewisTheRobot"
password = 



r = praw.Reddit(user_agent = "Counts people who say alot")

word_to_match = ['\balot\b']

storage = []

r.login(username, password)

def run_bot():
    subreddit = r.get_subreddit("test")
    print("Grabbing subreddit")
    comments = subreddit.get_comments(limit=200)
    print("Grabbing comments")
    for comment in comments:
        comment_text = comment.body.lower()
        isMatch = any(string in comment_text for string in word_to_match)
        if comment.id not in storage and isMatch:
            print("Match found! Storing username: " + str(comment.author) + " into list.")
            storage.append(comment.author)


    print("There are currently: " + str(len(storage)) + " people who use 'alot' instead of ' a lot'.")


while True:
    run_bot()
    time.sleep(5)

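So the regular expression I am using is meant to find the standalone word alot, not alot as part of a longer string; there are plenty of examples of the latter. Whenever I run this, it does not find my comment. Any suggestions?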

Answer 1

You are doing a string operation here, not an RE operation:

isMatch = any(string in comment_text for string in word_to_match)

The first in here checks for a substring; it has nothing to do with REs.

Change it to:

isMatch = any(re.search(string, comment_text) for string in word_to_match)
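
To see the two checks side by side, here is a minimal sketch. The sample comment strings are made up for illustration, and the pattern is written as a raw string so that both checks receive the same text.

```python
import re

pattern = r'\balot\b'  # raw string: backslash + b reaches the regex engine intact
samples = ["i like this alot", "he is a zealot", "thanks a lot"]  # made-up comments

# For each sample: (text, substring test, regex test)
results = [
    (text, pattern in text, bool(re.search(pattern, text)))
    for text in samples
]

for text, substring_hit, regex_hit in results:
    print(repr(text), "substring:", substring_hit, "regex:", regex_hit)
```

The substring test looks for the literal characters of the pattern, backslashes included, so it finds nothing in any of these samples. Only re.search treats the pattern as a regular expression, and it matches the standalone word but not the "alot" inside "zealot".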

Also, you have an error in the initialization:

word_to_match = ['\balot\b']

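In an ordinary (non-raw) string literal, '\b' is the character with code 0x08 (backspace). Always use raw string syntax for RE patterns to avoid this kind of trap: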

word_to_match = [r'\balot\b']

Now you have a couple of characters, a backslash followed by b, which RE interprets as a "word boundary".
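
The difference between the two literals can be checked directly. This is a small sketch; the variable names plain and raw and the sample text are only for illustration.

```python
import re

plain = '\balot\b'   # each '\b' becomes a single backspace character (0x08)
raw = r'\balot\b'    # each \b stays as two characters: backslash and 'b'

print(len(plain))            # backspace, a, l, o, t, backspace
print(len(raw))              # \, b, a, l, o, t, \, b
print(plain[0] == '\x08')    # the first character really is a backspace

text = "i like this alot"
plain_match = re.search(plain, text)  # looks for literal backspaces around "alot"
raw_match = re.search(raw, text)      # looks for "alot" between word boundaries
print(plain_match, raw_match)
```

With the non-raw literal, the regex engine never sees a word-boundary token at all; it searches for backspace characters, which ordinary comment text does not contain.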

There may be other bugs, but I try not to look for more than two bugs per question... :-)
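
One possible candidate for "other bugs", offered as a reader's observation and not as part of the answer above: the loop in the question checks comment.id against storage but appends comment.author, so the duplicate check appears unlikely to ever fire. The sketch below tracks seen comment ids separately. It deliberately avoids PRAW; Comment, collect_authors, and the sample data are hypothetical stand-ins, assuming only that a comment has id, author, and body fields.

```python
import re
from collections import namedtuple

# Hypothetical stand-in for a PRAW comment; only these three fields are assumed.
Comment = namedtuple("Comment", ["id", "author", "body"])

ALOT = re.compile(r'\balot\b', re.IGNORECASE)

def collect_authors(comments, seen_ids, authors):
    """Append the author of each not-yet-seen comment that contains the word 'alot'."""
    for comment in comments:
        if comment.id in seen_ids:
            continue  # already processed on an earlier pass
        seen_ids.add(comment.id)
        if ALOT.search(comment.body):
            authors.append(str(comment.author))
    return authors

batch = [
    Comment("c1", "alice", "I like this alot"),
    Comment("c2", "bob", "What a zealot"),
    Comment("c3", "carol", "Thanks a lot"),
]
seen_ids, authors = set(), []
collect_authors(batch, seen_ids, authors)
collect_authors(batch, seen_ids, authors)  # a second pass over the same batch adds nothing
print(authors)
```

Keeping ids in a set and usernames in a list means a repeated polling pass does not count the same comment twice, while the same author can still be recorded for different comments.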

Answer 1: score 3, accepted, 2015-01-22 17:42:44
