Getting a PRAW subreddit object from a self-text link to the subreddit

I'm using PRAW with Python and I want to be able to:

  1. Go through the "new" posts on a subreddit.
  2. Detect whether a post's self text contains a link to a subreddit.
  3. If a subreddit is linked, get that subreddit as a PRAW object to use later.

I can do step 1, but detecting whether a subreddit is linked and then getting that subreddit is the hard part for me. Here's what I've got so far:

#! python3
# Reply with subreddit info from subreddit in text body

import praw

# Bot login details
USERNAME = "AutoMobBot"
PASSWORD = "<redacted>"

UA = "[Subreddit Info Provider (Update 0) by /u/MatthewMob]"
r = praw.Reddit(UA)
r.login(USERNAME, PASSWORD, disable_warning=True)

submissions = r.get_subreddit("matthewmob_csstesting").get_new(limit=10)

for submission in submissions:
    # Naive check: only catches words that begin with "/r/"
    for word in submission.selftext.lower().split():
        if word.startswith("/r/"):
            print("Found subreddit in:", submission.title)
            print(submission.selftext_html)

print("Done...")
input()  # keep the console window open

This just gets the submissions, splits the self text into words, and prints something if one of the words starts with /r/. Obviously this won't always work: the user might, for example, link the subreddit as r/askreddit or www.reddit.com/r/askreddit. And even then, if they linked /r/askreddit/top (with something on the end), how would I get that subreddit as a PRAW object? I've been trying to find some kind of regex to help me do this but haven't found one.
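A single regular expression can cover all of these forms. The following is a minimal sketch, assuming subreddit names consist only of ASCII letters, digits and underscores; `SUBREDDIT_RE` and `find_subreddits` are hypothetical names, not part of PRAW. Because the name's character class stops at the first `/`, a trailing path such as `/top` is simply ignored.

```python
import re

# Assumption: subreddit names are ASCII letters, digits and underscores.
# Accepts an optional "http(s)://", an optional subdomain ("www.", "old.", ...)
# and "reddit.com", followed by "/r/name" or a bare "r/name".
# The negative lookbehind rejects matches glued to another word or path,
# e.g. "car/boat" or "example.com/r/foo".
SUBREDDIT_RE = re.compile(
    r"(?<![A-Za-z0-9_/])"
    r"(?:(?:https?://)?(?:[a-z]+\.)?reddit\.com)?"
    r"/?r/([A-Za-z0-9_]+)",
    re.IGNORECASE,
)


def find_subreddits(text):
    """Return unique subreddit names (lowercased) in order of appearance."""
    found = []
    for name in SUBREDDIT_RE.findall(text):
        name = name.lower()  # subreddit names are case-insensitive
        if name not in found:
            found.append(name)
    return found
```

For example, `find_subreddits("see /r/AskReddit/top and www.reddit.com/r/python")` yields the names `askreddit` and `python`.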

My main question is: what is the best way to get the subreddit from a link in the user's self text, and how do I do that?
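For step 3, the PRAW 3 API used in this question returns the subreddit object from `r.get_subreddit(name)`; in PRAW 4 and later the equivalent is `reddit.subreddit(name)`, which returns a lazy `Subreddit` that makes no request until one of its attributes is accessed. The sketch below strings the three steps together against the newer API. It assumes `reddit` is an authenticated `praw.Reddit` instance; `linked_subreddits` is a hypothetical helper, and `extract` is a hypothetical callable mapping text to a list of subreddit names (for example, a regex-based finder), passed in so the loop does not commit to one matching strategy.

```python
def linked_subreddits(reddit, source, extract, limit=10):
    """Yield (submission, subreddit) pairs for subreddits linked in self text.

    Assumes `reddit` behaves like a PRAW 4+ praw.Reddit instance:
    reddit.subreddit(name) returns a lazy Subreddit object, and
    Subreddit.new(limit=...) yields the newest submissions.
    `extract` is a hypothetical callable: text -> list of subreddit names.
    """
    # Step 1: walk the newest posts in the source subreddit
    for submission in reddit.subreddit(source).new(limit=limit):
        # Step 2: find subreddit names in the post's self text
        # (selftext is an empty string for link posts)
        for name in extract(submission.selftext):
            # Step 3: a lazy Subreddit object; PRAW makes no API
            # request until one of its attributes is accessed
            yield submission, reddit.subreddit(name)
```

Usage might look like `for post, sub in linked_subreddits(reddit, "matthewmob_csstesting", extract): print(post.title, sub.display_name)`; reading `display_name` does not trigger a fetch, because the name is set when the lazy object is created.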

If you need any more clarification, I'm happy to provide more information.

I have now found my own answer. Here is the code that works for me:

#! python3
# Reply with subreddit info from subreddit in text body
# Uses the legacy PRAW 3.x API (login, get_subreddit, get_new),
# which was removed in PRAW 4.

import re

import bs4
import praw

# Bot login details
USERNAME = "AutoMobBot"
PASSWORD = "<Password>"

UA = "[Subreddit Info Provider (Update 4) by /u/MatthewMob]"
r = praw.Reddit(UA)
r.login(USERNAME, PASSWORD, disable_warning=True)

# Capture the subreddit name between "/r/" and the next "/"
SUB_RE = re.compile(r"/r/([^/]+)/")

submissions = r.get_subreddit("matthewmob_csstesting").get_new(limit=3)

for submission in submissions:
    # Link posts have no self text, so selftext_html is None
    if not submission.selftext_html:
        continue
    subs = []
    soup = bs4.BeautifulSoup(submission.selftext_html, "lxml")
    for a in soup.find_all("a", href=True):
        # Add a trailing "/" so a bare "/r/name" link still matches
        href = a["href"] + "/"
        # findall returns a (possibly empty) list, never None
        for name in SUB_RE.findall(href):
            if name not in subs:
                subs.append(name)
                print("\nTitle:", submission.title)
                print("\nSubreddits Found:", len(subs))
                print("\nSubreddit Found:", name + "\n")

print("Done...")
input()  # keep the console window open
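The answer depends on bs4 and lxml. If those are unavailable, the same idea (collect every `<a href>` in `selftext_html` and pull the name out of the link target) can be sketched with the standard library's `html.parser` instead. This is a swapped-in parser, not the answer's code, and `SubredditLinkParser` and `subreddits_from_html` are hypothetical names. Unlike the answer's regex, which matches `/r/` anywhere in a link, this sketch only accepts relative `/r/...` links or links on a `reddit.com` host, so an `/r/` path on an unrelated site is ignored.

```python
import re
from html.parser import HTMLParser

# Subreddit path in a link target: relative ("/r/name/...") or absolute
# on a reddit.com host ("https://www.reddit.com/r/name/...").
HREF_RE = re.compile(
    r"^(?:https?://(?:[a-z]+\.)?reddit\.com)?/r/([A-Za-z0-9_]+)",
    re.IGNORECASE,
)


class SubredditLinkParser(HTMLParser):
    """Hypothetical helper: collects subreddit names from <a href> targets."""

    def __init__(self):
        super().__init__()
        self.subs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        match = HREF_RE.match(href)
        if match:
            name = match.group(1).lower()
            if name not in self.subs:
                self.subs.append(name)


def subreddits_from_html(selftext_html):
    """Return unique subreddit names linked from a post's self-text HTML."""
    if not selftext_html:  # link posts have no self text
        return []
    parser = SubredditLinkParser()
    parser.feed(selftext_html)
    parser.close()
    return parser.subs
```

Each returned name can then be passed to `r.get_subreddit(name)` to get the PRAW object.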
