
Scrape subreddit post titles and use them as filenames using PRAW

My code currently downloads images from a given subreddit and saves them under their original file names. I would like it to name them after the titles of their Reddit posts instead. Could anyone help me out, please? I think it has something to do with Submission.title, but I can't figure it out. Cheers.

import praw
from requests import get
from multiprocessing.pool import ThreadPool
import os


client_id = 'xxxxxxxxx'
client_secret = 'xxxxxxxxx'
user_agent = 'xxxxxxxxx'
image_directory = 'images'
thread_count = 16

target_subreddit = 'space'
image_count = '10'
order = 'hot'

order = order.lower()

reddit = praw.Reddit(client_id=client_id,
                     client_secret=client_secret, user_agent=user_agent)


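def get_order():
    # Pick the listing that matches the configured sort order.
    subreddit = reddit.subreddit(target_subreddit)
    if order == 'hot':
        ready = subreddit.hot(limit=None)
    elif order == 'top':
        ready = subreddit.top(limit=None)
    elif order == 'new':
        ready = subreddit.new(limit=None)
    else:
        # Without this branch, an unknown order left 'ready' unassigned.
        raise ValueError(f'Unsupported order: {order!r}')
    return ready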


def get_img(what):
    image = '{}/{}/{}'.format(image_directory,
                              target_subreddit, what.split('/')[-1])
    img = get(what).content
    with open(image, 'wb') as f:
        f.write(img)


def make_dir():
    directory = f'{image_directory}/{target_subreddit}'
    if not os.path.exists(directory):
        os.makedirs(directory)


def main():
    c = 1
    images = []
    make_dir()
    for submission in get_order():
        url = submission.url
        if url.endswith(('.jpg', '.png', '.gif', '.jpeg')):
            images.append(url)
            c += 1
            if int(image_count) < c:
                break

    results = ThreadPool(thread_count).imap_unordered(get_img, images)
    for path in results:
        pass

    print('Done')

if __name__ == '__main__':
    main()

Yeah, so if your 'url' variable is giving you a correct URL, then simply submission.title should give you the title. You may be getting tripped up by the encoding, so you may want to convert it with str(), or get a bit fancier with the encode function. Also, some characters are not allowed in file names on many systems, so perhaps try stripping the disallowed characters from the title.
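
As a rough illustration of the "strip the disallowed characters" suggestion, here is a minimal sketch. The helper names title_to_filename and image_path, the max_length parameter, and the exact set of characters removed are assumptions made for this example; they are not part of PRAW or of the original code. The set covers characters that Windows rejects, plus "/", which is invalid on Unix-like systems as well.

```python
import os
import re
from urllib.parse import urlparse

# Assumption: strip characters that Windows rejects in file names
# ("/" is also invalid on Unix-like systems), plus control characters.
_INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def title_to_filename(title, url, max_length=100):
    """Build a file name from a post title, keeping the URL's extension."""
    url_path = urlparse(url).path
    # Take the extension from the URL path, e.g. ".jpg".
    extension = os.path.splitext(url_path)[1]
    # Drop disallowed characters, then collapse runs of whitespace.
    cleaned = _INVALID_CHARS.sub('', title)
    cleaned = re.sub(r'\s+', ' ', cleaned).strip(' .')
    # Fall back to the URL's base name if nothing usable is left.
    if not cleaned:
        cleaned = os.path.splitext(os.path.basename(url_path))[0]
    # Truncate long titles so the name stays within typical length limits.
    return cleaned[:max_length].rstrip(' .') + extension


def image_path(image_directory, target_subreddit, title, url):
    """Join the download directory with the sanitized file name."""
    return os.path.join(image_directory, target_subreddit,
                        title_to_filename(title, url))
```

To use something like this with the code in the question, main() could append (submission.title, url) pairs to the images list instead of bare URLs, and get_img could unpack each pair and open image_path(...) rather than building the name from the URL. Two posts with the same title would then overwrite each other, so appending submission.id to the name may be worth considering.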
