Python 3.4 - Downloading newly uploaded text files from pastebin.com

I want to download text files from pastebin.com. Once my program starts, it should watch for newly uploaded text files and "download" each one as soon as it is uploaded. I know how to "download" them, but not how to tell Python to click on one of the public files at http://pastebin.com/archive and then click the "raw" button to open a new tab containing the raw content.

I googled a lot, but nothing came up that would help me.

Thanks

Well, a program doesn't know how to "click" anything :). To retrieve information from a page, you simply need to send a GET request to the correct URL. In your case, that would be http://pastebin.com/raw/4ffLHviP, with the code of whichever paste you want to download in place of 4ffLHviP. You can collect the codes manually or, for example, by applying a text parser (regex, BeautifulSoup, ...) to the archive page.
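As a rough sketch of the idea above, the snippet below builds the raw URL for a paste code and pulls candidate codes out of the archive page's HTML with a regex. The helper names (`raw_url`, `extract_paste_keys`, `fetch_text`) are made up for illustration, and the pattern assumes that paste codes are 8 alphanumeric characters linked as `href="/XXXXXXXX"`; check that against the actual page markup before relying on it.

```python
import re
import urllib.request

RAW_URL_TEMPLATE = "http://pastebin.com/raw/{key}"

# Assumption: paste codes are 8 alphanumeric characters and show up in the
# archive page as href="/XXXXXXXX". Other 8-letter site links would also
# match, so the result may need extra filtering.
PASTE_LINK_RE = re.compile(r'href="/([A-Za-z0-9]{8})"')


def raw_url(key):
    """Build the raw-content URL for a paste code."""
    return RAW_URL_TEMPLATE.format(key=key)


def extract_paste_keys(archive_html):
    """Return the paste codes found in the HTML, in page order, no duplicates."""
    seen = set()
    keys = []
    for key in PASTE_LINK_RE.findall(archive_html):
        if key not in seen:
            seen.add(key)
            keys.append(key)
    return keys


def fetch_text(url, timeout=10):
    """Send a GET request and return the response body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

To use it, you would call `fetch_text("http://pastebin.com/archive")`, pass the result to `extract_paste_keys`, and then call `fetch_text(raw_url(key))` for each code you have not seen before, pausing between requests so you don't hammer the site.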

Note that there is an API for scraping Pastebin (see http://pastebin.com/scraping). If you want to extract a substantial amount of content from them, it is strongly recommended that you use it. It is more "polite", may offer better service, and will keep you from being blacklisted.

To choose a file, simply do the following:

  1. Visit the link of the file, e.g. http://pastebin.com/B8A6L7Zt
  2. The raw content is already on that page, namely inside <textarea id='paste_code'>...</textarea>. So you just need to cut this content out, using a regex for example.
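The two steps above could look roughly like this once you have the page's HTML as a string. The function name `extract_paste_text` is hypothetical, and the sketch assumes the paste really is wrapped in a textarea with id `paste_code` as described; since text inside a textarea is HTML-escaped, the result is passed through `html.unescape`.

```python
import html
import re

# Matches <textarea ... id='paste_code' ...>CONTENT</textarea>, with either
# quote style around the id. Assumes the page layout described above.
TEXTAREA_RE = re.compile(
    r"<textarea[^>]*\bid=['\"]paste_code['\"][^>]*>(.*?)</textarea>",
    re.DOTALL | re.IGNORECASE,
)


def extract_paste_text(page_html):
    """Return the unescaped paste content, or None if the textarea is missing."""
    match = TEXTAREA_RE.search(page_html)
    if match is None:
        return None
    # Convert entities such as &lt; and &quot; back into plain characters.
    return html.unescape(match.group(1))
```

A regex is enough for a single well-known tag like this one; if the markup turns out to be less regular, an HTML parser such as BeautifulSoup is the sturdier choice.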
