Empty BeautifulSoup result on PythonAnywhere

Question

I have a bs4 app that, in this case, prints the latest post on igg-games.com.
Code:

from bs4 import BeautifulSoup
import requests

def get_new():
    # Map each post title to (link, categories, date added)
    new = {}
    html = requests.get('https://igg-games.com/').text
    for i in BeautifulSoup(html, features="html.parser").find_all('article'):
        # The title and link live in the <a class="uk-link-reset"> element
        elem = i.find('a', class_='uk-link-reset')
        categories = ", ".join(x.get_text() for x in i.find_all('a', rel='category tag'))
        new[elem.get_text()] = (elem.get('href'), categories, i.find('time').get_text())
    return new

current = get_new()
# The first entry is the newest post (dicts keep insertion order)
new_item = list(current.items())[0]
print(f"Title: {new_item[0]}\nLink: {new_item[1][0]}\nCategories: {new_item[1][1]}\nAdded: {new_item[1][2]}")

Output on my machine:

Title: Beholder’s Lair Free Download
Link: https://igg-games.com/beholders-lair-free-download.html
Categories: Action, Adventure
Added: January 7, 2021

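So I know it works. However, my end goal is to turn this into an RSS feed entry, so I moved it all into a premium PythonAnywhere account. There, my function get_new() returns {}. Is there something I need to do that I've missed?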
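One way to narrow this down is to separate fetching from parsing. In the minimal sketch below, `fetch_page` and `parse_posts` are hypothetical helper names, and the `timeout=30` value is an arbitrary assumption; the selectors are the ones from the question. If the server answers with a block or error page that contains no `<article>` elements, `find_all('article')` returns an empty list, the loop never runs, and the result is `{}` rather than an exception. Calling `raise_for_status()` surfaces the problem early, assuming the block comes back with a 4xx/5xx status (a site may also block with a normal 200 page, in which case only the empty result shows it).

```python
from bs4 import BeautifulSoup
import requests


def fetch_page(url):
    """Download a page, raising requests.HTTPError on a 4xx/5xx status (hypothetical helper)."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.text


def parse_posts(html):
    """Map post title -> (link, categories, date) using the question's selectors."""
    posts = {}
    for article in BeautifulSoup(html, features="html.parser").find_all('article'):
        link = article.find('a', class_='uk-link-reset')
        categories = ", ".join(a.get_text() for a in article.find_all('a', rel='category tag'))
        posts[link.get_text()] = (link.get('href'), categories, article.find('time').get_text())
    return posts


# Offline check: a page with no <article> elements parses to an empty dict
blocked = "<html><body><h1>Access denied</h1></body></html>"
print(parse_posts(blocked))  # {}

# Made-up markup shaped like one post on the front page
sample = """
<article>
  <a class="uk-link-reset" href="https://example.com/post.html">Example Post</a>
  <a rel="category tag" href="#">Action</a><a rel="category tag" href="#">Adventure</a>
  <time>January 7, 2021</time>
</article>
"""
print(parse_posts(sample))
```

Usage: on the server, `parse_posts(fetch_page('https://igg-games.com/'))` either raises with the HTTP status or returns the parsed posts, which makes it easier to tell a blocked request apart from a selector that no longer matches.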

Answer 1

Solved with the help of Dmytro O.

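Since requests coming from PythonAnywhere were most likely being blocked as a client, setting a User-Agent header let me receive the expected response from the site.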

# The fix: send a browser-like User-Agent header with the request
url = 'https://igg-games.com/'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

response = requests.get(url, headers=headers)
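If several requests go to the same site, a `requests.Session` can carry the header for every call instead of passing `headers=` each time. A minimal sketch, assuming the same browser-like User-Agent string as above; `make_session` and `BROWSER_UA` are hypothetical names. By default `requests` identifies itself with the value of `requests.utils.default_user_agent()` (of the form `python-requests/<version>`); that the site filters on this value is an assumption consistent with the answer, not something it confirms.

```python
import requests

# Browser-like User-Agent copied from the fix above
BROWSER_UA = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 '
              '(KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36')


def make_session(user_agent=BROWSER_UA):
    """Return a Session whose requests all send the given User-Agent (hypothetical helper)."""
    session = requests.Session()
    # Session-level headers are merged into every request made through the session
    session.headers.update({'User-Agent': user_agent})
    return session


# What requests sends when no User-Agent is set, e.g. "python-requests/2.x.y"
print(requests.utils.default_user_agent())

session = make_session()
print(session.headers['User-Agent'])
```

Usage: `make_session().get('https://igg-games.com/', timeout=30).text` can then be handed to BeautifulSoup exactly as before. The session also reuses the underlying connection between calls.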

Put into my code, it looks like this:

def get_new():
    new = {}
    # Browser-like User-Agent; without it the site answered PythonAnywhere with no <article> elements
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
    html = requests.get('https://igg-games.com/', headers=headers).text
    for i in BeautifulSoup(html, features="html.parser").find_all('article'):
        elem = i.find('a', class_='uk-link-reset')
        new[elem.get_text()] = (elem.get('href'), ", ".join(x.get_text() for x in i.find_all('a', rel='category tag')), i.find('time').get_text())
    return new

I was pointed to this method by this Stack Overflow post: How to use Python requests to fake a browser visit aka and generate User Agent?

Solution 1 (score: 2), 2021-01-07 18:42:36