Beautiful Soup won't return results

Here is my code. It doesn't raise any errors, but it also doesn't return any results.

import requests
from bs4 import BeautifulSoup

googtrends = requests.get("https://www.google.com/trends/")
soup = BeautifulSoup(googtrends.content, "html.parser")
links = soup.find_all("a", {"class": "trending-story ng-isolate-scope"})

print(links)

I haven't solved this yet; I started working on something else instead. My plan is to try Selenium first, then Selenium with either PhantomJS or Zombie.js, and if that still doesn't work, to use pytrends. I just checked pytrends out and it requires a Gmail account, which I have, but I would rather try getting this to work without an API first.

I will post back here when I get it working.

Yes, this page is rendered dynamically by JavaScript. Let's try changing the request headers anyway (it fails, which confirms that JavaScript is the cause).

Test code:

import requests
from bs4 import BeautifulSoup


my_headers={"Host": "www.google.com",
"User-Agent": "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:40.0) Gecko/20100101 Firefox/40.0",
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
"Accept-Language": "en-US,am;q=0.7,zh-HK;q=0.3",
"Accept-Encoding": "gzip, deflate",
"Cookie": "PREF=ID=1111111111111111:FF=0:LD=en:TM=1439993585:LM=1444815129:V=1:S=Zjbb3gK_m_n69Hqv; NID=72=F6UyD0Fr18smDLJe1NzTReJn_5pwZz-PtXM4orYW43oRk2D3vjb0Sy6Bs_Do4J_EjeOulugs_x2P1BZneufegpNxzv7rkY9BPHcfdx9vGOHtJqv2r46UuFI2f5nIZ1Cu4RcT9yS5fZ1SUhel5fHTLbyZWhX-yiPXvZCiQoW4FjZd-3Bwxq8yrpdgmPmf4ufvFNlmTd3y; OGP=-5061451:; OGPC=5061713-3:",
"Connection": "keep-alive"}


googtrends = requests.get("https://www.google.com/trends/", headers=my_headers)
my_content = googtrends.text  # decoded response body as a str
soup = BeautifulSoup(my_content, 'html.parser')
links = soup.find_all("a", {"class": "trending-story ng-isolate-scope"}, href=True)

# Check whether we are getting the full content from the site.
# The page shown in the browser contains "Apple Inc., App Store",
# so look for that text in the response.

print('Apple Inc., App Store' in my_content)

# It prints False, so the page is rendered by JS; changing the headers makes no difference.
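
The check above looks for a known piece of text. The same idea can be applied to the target markup itself: if no matching anchor exists in the HTML the server sends, no parser will find it. Below is a minimal sketch using only the standard-library html.parser, so it runs without extra installs. The two HTML strings are hypothetical stand-ins for the raw response and for the DOM a browser would build; they are not taken from the real page.

```python
from html.parser import HTMLParser


class AnchorClassFinder(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        # The class attribute is a space-separated list, so split before comparing.
        classes = (attributes.get("class") or "").split()
        if self.class_name in classes:
            self.hrefs.append(attributes.get("href"))


def find_anchor_hrefs(html, class_name):
    """Return the hrefs of all anchors in `html` that have `class_name`."""
    finder = AnchorClassFinder(class_name)
    finder.feed(html)
    finder.close()
    return finder.hrefs


# Hypothetical markup: what the server sends versus what the browser builds.
raw_html = '<html><body><div ng-view></div><script src="app.js"></script></body></html>'
rendered_html = (
    '<html><body><a class="trending-story ng-isolate-scope" href="/story/1">'
    "Example story</a></body></html>"
)

print(find_anchor_hrefs(raw_html, "trending-story"))       # []
print(find_anchor_hrefs(rendered_html, "trending-story"))  # ['/story/1']
```

An empty list for the raw response, while the browser shows the links, points to JavaScript rendering rather than to a mistake in the selector.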

So try a WebDriver such as Selenium with Firefox, Chrome, PhantomJS, etc., which executes the JavaScript. Better still, use an API.
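
As a rough illustration of the WebDriver route, here is a sketch that loads the page in headless Firefox, waits until the JavaScript has inserted the elements, and then reads them. It assumes Selenium 4 with Firefox and geckodriver available. The helper names css_selector and fetch_rendered_links are made up for this example, and the trending-story class is taken from the question, so it may no longer exist on the live page.

```python
def css_selector(tag, *classes):
    """Build a CSS selector such as 'a.trending-story' from a tag and class names."""
    return tag + "".join("." + name for name in classes)


def fetch_rendered_links(url, selector, timeout=15):
    """Load `url` in headless Firefox and return (text, href) pairs for `selector`."""
    # Imported here so css_selector() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.firefox.options import Options
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("-headless")  # run Firefox without opening a window
    driver = webdriver.Firefox(options=options)
    try:
        driver.get(url)
        # Block until the page's JavaScript has added at least one matching element;
        # raises TimeoutException if nothing appears within `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, selector))
        )
        elements = driver.find_elements(By.CSS_SELECTOR, selector)
        return [(element.text, element.get_attribute("href")) for element in elements]
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

Calling fetch_rendered_links("https://www.google.com/trends/", css_selector("a", "trending-story")) would then return the links that requests alone cannot see, provided the class name still matches. A CSS selector on a single class is used instead of the full "trending-story ng-isolate-scope" string because it matches regardless of which other classes the framework adds.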
