
Python Requests gives different page text than Internet Explorer

Looking at my Stack Overflow user profile page: https://stackoverflow.com/users/2683104/roberto

The site indicates I have been a member for 316 days (screenshots at end of post). If I view source in my browser (IE11), I can see this data comes from a days-visited class.

But if I look for this same days-visited information using Python Requests, the data does not appear anywhere. Why?

from requests import Session
from bs4 import BeautifulSoup

s = Session()

url = 'https://stackoverflow.com/users/2683104/roberto'
page = s.get(url)
soup = BeautifulSoup(page.text, 'html.parser')
print(soup.prettify())  # server response, prettified

# the following line raises
# AttributeError: 'NoneType' object has no attribute 'getText'
#days_visited = soup.find('span', attrs={'id':'days-visited'}).getText()

s.close()

Screenshots:

[screenshot: view source]

[screenshot: Python Requests]

That field is not visible to your script (or to other users). If you want to scrape that piece of information, your script will need to log in and store the appropriate cookies.

This is what users other than you see:

[screenshot: profile]

And the code block they see:

 <tbody>
            <tr>
                <th>visits</th>
                <td>member for</td>

                <td class="cool" title="2013-08-14 15:38:01Z">11 months</td>
            </tr>
            <tr>
                <th></th>
                <td>seen</td>

                <td class="supernova" title="2014-08-08 05:26:50Z">
                    <span title="2014-08-08 05:26:50Z" class="relativetime">6 mins ago</span>
                </td>
            </tr>
        </tbody>

Normally, I'd recommend using the API instead of scraping Stack Overflow for data, but this particular piece of information isn't returned as part of the User object.
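If you want to check for yourself what the User object does contain, something like the following should work. It is only a sketch: it assumes version 2.3 of the Stack Exchange API and its `/users/{ids}` endpoint, and the helper names `user_request` and `fetch_user` are made up for illustration.

```python
API_ROOT = "https://api.stackexchange.com/2.3"  # assumed API version


def user_request(user_id, site="stackoverflow"):
    """Return the (url, params) pair for the /users/{ids} endpoint."""
    return f"{API_ROOT}/users/{user_id}", {"site": site}


def fetch_user(user_id, site="stackoverflow"):
    """Fetch one User object from the API, or None if the id is unknown."""
    # Third-party package, imported here so user_request() needs nothing extra.
    import requests

    url, params = user_request(user_id, site)
    response = requests.get(url, params=params, timeout=10)
    response.raise_for_status()
    # The API wraps its results in an "items" list.
    items = response.json()["items"]
    return items[0] if items else None
```

Calling `sorted(fetch_user(2683104))` lists the field names that came back, so you can confirm whether the value you are after is among them before deciding to scrape.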

As the comments said, 'days-visited' is only shown when you are logged in, and only the member themselves can see it.

You can find your cookies in your browser and send them with your request.

http://docs.python-requests.org/en/latest/user/quickstart/#cookies
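A rough sketch of that approach, assuming you have copied the value of the `Cookie` request header from your browser's developer tools while logged in. The helper names and the cookie names in the usage note are placeholders, and the `days-visited` id is taken from the commented-out line in the question (the question text calls it a class, so adjust the lookup if needed).

```python
from http.cookies import SimpleCookie


def cookies_from_header(header):
    """Turn a browser 'Cookie:' header value into a plain dict."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}


def text_or_none(soup, element_id):
    """Return the text of the element with this id, or None if it is absent."""
    node = soup.find(id=element_id)
    # find() returns None when nothing matches, which is what caused the
    # AttributeError in the question, so check before reading the text.
    return node.get_text(strip=True) if node is not None else None


def fetch_days_visited(url, cookie_header):
    """Request the page with the browser's cookies and look for the field."""
    # Third-party packages, imported here so cookies_from_header() stays stdlib-only.
    import requests
    from bs4 import BeautifulSoup

    with requests.Session() as s:
        s.cookies.update(cookies_from_header(cookie_header))
        page = s.get(url, timeout=10)
        page.raise_for_status()
    soup = BeautifulSoup(page.text, "html.parser")
    return text_or_none(soup, "days-visited")
```

You would call it as `fetch_days_visited(url, "name1=value1; name2=value2")` with your real header value. If it still returns None, the cookies have probably expired or the page markup differs from what the question describes.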
