urllib making multiple POST requests

I have a website that I'm trying to pull data from, but to get to the data I need to go through two pages: a login screen and a second screen where I select the data to be read. My code looks like this:

    import urllib.parse
    import urllib.request
    from bs4 import BeautifulSoup

    url = 'http://website.com'
    # Form fields sent to the login page
    values = {'userName': 'tom',
              'Login': 'submit'}
    data = urllib.parse.urlencode(values).encode('ascii')
    req = urllib.request.Request(url, data)  # supplying data makes this a POST
    page = urllib.request.urlopen(req)
    soup = BeautifulSoup(page, 'html.parser')
    print(soup.text)

My question is: how would I submit a second POST request after the login request in order to get to the data that I'm looking for?

Generally, it depends on how the site authenticates the user and how it stores the session: PHP sessions, token-based authentication, Google sign-in, and so on. Without knowing those details, it is hard to say. A common way around this complexity is to use a headless web browser, that is, a browser that can be controlled from code, which lets you click through the pages as you normally would.

I recommend Selenium for Python: http://www.seleniumhq.org/
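
If the site turns out to use plain cookie-based sessions, a browser may not be needed at all. The following is a minimal sketch under that assumption, using only the standard library: an opener with an attached cookie jar keeps whatever session cookie the login response sets and sends it back with the second POST. The helper names (`make_session_opener`, `post_form`, `fetch_data`) and the URLs and form fields in the usage comment are hypothetical; the real ones have to be read from the site's HTML forms.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session_opener():
    """Return (opener, jar); the opener stores cookies and resends them later."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def post_form(opener, url, fields):
    """POST url-encoded form fields through the opener and return the body as text."""
    data = urllib.parse.urlencode(fields).encode('ascii')
    request = urllib.request.Request(url, data)  # data present, so this is a POST
    with opener.open(request) as response:
        return response.read().decode('utf-8', errors='replace')


def fetch_data(login_url, select_url, login_fields, select_fields):
    """Log in, then submit the second form using the same cookie jar."""
    opener, _jar = make_session_opener()
    post_form(opener, login_url, login_fields)           # first POST: login
    return post_form(opener, select_url, select_fields)  # second POST: data selection


# Hypothetical usage; the second URL and its field names are assumptions:
# html = fetch_data('http://website.com',
#                   'http://website.com/select',
#                   {'userName': 'tom', 'Login': 'submit'},
#                   {'report': 'daily'})
```

This only works when the session lives in cookies. If the login relies on JavaScript, hidden per-request tokens, or a third-party sign-in, the headless browser approach above is likely the more practical route.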

