
Python web scraping with requests - after login

I have the Python requests/Beautiful Soup code below, which lets me log in to a URL successfully. However, after logging in, to get the data I need I would normally have to do the following manually:

1) click on 'statement' in the first row:

[screenshot]

2) Select dates, click 'run statement':

[screenshot]

3) view data:

[screenshot]

This is the code that I have used to logon to get to step 1 above:

import requests
from bs4 import BeautifulSoup

logurl = "https://login.flash.co.za/apex/f?p=pwfone:login"
posturl = 'https://login.flash.co.za/apex/wwv_flow.accept'

with requests.Session() as s:
    # update() adds the User-Agent while keeping the session's other default headers
    s.headers.update({"User-Agent": "Mozilla/5.0"})
    res = s.get(logurl)
    soup = BeautifulSoup(res.text,"html.parser")

    # the form repeats the hidden p_arg_names input; collect every value
    arg_names = [tag['value'] for tag in soup.select("[name='p_arg_names']")]

    values = {
        'p_flow_id': soup.select_one("[name='p_flow_id']")['value'],
        'p_flow_step_id': soup.select_one("[name='p_flow_step_id']")['value'],
        'p_instance': soup.select_one("[name='p_instance']")['value'],
        'p_page_submission_id': soup.select_one("[name='p_page_submission_id']")['value'],
        'p_request': 'LOGIN',
        'p_t01': 'solar',
        'p_arg_names': arg_names,
        'p_t02': 'password',
        'p_md5_checksum': soup.select_one("[name='p_md5_checksum']")['value'],
        'p_page_checksum': soup.select_one("[name='p_page_checksum']")['value']
    }
    s.headers.update({'Referer': logurl})
    r = s.post(posturl, data=values)
    print(r.content)

My question (beginner speaking) is: how can I skip steps 1 and 2 and simply do another headers update and POST to the final URL, using the selected dates as form entries (headers and form info below)? (The Referer header is the page from step 2 above.)

[screenshot of the request headers and form data]

Edit 1: network request from csv file download:

[screenshot of the network request]

Use Selenium WebDriver; it has a lot of good features for working with web services.

Selenium is gonna be your best bet for automated browser interactions. It can be used not only to scrape data from websites but also to interact with forms and the like. I highly recommend it, as I have used it quite a bit in the past. If you already have pip and Python installed, go ahead and type

pip install selenium

That will install Selenium, but you also need to install either geckodriver (for Firefox) or chromedriver (for Chrome). Then you should be up and running!

As others have recommended, Selenium is a good tool for this sort of task. However, I'll try to suggest a way to use requests for this purpose, since that's what you asked for in the question.

The success of this approach would really depend on how the webpage is built and how data files are made available (if "Save as CSV" in the view data is what you're targeting).

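If the login mechanism is cookie-based, you can use Sessions and Cookies in requests. When you submit a login form, a cookie is returned in the response headers. A requests.Session stores that cookie and sends it with every subsequent request made through the same session, which is what makes your login stick; without a session you would have to add the cookie to the request headers yourself.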
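
To illustrate how a cookie sticks to a session, here is a minimal offline sketch. It does not contact the site: the cookie name and value (EXAMPLE_SESSION, abc123) are made up and stand in for whatever the real login response sets, and cookie_header_for is a helper name introduced only for this example.

```python
import requests


def cookie_header_for(session, url):
    """Return the Cookie header the session would send to `url` (no network I/O)."""
    prepared = session.prepare_request(requests.Request("GET", url))
    return prepared.headers.get("Cookie")


session = requests.Session()

# Stand-in for what a successful login would do through its Set-Cookie header.
# The cookie name and value are invented for illustration.
session.cookies.set(
    "EXAMPLE_SESSION", "abc123", domain="login.flash.co.za", path="/"
)

# The stored cookie is attached to requests for the matching host only.
print(cookie_header_for(session, "https://login.flash.co.za/apex/f?p=pwfone:login"))
print(cookie_header_for(session, "https://example.com/"))
```

In the question's code the same thing happens implicitly: because the login POST goes through `s`, any later `s.get(...)` or `s.post(...)` inside the `with` block carries the login cookie.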

Also, you should inspect the network request for the "Save as CSV" action in the Network pane of the browser's Developer Tools. If you can see a structure to the request, you may be able to make the same request directly within your authenticated session, using a statement identifier and dates as the payload to get your results.
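
As a rough sketch of that last step, assuming the download turns out to be a plain form POST: the function names, the form field names and the dates below are placeholders, not the site's real parameters, so copy the actual URL, Referer and form data from the request shown in the Network pane. The session passed in is assumed to be the one that has already logged in.

```python
import requests

# Assumption: the download posts to the same endpoint as the login form in the
# question. Replace it with the URL shown in the Network pane if it differs.
POST_URL = "https://login.flash.co.za/apex/wwv_flow.accept"


def build_statement_request(session, referer, form_fields):
    """Prepare (without sending) the POST that the 'run statement' click makes.

    Session cookies and headers are merged in, so an authenticated session
    produces an authenticated request.
    """
    request = requests.Request(
        "POST", POST_URL, data=form_fields, headers={"Referer": referer}
    )
    return session.prepare_request(request)


def fetch_statement(session, referer, form_fields, timeout=30):
    """Send the prepared request and return the raw response body (e.g. CSV bytes)."""
    response = session.send(
        build_statement_request(session, referer, form_fields), timeout=timeout
    )
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.content


# Placeholder payload: field names and date format are hypothetical.
example_fields = {
    "p_request": "HYPOTHETICAL_RUN_STATEMENT",
    "p_t01": "2018-01-01",  # start date (hypothetical field)
    "p_t02": "2018-01-31",  # end date (hypothetical field)
}
```

With a logged-in session `s`, something like `fetch_statement(s, referer_url, real_fields)` would return the file contents, where `referer_url` is the step 2 page. Since the login form carried hidden values such as p_instance and the checksums, the statement form may well need them too, in which case you would GET the step 2 page first and read them from it the same way the question's code does for the login page.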


 