Python web scraping with requests - after login

Question

I have the Python requests/Beautiful Soup code below, which lets me log in to a URL successfully. However, after logging in, to get the data I need I would normally have to do the following manually:

1) click on 'statement' in the first row:

2) Select dates, click 'run statement':

3) view data:

This is the code that I have used to log in and get to step 1 above:

import requests
from bs4 import BeautifulSoup

logurl = "https://login.flash.co.za/apex/f?p=pwfone:login"
posturl = 'https://login.flash.co.za/apex/wwv_flow.accept'

with requests.Session() as s:
    s.headers.update({"User-Agent": "Mozilla/5.0"})
    res = s.get(logurl)
    soup = BeautifulSoup(res.text,"html.parser")

    arg_names =[]
    for name in  soup.select("[name='p_arg_names']"):
        arg_names.append(name['value'])

    values = {
        'p_flow_id': soup.select_one("[name='p_flow_id']")['value'],
        'p_flow_step_id': soup.select_one("[name='p_flow_step_id']")['value'],
        'p_instance': soup.select_one("[name='p_instance']")['value'],
        'p_page_submission_id': soup.select_one("[name='p_page_submission_id']")['value'],
        'p_request': 'LOGIN',
        'p_t01': 'solar',
        'p_arg_names': arg_names,
        'p_t02': 'password',
        'p_md5_checksum': soup.select_one("[name='p_md5_checksum']")['value'],
        'p_page_checksum': soup.select_one("[name='p_page_checksum']")['value']
    }
    s.headers.update({'Referer': logurl})
    r = s.post(posturl, data=values)
    print(r.content)

My question (as a beginner) is: how could I skip steps 1 and 2 and simply do another headers update and POST to the final URL, sending the selected dates as form entries (headers and form info below)? (The Referer header is the page from step 2 above.)


Edit 1: network request from csv file download:

Answer 1

Use Selenium WebDriver; it has many good features for working with web services.

Answer 2

Selenium is going to be your best bet for automated browser interactions. It can be used not only to scrape data from websites but also to interact with forms and other page elements. I highly recommend it, as I have used it quite a bit in the past. If you already have pip and Python installed, go ahead and type

pip install selenium

That will install Selenium, but you also need to install either geckodriver (for Firefox) or chromedriver (for Chrome). Then you should be up and running!

Answer 3

As others have recommended, Selenium is a good tool for this sort of task. However, I'll try to suggest a way to use requests for this purpose, as that's what you asked for in the question.

The success of this approach really depends on how the web page is built and how the data files are made available (if "Save as CSV" in the data view is what you're targeting).

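If the login mechanism is cookie-based, you can use Sessions and Cookies in requests. When you submit a login form, a cookie is returned in the response headers (Set-Cookie). That cookie has to be sent in the request headers of every subsequent page request to make your login stick. A requests.Session stores it and sends it for you, so keep using the same session object after logging in.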
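As a minimal sketch of that idea: the helper below reuses a session that has already logged in to fetch another page. The function name `fetch_with_login_session` is made up for illustration, and it only assumes that `session` behaves like a `requests.Session` (it has a `cookies` jar and a `get` method). Whether the site really authenticates through cookies is also an assumption.

```python
def fetch_with_login_session(session, url):
    """GET `url` with a session that has already logged in.

    `session` is expected to behave like requests.Session: it keeps the
    cookies set by the login response and sends them on every later request.
    """
    # An empty cookie jar after login usually means the login did not succeed
    # (or that the site does not use cookies for authentication).
    if len(session.cookies) == 0:
        raise RuntimeError("session holds no cookies; was the login successful?")

    response = session.get(url)
    response.raise_for_status()  # turn HTTP 4xx/5xx responses into an exception
    return response.text
```

In the question's code this would be called inside the `with requests.Session() as s:` block, after `s.post(posturl, data=values)`, for example as `fetch_with_login_session(s, statement_url)`, where `statement_url` stands for whatever page you want to open next.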

Also, you should inspect the network request for the "Save as CSV" action in the Network pane of your browser's Developer Tools. If you can see a structure to the request, you may be able to make a direct request within your authenticated session, and use a statement identifier and dates as the payload to get your results.
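A rough sketch of such a direct request is shown below. Everything specific in it is an assumption: the form field names (`statement_id`, `date_from`, `date_to`), the ISO date format, and the idea that the endpoint answers a POST with CSV text. Replace them with whatever the Network pane shows for the real request. `session` is expected to be the logged-in `requests.Session`.

```python
import csv
import io
from datetime import date


def build_statement_payload(statement_id, start, end):
    """Build the form fields for the statement request.

    The field names and the date format are placeholders (assumptions);
    copy the real ones from the request seen in the browser's Network pane.
    """
    return {
        "statement_id": statement_id,    # hypothetical field name
        "date_from": start.isoformat(),  # hypothetical name and format
        "date_to": end.isoformat(),      # hypothetical name and format
    }


def download_statement_rows(session, url, statement_id, start, end):
    """POST the payload with the logged-in session and parse the CSV reply."""
    payload = build_statement_payload(statement_id, start, end)
    response = session.post(url, data=payload)
    response.raise_for_status()  # fail loudly on HTTP 4xx/5xx
    # Read the CSV text into a list of dicts keyed by the header row.
    return list(csv.DictReader(io.StringIO(response.text)))
```

With the question's session `s`, a call might look like `download_statement_rows(s, csv_url, "12345", date(2018, 6, 1), date(2018, 6, 18))`, where `csv_url` and the statement identifier are placeholders for the values seen in the real request.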

3 answers

solution1
0 2018-06-18 15:15:54

solution2
0 2018-06-18 15:33:55

solution3
0 2018-06-18 20:08:27
