
Python post requests - web scraping

I am trying to scrape some data, but I got stuck when extracting a graph from this site: I want to change the data-timeperiod shown in the code below. Is there a way to extract this snippet, or to change it from having data-timeperiod="today" active to data-timeperiod="week"?

For extra context: I have tried using the Network tab in Chrome to change this through a POST request, but each time I am denied access.

<div class="fLeft">
    <ul class="chartsTimeperiod cleanList floatList clearFix buttonPane">
        <li class="active">
                <a href="#" data-timeperiod="today" class="active default">
                    1 d.</a>
            </li>
        <li class="">
                <a href="#" data-timeperiod="week" class="">
                    1 v.</a>
            </li>
        <li class="">
                <a href="#" data-timeperiod="month" class="">
                    1 mån.</a>
            </li>
        <li class="">
                <a href="#" data-timeperiod="three_months" class="">
                    3 mån.</a>
            </li>
        <li class="">
                <a href="#" data-timeperiod="this_year" class="">
                    i år</a>
            </li>
        <li class="">
                <a href="#" data-timeperiod="year" class="">
                    1 år</a>
            </li>
        <li class="last">
                <a href="#" data-timeperiod="three_years" class="">
                    3 år</a>
            </li>
        </ul>
</div>
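Judging by the Network tab, this markup does not hold the chart data itself: clicking a button makes the page's JavaScript request new data. So editing the "active" class in scraped HTML will not change anything, and the snippet is mainly useful as the list of valid time periods. A small standard-library sketch that reads those values and the currently active one (the class and function names are illustrative, not from the site):

```python
from html.parser import HTMLParser


class TimePeriodParser(HTMLParser):
    """Collect data-timeperiod values and note which link is marked active."""

    def __init__(self):
        super().__init__()
        self.periods = []
        self.active = None

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        period = attr_map.get("data-timeperiod")
        if period is None:
            return
        self.periods.append(period)
        # class is a space-separated list, e.g. "active default".
        if "active" in (attr_map.get("class") or "").split():
            self.active = period


def read_time_periods(html):
    parser = TimePeriodParser()
    parser.feed(html)
    parser.close()
    return parser.periods, parser.active


# A shortened copy of the button bar above.
SNIPPET = """
<ul class="chartsTimeperiod">
  <li class="active"><a href="#" data-timeperiod="today" class="active default">1 d.</a></li>
  <li class=""><a href="#" data-timeperiod="week" class="">1 v.</a></li>
  <li class="last"><a href="#" data-timeperiod="three_years" class="">3 år</a></li>
</ul>
"""

periods, active = read_time_periods(SNIPPET)
print(periods)  # ['today', 'week', 'three_years']
print(active)   # today
```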

In the Network tab I can see a request payload containing the following data. Is this what I should use to access the data, or am I on the wrong track?

{"orderbookId":842107,"chartType":"AREA","widthOfPlotContainer":558,"chartResolution":"MINUTE","navigator":true,"percentage":false,"volume":false,"owners":false,"timePeriod":"week","ta":[],"compareIds":[19002]}
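Both answers below post exactly this payload, so it looks like the right track: the button just sends the same JSON with a different timePeriod. A minimal sketch using only the standard library (urllib.request instead of the requests package the answers use) that swaps the period into the captured payload and builds the equivalent POST without sending it. CHART_URL is the endpoint named in the answers; VALID_PERIODS is copied from the data-timeperiod attributes above; whether the server still accepts such a request without browser cookies or extra headers is an assumption.

```python
import json
import urllib.request

CHART_URL = "https://www.avanza.se/ab/component/highstockchart/getchart/orderbook"

# The payload exactly as captured in the browser's Network tab.
CAPTURED = ('{"orderbookId":842107,"chartType":"AREA","widthOfPlotContainer":558,'
            '"chartResolution":"MINUTE","navigator":true,"percentage":false,'
            '"volume":false,"owners":false,"timePeriod":"week","ta":[],'
            '"compareIds":[19002]}')

# Values offered by the data-timeperiod attributes of the chart buttons.
VALID_PERIODS = ("today", "week", "month", "three_months",
                 "this_year", "year", "three_years")


def payload_for(period):
    """Return the captured payload with timePeriod replaced by `period`."""
    if period not in VALID_PERIODS:
        raise ValueError(f"unknown time period: {period!r}")
    payload = json.loads(CAPTURED)  # JSON true/false become Python True/False
    payload["timePeriod"] = period
    return payload


def build_request(period):
    """Build, but do not send, a JSON POST like requests.post(url, json=...)."""
    body = json.dumps(payload_for(period)).encode("utf-8")
    return urllib.request.Request(
        CHART_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("month")
print(req.get_method())                    # POST
print(json.loads(req.data)["timePeriod"])  # month

# Sending it (requires network access):
# with urllib.request.urlopen(req) as resp:
#     chart = json.load(resp)
```

Keeping the captured payload as the template means only the field you care about changes; every other value stays exactly what the browser sent.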

Question 2 (example), based on this form:

<form method="get" class="forumPagerForm">
        <label for="pageSizeSelect" class="fLeft marginTop5px">Visa antal inlägg:</label> 
        <select id="pageSizeSelect" class="pageSizeSelect">
            <option >15</option>
            <option >25</option>
            <option >50</option>
            <option >75</option>
            <option >100</option>
            <option selected="selected">200</option>
        </select>

        
    </form>
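The question does not say what should happen with this form. One thing worth noting: the select has no name attribute, so submitting the form would not send the chosen page size. The size is presumably applied by JavaScript, and the Network tab should show the request it makes. If you only need the available sizes and the current selection, a standard-library sketch (class and function names are hypothetical):

```python
from html.parser import HTMLParser


class PageSizeParser(HTMLParser):
    """Collect the <option> values of #pageSizeSelect and the selected index."""

    def __init__(self):
        super().__init__()
        self.in_select = False
        self.in_option = False
        self.options = []
        self.selected_index = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "select" and attr_map.get("id") == "pageSizeSelect":
            self.in_select = True
        elif tag == "option" and self.in_select:
            self.in_option = True
            self.options.append("")
            # Both a bare `selected` and selected="selected" mark the choice.
            if "selected" in attr_map:
                self.selected_index = len(self.options) - 1

    def handle_data(self, data):
        if self.in_option:
            self.options[-1] += data.strip()

    def handle_endtag(self, tag):
        if tag == "option":
            self.in_option = False
        elif tag == "select":
            self.in_select = False


def read_page_sizes(html):
    parser = PageSizeParser()
    parser.feed(html)
    parser.close()
    sizes = [int(text) for text in parser.options]
    selected = None if parser.selected_index is None else sizes[parser.selected_index]
    return sizes, selected


FORM = """
<select id="pageSizeSelect" class="pageSizeSelect">
    <option >15</option>
    <option >25</option>
    <option >50</option>
    <option >75</option>
    <option >100</option>
    <option selected="selected">200</option>
</select>
"""

sizes, selected = read_page_sizes(FORM)
print(sizes)     # [15, 25, 50, 75, 100, 200]
print(selected)  # 200
```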

Try:

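import requests

# Use real JSON types (numbers, booleans) rather than strings,
# to match the payload the browser sends.
payload = {
    "orderbookId": 842107,
    "chartType": "AREA",
    "widthOfPlotContainer": 558,
    "chartResolution": "MINUTE",
    "navigator": True,
    "percentage": False,
    "volume": False,
    "owners": False,
    "timePeriod": "week",
    "ta": [],
    "compareIds": [19002]
}
s = requests.Session()
# Load the stock page first so the session keeps any cookies the site sets.
s.get('https://www.avanza.se/aktier/om-aktien.html/842107/gabather')
p = s.post('https://www.avanza.se/ab/component/highstockchart/getchart/orderbook', json=payload)
p.raise_for_status()  # fail loudly on 4xx/5xx instead of parsing an error page
print(p)  # shows the status, e.g. <Response [200]>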

Then extract the data from the response p; the body is JSON, so p.json() returns it as a Python dict.
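Printing p only shows the response status (something like <Response [200]>); the chart data is in p.json(). Going by the second answer's code and output, the dict has a "dataPoints" list of [timestamp, price] pairs, and a price can be null. A small sketch that turns that into clean pairs, assuming that structure (extract_points is a hypothetical helper):

```python
def extract_points(chart_json):
    """Return (timestamp_ms, price) pairs, skipping points with no price.

    Assumes a "dataPoints" list of [timestamp_ms, price] pairs, as seen
    in the second answer; a missing key yields an empty list.
    """
    points = []
    for timestamp, price in chart_json.get("dataPoints", []):
        if price is None:  # JSON null, e.g. the first point in the sample output
            continue
        points.append((timestamp, price))
    return points


# A few points from the sample output, in the shape p.json() would return.
sample = {"dataPoints": [[1594103400000, None],
                         [1594105200000, 8.36],
                         [1594107000000, 8.4]]}
print(extract_points(sample))
# [(1594105200000, 8.36), (1594107000000, 8.4)]
```

With a live response, call extract_points(p.json()) instead of passing the sample.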

You want to get the points from the graph, right? If you change the graph's time period (say, from "week" to "month") and watch the Network tab, you can see that the browser makes an HTTP POST request to https://www.avanza.se/ab/component/highstockchart/getchart/orderbook .

Simply imitate that request. Here the time period is set to "week", but you should be able to change it to "month", etc. The code makes the request and prints the first ten points:

import sys

import requests


def main():
    url = "https://www.avanza.se/ab/component/highstockchart/getchart/orderbook"

    # Same payload the browser sends; change "timePeriod" to "month", etc.
    data = {
        "chartResolution": "MINUTE",
        "chartType": "AREA",
        "compareIds": [19002],
        "navigator": True,
        "orderbookId": 842107,
        "owners": False,
        "percentage": False,
        "ta": [],
        "timePeriod": "week",
        "volume": False,
        "widthOfPlotContainer": 558
    }

    response = requests.post(url, json=data)
    response.raise_for_status()

    chart = response.json()

    # Each data point is a [timestamp, price] pair; print the price first.
    for timestamp, price in chart["dataPoints"][:10]:
        print(price, timestamp)

    return 0


if __name__ == "__main__":
    sys.exit(main())

Output:

None 1594103400000
8.36 1594105200000
8.4 1594107000000
8.26 1594108800000
8.3 1594110600000
8.42 1594112400000
8.54 1594114200000
8.5 1594116000000
8.52 1594117800000
8.6 1594119600000
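The second column looks like milliseconds since the Unix epoch: read that way, the values fall on 7 July 2020, 30 minutes apart. A minimal sketch converting a few rows of the output to datetimes, assuming that interpretation:

```python
from datetime import datetime, timezone

# First rows of the output above, as (price, timestamp_ms) pairs.
rows = [(None, 1594103400000), (8.36, 1594105200000), (8.4, 1594107000000)]

# Divide by 1000 because fromtimestamp() expects seconds, not milliseconds.
converted = [(datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc), price)
             for price, ts_ms in rows]

for when, price in converted:
    print(when.isoformat(), price)
# 2020-07-07T06:30:00+00:00 None
# 2020-07-07T07:00:00+00:00 8.36
# 2020-07-07T07:30:00+00:00 8.4
```

For Stockholm local time, pass tz=zoneinfo.ZoneInfo("Europe/Stockholm") instead of timezone.utc (Python 3.9+, needs time zone data on the system).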
