
Python post requests - web scraping

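I am trying to access some data through web scraping. I want to extract a chart from the site, but I am stuck on changing the time period of the data, which is set in the markup below. Is there a way to extract the chart for another period, or to change this snippet so that data-timeperiod="week" is active instead of data-timeperiod="today"?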

For some extra information: I tried using the Network tab in Chrome to change this with a POST request, but I get access denied every time.

<div class="fLeft">
    <ul class="chartsTimeperiod cleanList floatList clearFix buttonPane">
        <li class="active">
                <a href="#" data-timeperiod="today" class="active default">
                    1 d.</a>
            </li>
        <li class="">
                <a href="#" data-timeperiod="week" class="">
                    1 v.</a>
            </li>
        <li class="">
                <a href="#" data-timeperiod="month" class="">
                    1 mån.</a>
            </li>
        <li class="">
                <a href="#" data-timeperiod="three_months" class="">
                    3 mån.</a>
            </li>
        <li class="">
                <a href="#" data-timeperiod="this_year" class="">
                    i år</a>
            </li>
        <li class="">
                <a href="#" data-timeperiod="year" class="">
                    1 år</a>
            </li>
        <li class="last">
                <a href="#" data-timeperiod="three_years" class="">
                    3 år</a>
            </li>
        </ul>
</div>
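The period names the site uses appear to be the data-timeperiod values in the markup above. As a minimal sketch, they can be collected with Python's standard-library HTMLParser. The class name TimePeriodCollector and the trimmed SAMPLE_HTML string are hypothetical, introduced only for illustration. Note that this only lists the values: the "active" class is presumably toggled by the page's JavaScript, so editing the attribute in scraped HTML would not by itself change the data returned.

```python
from html.parser import HTMLParser


class TimePeriodCollector(HTMLParser):
    """Collect data-timeperiod values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.periods = []   # every data-timeperiod value, in document order
        self.active = None  # the value whose <a> carries the "active" class

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        period = attributes.get("data-timeperiod")
        if period is None:
            return
        self.periods.append(period)
        # The class attribute may be empty or missing; split() copes with both.
        if "active" in (attributes.get("class") or "").split():
            self.active = period


# Trimmed copy of the markup above, used only as sample input.
SAMPLE_HTML = """
<ul class="chartsTimeperiod cleanList floatList clearFix buttonPane">
    <li class="active"><a href="#" data-timeperiod="today" class="active default">1 d.</a></li>
    <li class=""><a href="#" data-timeperiod="week" class="">1 v.</a></li>
    <li class="last"><a href="#" data-timeperiod="three_years" class="">3 år</a></li>
</ul>
"""

collector = TimePeriodCollector()
collector.feed(SAMPLE_HTML)
print(collector.periods)  # ['today', 'week', 'three_years']
print(collector.active)   # today
```

The collected names are the same strings that show up in the "timePeriod" field of the request payload, so they are plausible values to try there.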

In the Network tab I can see a request with the following payload. Is this what I should use to access the data, or am I on the wrong track?

{"orderbookId":842107,"chartType":"AREA","widthOfPlotContainer":558,"chartResolution":"MINUTE","navigator":true,"percentage":false,"volume":false,"owners":false,"timePeriod":"week","ta":[],"compareIds":[19002]}

Question 2 - example: based on

<form method="get" class="forumPagerForm">
        <label for="pageSizeSelect" class="fLeft marginTop5px">Visa antal inlägg:</label> 
        <select id="pageSizeSelect" class="pageSizeSelect">
            <option >15</option>
            <option >25</option>
            <option >50</option>
            <option >75</option>
            <option >100</option>
            <option selected="selected">200</option>
        </select>

        
    </form>

Attempt:

import requests

janson = {
    "orderbookId": '842107',
    "chartType": "AREA",
    "widthOfPlotContainer": '558',
    "chartResolution": "MINUTE",
    "navigator": 'true',
    "percentage": 'false',
    "volume": 'false',
    "owners": 'false',
    "timePeriod": "week",
    "ta": [],
    "compareIds": ['19002']
}
s = requests.Session()
s.get('https://www.avanza.se/aktier/om-aktien.html/842107/gabather')
p = s.post('https://www.avanza.se/ab/component/highstockchart/getchart/orderbook', json=janson)
print(p)

and then scrape from the variable p

You want to get the points from the chart, right? If you change the chart's time period from "week" to "month" while watching the browser's network traffic log, you can see the browser make an HTTP POST request to this URL: https://www.avanza.se/ab/component/highstockchart/getchart/orderbook

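Simply imitate that request. Here the time period ("timePeriod") is set to "week", but you should be able to change it to "month" and so on. I then make the request and print the first ten points: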

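import sys

import requests

URL = "https://www.avanza.se/ab/component/highstockchart/getchart/orderbook"


def main():
    # Same JSON payload the browser sends. Change "timePeriod" to
    # "month", "year", etc. to request a different range.
    payload = {
        "chartResolution": "MINUTE",
        "chartType": "AREA",
        "compareIds": [19002],
        "navigator": True,
        "orderbookId": 842107,
        "owners": False,
        "percentage": False,
        "ta": [],
        "timePeriod": "week",
        "volume": False,
        "widthOfPlotContainer": 558
    }

    # json= serializes the dict and sets the JSON Content-Type header.
    response = requests.post(URL, json=payload)
    response.raise_for_status()

    chart = response.json()

    # Each data point is a [timestamp, value] pair; print the first ten.
    for timestamp, value in chart["dataPoints"][:10]:
        print(value, timestamp)

    return 0


if __name__ == "__main__":
    sys.exit(main())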

Output:

None 1594103400000
8.36 1594105200000
8.4 1594107000000
8.26 1594108800000
8.3 1594110600000
8.42 1594112400000
8.54 1594114200000
8.5 1594116000000
8.52 1594117800000
8.6 1594119600000
>>> 
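The second column of the output looks like Unix timestamps in milliseconds, and the first point has no value. As a rough sketch under that assumption (the response format is not documented here), the pairs can be converted to timezone-aware datetimes while skipping empty points. The helper name to_rows is hypothetical, and the sample list simply repeats the first three points from the output above in [timestamp, value] order.

```python
from datetime import datetime, timezone


def to_rows(points):
    """Convert [timestamp_ms, value] pairs into (datetime, value) tuples.

    Pairs whose value is None are skipped. Treating the timestamps as
    milliseconds since the Unix epoch (UTC) is an assumption.
    """
    rows = []
    for timestamp_ms, value in points:
        if value is None:
            continue
        moment = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
        rows.append((moment, value))
    return rows


# First three points from the output above, as [timestamp, value] pairs.
sample = [
    [1594103400000, None],
    [1594105200000, 8.36],
    [1594107000000, 8.4],
]

for moment, value in to_rows(sample):
    print(moment.isoformat(), value)
```

With real data, the same function could be applied to data["dataPoints"] from the response. The datetimes are in UTC, so they would still need converting to the exchange's local time if that matters.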
