簡體   English   中英

爬取.aspx頁,未填充發布請求結果

[英]Scraping .aspx page, post request results are not populating

我正在嘗試抓取網頁,但我只想要特定儲備銀行(紐約)的結果。 我已經對抓取.aspx頁面進行了一些研究,並且我相信自己在發帖請求中捕獲了所有必需的變量,但是我仍然沒有。

我在檢閱元素中添加了請求正文中的各種元素。 我一直沒有得到任何結果,好像頁面上的搜索功能從未執行過。

我可以刮掉不可搜索的頁面( https://www.federalreserve.gov/apps/h2a/h2a.aspx )沒問題,我的結果如下所示:

Applicant: Alberto Joseph Safra, David Joseph Safra and Esther Safra Dayan, Sao Palo, Brazil and Jacob Joseph Safra, Geneva, Switzerland;, Activity: to acquire voting shares of SNBNY Holdings Limited, Gibraltar, Gibraltar and thereby indirectly acquire Safra National Bank of New York, New York, New York., Law: CIBC, Reserve Bank: St. Louis, End of Comment Period: 04/16/2019 

Applicant: American National Bankshares, Inc.,, Activity: to acquire HomeTown Bankshares Corporation, and thereby indirectly acquire HomeTown Bank, both in Roanoke, Virginia ... engage in mortgage lending, also applied to acquire at least 49 percent of HomeTown Residential Mortgage, LLC, Virginia Beach, VA., Law: 3, Reserve Bank: Richmond, End of Comment Period: 02/28/2019 

Applicant: Ameris Bancorp, Moultrie, Georgia;, Activity: to merge with Fidelity Southern Corporation, and thereby indirectly acquire Fidelity Bank, both of Atlanta, Georgia., Law: 3, Reserve Bank: Atlanta, End of Comment Period: 03/14/2019 

Applicant: Amsterdam Bancshares, Inc., Amsterdam, Missouri;, Activity: to acquire 100 percent of the voting shares of S.T.D. Investments, Inc., and thereby indirectly acquire Bank of Minden, both of Mindenmines, Missouri., Law: 3, Reserve Bank: Kansas City, End of Comment Period: 01/04/2019 

Applicant: Amy Beth Windle Oakley, Cookeville, Tennessee, and Mark Edward Copeland, Ooltewah, Tennessee;, Activity: to become members of the Windle/Copeland Family Control Group and thereby retain shares of Overton Financial Services, Inc., and its subsidiary, Union Bank and Trust Company, both of Livingston, Tennessee., Law: CIBC, Reserve Bank: Atlanta, End of Comment Period: 12/27/2018 

Applicant: Anderson W. Chandler Trust A Indenture dated July 25, 1996, and Cathleen Chandler Stevenson, individually, and as trustee, both of Dallas, Texas; Activity: to retain voting shares of Fidelity as members of the Anderson W. Chandler Family Control Group., Law: CIBC, Reserve Bank: Kansas City, End of Comment Period: 06/20/2019 

Applicant: Arthur Haag Sherman, the Sherman 2018 Irrevocable Trust, Sherman Tectonic FLP LP, and Sherman Family Holdings LLC, all of Houston, Texas;, Activity: as a group acting in concert, to acquire shares of T Acquisition, Inc., and thereby indirectly acquire T Bank, National Association, both of Dallas, Texas., Law: CIBC, Reserve Bank: Dallas, End of Comment Period: 12/10/2018 

Applicant: BancFirst Corporation, Oklahoma City, Oklahoma;, Activity: to acquire voting shares of Pegasus Bank, Dallas, Texas., Law: 3, Reserve Bank: Kansas City, End of Comment Period: 06/07/2019 

Applicant: BankFirst Capital Corporation, Macon, Mississippi;, Activity: to merge with FNB Bancshares of Central Alabama, Inc., and thereby indirectly acquire FNB of Central Alabama, both in Aliceville, Alabama., Law: 3, Reserve Bank: St. Louis, End of Comment Period: 02/28/2019 

看到我只想要紐約聯邦儲備銀行的結果,我想抓取可搜索的URL( https://www.federalreserve.gov/apps/h2a/h2asearch.aspx )。 除了紐約以外,我還和其他多家銀行打過交道,它們都不使用我的代碼產生結果。 在網頁上搜索時,會有紐約的結果。 這就是使我相信我的帖子請求有問題的原因。 這是我的代碼:

import requests
from bs4 import BeautifulSoup



headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}

print('Scraping the Latest H2A Release...')

url1 = 'https://www.federalreserve.gov/apps/h2a/h2asearch.aspx'
r1 = requests.get(url=url1, headers=headers)
soup1 = BeautifulSoup(r1.text,'html.parser')
viewstate = soup1.findAll("input", {"type": "hidden", "name": "__VIEWSTATE"})
eventvalidation = soup1.findAll("input", {"type": "hidden", "name": "__EVENTVALIDATION"})
stategenerator = soup1.findAll("input", {"type": "hidden", "name": "__VIEWSTATEGENERATOR"})


item_request_body = {
"__ASYNCPOST": "true",
"__EVENTARGUMENT": "",
"__EVENTTARGET": "",
"__EVENTVALIDATION": eventvalidation[0]['value'],
"__VIEWSTATE": viewstate[0]['value'],
"__VIEWSTATEGENERATOR": stategenerator[0]['value'],
"ctl00%24bodyMaster%24applicantTextBox":" ",
"ctl00%24bodyMaster%24districtDropDownList": "2",
"ctl00%24bodyMaster%24ScriptManager1": "ctl00%24bodyMaster%24mainUpdatePanel%7Cctl00%24bodyMaster%24searchButton",
"ctl00%24bodyMaster%24searchButton": "Search",
"ctl00%24bodyMaster%24sectionDropDownBox": "ALL",
"ctl00%24bodyMaster%24targetTextBox": ""
}

url = 'https://www.federalreserve.gov/apps/h2a/h2asearch.aspx'
r2 = requests.post(url=url, data=item_request_body, cookies=r1.cookies, headers=headers)
soup = BeautifulSoup(r2.text, 'html.parser')

mylist5 = []
for tr in soup.find_all('tr')[2:]:
    tds = tr.find_all('td')
    output5 = ("Applicant: %s, Activity: %s, Law: %s, Reserve Bank: %s, End of Comment Period: %s \r\n" % (tds[0].text, tds[1].text, tds[2].text, tds[3].text, tds[4].text))
    mylist5.append(output5)
    print(mylist5)

希望以下腳本可以讓您解析選擇New York生成的內容。 嘗試這個。

import requests
from bs4 import BeautifulSoup

url = 'https://www.federalreserve.gov/apps/h2a/h2asearch.aspx'

with requests.Session() as s:
    r = s.get(url)
    soup = BeautifulSoup(r.text,'lxml')
    payload = {item['name']:item.get('value','') for item in soup.select('input[name]')}
    payload['ctl00$bodyMaster$sectionDropDownBox'] = 'ALL'
    payload['ctl00$bodyMaster$districtDropDownList'] = '2'
    del payload['ctl00$bodyMaster$clearButton']
    res = s.post(url,data=payload)
    sauce = BeautifulSoup(res.text,'lxml')
    for items in sauce.select("table.pubtables tr"):
        data = [item.get_text(strip=True) for item in items.select("th,td")]
        print(data)

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM