简体   繁体   English

如何抓取使用 iframe 的网站?

[英]How do I Webscrape a website that uses iframes?

I am trying to scrape this website 'https://swimming.org.nz/results.html'.我正在尝试抓取此网站“https://swimming.org.nz/results.html”。 In the form that comes up, I am filling in only the Age column as 8 to 8. I am using the following code to scrape the table as suggested elsewhere in StackOverflow.在出现的表格中,我仅将 Age 列填写为 8 到 8。我正在使用以下代码按照 StackOverflow 中其他地方的建议来抓取表格。 I am unable to get the table.我拿不到桌子。 How to get all the tables for this age group 8 to 8.如何获得这个年龄段 8 到 8 岁的所有表格。

import requests
from bs4 import BeautifulSoup


s = requests.Session()
r = s.get("https://swimming.org.nz/results.html")
soup = BeautifulSoup(r.content, "html.parser")
iframe_src = soup.select_one("x-MS_FIELD_AGE.FROM.L").attrs["src"]
r = s.get(f"https:{iframe_src}")
soup = BeautifulSoup(r.content, "html.parser")
for row in soup.select("x-form-text x-form-field"):
    print("\t".join([e.text for e in row.select("th, td")]))

You will see that it is not necessary using BeautifulSoup if you' ve look at the developer tools on your browser.如果您查看浏览器上的开发人员工具,您会发现没有必要使用 BeautifulSoup。 Sending request is below and response type is xml.发送请求如下,响应类型为 xml。 You don't need any scraping tool.您不需要任何抓取工具。 You can get all of that data changing the StartRowIndex and MaximumRowCount .您可以获得更改StartRowIndexMaximumRowCount的所有数据。

import requests

url = "https://connect.swimming.org.nz/snz-wrap-public/pages/pageRequestHandler?tunnelTarget=tableData%2F%3F&data_file=MS.COMP.RESULTS&dict_file=MS.COMP.RESULTS&doGet=true"

payload="StartRowIndex=0&MaximumRowCount=100&sort=BY-DSND%20COMP.DATE%20BY-DSND%20STAGE&dir=ASC&tid=extTable1620108707767_4352538&selectCriteria=GET-LIST%20CMS_TABLE_19483_65507_076_184811_&extraColumns=%3CColumns%20DynamicLinkRoot%3D%22https%3A%2F%2Fconnect.swimming.org.nz%3A443%2Fsnz-wrap-public%2Fworkflows%2F%22%3E%3CColumn%3E%3CColumnName%3EExpander%3C%2FColumnName%3E%3CField%3EFRAGMENT_DISPLAY.SPLITS%3C%2FField%3E%3CShowInExpander%3Etrue%3C%2FShowInExpander%3E%3C%2FColumn%3E%3CColumn%3E%3CFieldExpression%3E%7BMEMBER.FORE1%7D%20%7BMEMBER.SURNAME%7D%3C%2FFieldExpression%3E%3CField%3EEXPRESSION_FIELD_1%3C%2FField%3E%3CColumnName%3EName%2520%3C%2FColumnName%3E%3CWidth%3E130%3C%2FWidth%3E%3C%2FColumn%3E%3CColumn%3E%3CField%3EXGENDER%3C%2FField%3E%3CColumnName%3EGender%3C%2FColumnName%3E%3CWidth%3E50%3C%2FWidth%3E%3C%2FColumn%3E%3CColumn%3E%3CField%3EENTRANT.AGE%3C%2FField%3E%3CColumnName%3EAge%3C%2FColumnName%3E%3CWidth%3E35%3C%2FWidth%3E%3C%2FColumn%3E%3CColumn%3E%3CFieldExpression%3E%7BXCATEGORY2%7D%3C%2FFieldExpression%3E%3CField%3ECATEGORY2.NUM%24%24SNZ%3C%2FField%3E%3CColumnName%3EDistance%3C%2FColumnName%3E%3CWidth%3E70%3C%2FWidth%3E%3C%2FColumn%3E%3CColumn%3E%3CField%3EXCATEGORY1%3C%2FField%3E%3CColumnName%3EStroke%3C%2FColumnName%3E%3CWidth%3E70%3C%2FWidth%3E%3C%2FColumn%3E%3CColumn%3E%3CFieldExpression%3E%7BTIME%24%24SNZ%7D%3C%2FFieldExpression%3E%3CField%3ERESULT.TIME.MILLISECONDS%3C%2FField%3E%3CColumnName%3ETime%2520%3C%2FColumnName%3E%3CWidth%3E70%3C%2FWidth%3E%3CAlign%3Eright%3C%2FAlign%3E%3C%2FColumn%3E%3CColumn%3E%3CField%3EFINA.POINTS%24%24SNZ%3C%2FField%3E%3CColumnName%3EFINA%2520Points%3C%2FColumnName%3E%3CWidth%3E85%3C%2FWidth%3E%3CAlign%3Eright%3C%2FAlign%3E%3C%2FColumn%3E%3CColumn%3E%3CField%3EFINA.YEAR%24%24SNZ%3C%2FField%3E%3CColumnName%3EPoints%2520Year%3C%2FColumnName%3E%3CWidth%3E80%3C%2FWidth%3E%3CAlign%3Eright%3C%2FAlign%3E%3C%2FColumn%3E%3CColumn%3E%3CField%3E%24DATE%24COMP.DATE%3C%2FField%3E%3CColumnName%3EDate%3C%2FColumnName%3E%3CWidth%3E70%3C%2FWidth%3E%3CAlign%3Eright%3C%2FAlign%3E%3C%2FColumn%3E%3CColumn%3E%3CField%3EXEVENT.CODE%3C%2FField%3E%3CColumnName%3EMeet%3C%2FColumnName%3E%3CWidth%3E190%3C%2FWidth%3E%3C%2FColumn%3E%3CColumn%3E%3CField%3EPARAMETER1%3C%2FField%3E%3CColumnName%3ECourse%3C%2FColumnName%3E%3CWidth%3E50%3C%2FWidth%3E%3C%2FColumn%3E%3C%2FColumns%3E&extraColumnsDownload=%3CDownloadColumns%20DynamicLinkRoot%3D%22https%3A%2F%2Fconnect.swimming.org.nz%3A443%2Fsnz-wrap-public%2Fworkflows%2F%22%3E%3CColumn%3E%3CField%3EXGENDER%3C%2FField%3E%3CColumnName%3EGender%3C%2FColumnName%3E%3C%2FColumn%3E%3CColumn%3E%3CField%3EENTRANT.AGE%3C%2FField%3E%3CColumnName%3EAge%3C%2FColumnName%3E%3C%2FColumn%3E%3CColumn%3E%3CField%3EXCATEGORY2%3C%2FField%3E%3CColumnName%3EDistance%3C%2FColumnName%3E%3C%2FColumn%3E%3CColumn%3E%3CField%3EXCATEGORY1%3C%2FField%3E%3CColumnName%3EStroke%3C%2FColumnName%3E%3C%2FColumn%3E%3CColumn%3E%3CField%3ETIME%24%24SNZ%3C%2FField%3E%3CColumnName%3ETime%2520%3C%2FColumnName%3E%3C%2FColumn%3E%3CColumn%3E%3CField%3EFINA.POINTS%24%24SNZ%3C%2FField%3E%3CColumnName%3EFINA%2520Points%3C%2FColumnName%3E%3C%2FColumn%3E%3CColumn%3E%3CField%3EFINA.YEAR%24%24SNZ%3C%2FField%3E%3CColumnName%3EPoints%2520Year%3C%2FColumnName%3E%3C%2FColumn%3E%3CColumn%3E%3CField%3E%24DATE%24COMP.DATE%3C%2FField%3E%3CColumnName%3EDate%3C%2FColumnName%3E%3C%2FColumn%3E%3CColumn%3E%3CField%3EXEVENT.CODE%3C%2FField%3E%3CColumnName%3EMeet%3C%2FColumnName%3E%3C%2FColumn%3E%3CColumn%3E%3CField%3EPARAMETER1%3C%2FField%3E%3CColumnName%3ECourse%3C%2FColumnName%3E%3C%2FColumn%3E%3C%2FDownloadColumns%3E"
headers = {
  'Connection': 'keep-alive',
  'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="90", "Google Chrome";v="90"',
  'accept': '*/*',
  'x-requested-with': 'XMLHttpRequest',
  'accept-language': 'tr-TR,tr;q=0.9,en-US;q=0.8,en;q=0.7,ru;q=0.6',
  'sec-ch-ua-mobile': '?0',
  'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36',
  'content-type': 'application/x-www-form-urlencoded; charset=UTF-8',
  'Origin': 'https://connect.swimming.org.nz',
  'Sec-Fetch-Site': 'same-origin',
  'Sec-Fetch-Mode': 'cors',
  'Sec-Fetch-Dest': 'empty',
  'Referer': 'https://connect.swimming.org.nz/snz-wrap-public/workflows/COMP.RESULTS.FIND',
  'Cookie': 'JSESSIONID=93F2FEA63BA41ECB2505E2D1CD76374D; _ga=GA1.3.1735786808.1620106921; _gid=GA1.3.1806138988.1620106921'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM