I need to scrape data from a website that uses frames
Please give me some direction on a project, because I'm lost and really don't know where to start.
I'm fairly new to Python, but I've already written a web scraping script that gets information from some websites, using lxml and XPath to pull data through the HTML DOM.
But now the client has presented me with a challenge...
The website I have to get data from uses frames =( and I don't know how to handle that...
And to complicate things even more, the site requires a login :(
Could someone give me some pointers, such as where I should start?
Is it possible to get data from a website that shows its data inside frames?
Here's the web address: https://www.bulkshared.com/online-ordering
I want to point the script at the "Pantry" section, but the URL doesn't show the path =(
What kind of script do you recommend? I want to use Python, but do I have to use BS? XPath? Selenium?
Could someone spare a little time to help me?
Thank you very much for your time, guys!
import csv
import re

import requests
from bs4 import BeautifulSoup


def Login(url):
    with requests.Session() as req:
        r = req.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')
        # The identifiers needed for login are embedded in an inline script.
        script = soup.find("script", type="text/javascript").text
        collectionId = re.search(r'collectionId":"(.*?)"', script).group(1)
        metaSiteId = re.search(r'metaSiteId":"(.*?)"', script).group(1)
        svSession = re.search(r'svSession":"(.*?)"', script).group(1)
        data = {
            'email': 'test@test.com',
            'password': 'test123',
            'collectionId': collectionId,
            'metaSiteId': metaSiteId,
            'appUrl': 'https://www.bulkshared.com/online-ordering',
            'svSession': svSession
        }
        # Log in; the session keeps any cookies the login sets.
        r = req.post(
            "https://www.bulkshared.com/_api/wix-sm-webapp/member/login", data=data)
        # Fetch the full menu as JSON.
        r = req.get(
            "https://api.wixrestaurants.com/v2/organizations/5716166580714419/full").json()
        return r


def Sorter():
    data = Login("https://www.bulkshared.com/")
    with open("result.csv", 'w', newline="", encoding="UTF-8") as f:
        writer = csv.writer(f)
        # The header matches the three columns written for each item.
        writer.writerow(["Name", "Description", "Price"])
        for item in data["menu"]["items"]:
            title = item["title"]["en_AU"]
            try:
                price = item["price"]
            except (KeyError, TypeError):
                price = "N/A"
            try:
                description = item["description"]["en_AU"].strip()
            except (KeyError, TypeError, AttributeError):
                description = "N/A"
            writer.writerow([title, description, price])


Sorter()
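The regular expressions in Login pull values out of JSON that is embedded in an inline script. As a rough offline sketch of that one step: the `sample_script` string and the helper name `extract_field` below are made up for illustration, and the real page's script may look different.

```python
import re

# Hypothetical stand-in for the inline script text; the real page may differ.
sample_script = (
    'var model = {"collectionId":"abc-123",'
    '"metaSiteId":"def-456","svSession":"xyz789"};'
)


def extract_field(name, script):
    """Return the quoted value that follows name":" in script, or None."""
    match = re.search(re.escape(name) + r'":"(.*?)"', script)
    return match.group(1) if match else None


fields = {
    name: extract_field(name, sample_script)
    for name in ("collectionId", "metaSiteId", "svSession")
}
print(fields)
```

Returning None instead of calling `.group(1)` on a failed match makes it easier to see which field is missing if the page layout changes.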
Note: after I'd written the code, I discovered that the API is completely public and doesn't require passing any login session info.
So you can call it directly.
import json

import requests

r = requests.get(
    "https://api.wixrestaurants.com/v2/organizations/5716166580714419/full").json()
print(json.dumps(r, indent=4))
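Since no login is needed, the CSV step can work on the JSON alone. Here is a minimal sketch, assuming the response has the same shape the first script relies on (`menu` → `items`, with `title` and `description` keyed by `en_AU`). The helper name `items_to_rows` and the sample data are made up for illustration.

```python
import csv
import io


def items_to_rows(data, lang="en_AU"):
    """Flatten the menu items into [name, description, price] rows."""
    rows = []
    for item in data["menu"]["items"]:
        title = item.get("title", {}).get(lang, "N/A")
        description = item.get("description", {}).get(lang, "N/A").strip()
        price = item.get("price", "N/A")
        rows.append([title, description, price])
    return rows


# Made-up sample shaped like the response above; real fields may differ.
sample = {"menu": {"items": [
    {"title": {"en_AU": "Rice"},
     "description": {"en_AU": " Long grain "},
     "price": 450},
    {"title": {"en_AU": "Oats"}},
]}}

# Write to an in-memory buffer so the sketch runs without touching disk.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Name", "Description", "Price"])
writer.writerows(items_to_rows(sample))
print(buffer.getvalue())
```

To run it against live data, pass the parsed response (`r` in the snippet above) to `items_to_rows` and write to a file opened with `newline=""` instead of the buffer.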