I need to scrape data from a website that uses frames

Please, I'd like some guidance on a project, because I'm lost and really don't know where to start.

I'm pretty new to Python, but I've already written a web scraping script that pulls information from a few websites, using lxml and XPath to get data through the HTML DOM.

But now the client has presented me with a challenge...

The website I have to get data from uses frames =( and I don't know how to handle that...

And to complicate things even more, the site requires a login :(

Could someone help me with some information, like where I should start?

Is it possible to get data from a website that displays its data in frames?

Here's the web address: https://www.bulkshared.com/online-ordering

I want to point the script to the "Pantry" section, but the URL doesn't show the path =(

Which kind of script do you recommend? I want to use Python, but do I have to use BS? XPath? Selenium?

Could someone spare a little of their time to try to help me?

Thank you very much for your time, guys!

import csv
import re

import requests
from bs4 import BeautifulSoup


def Login(url):
    with requests.Session() as req:
        # Load the page; its inline script holds the IDs the login request needs.
        r = req.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')
        script = soup.find("script", type="text/javascript").text
        collectionId = re.search(r'collectionId":"(.*?)"', script).group(1)
        metaSiteId = re.search(r'metaSiteId":"(.*?)"', script).group(1)
        svSession = re.search(r'svSession":"(.*?)"', script).group(1)
        # Placeholder credentials; replace with a real account.
        data = {
            'email': 'test@test.com',
            'password': 'test123',
            'collectionId': collectionId,
            'metaSiteId': metaSiteId,
            'appUrl': 'https://www.bulkshared.com/online-ordering',
            'svSession': svSession
        }
        # Log in within the same session, then fetch the menu as JSON.
        req.post(
            "https://www.bulkshared.com/_api/wix-sm-webapp/member/login", data=data)
        r = req.get(
            "https://api.wixrestaurants.com/v2/organizations/5716166580714419/full").json()
        return r


def Sorter():
    data = Login("https://www.bulkshared.com/")
    with open("result.csv", 'w', newline="", encoding="UTF-8") as f:
        writer = csv.writer(f)
        # The header must match the three columns written for each item.
        writer.writerow(["Name", "Description", "Price"])
        for item in data["menu"]["items"]:
            title = item["title"]["en_AU"]
            # Some items have no price or description; fall back to "N/A".
            try:
                price = item["price"]
            except KeyError:
                price = "N/A"
            try:
                description = item["description"]["en_AU"].strip()
            except (KeyError, AttributeError, TypeError):
                description = "N/A"
            writer.writerow([title, description, price])


Sorter()
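
The three re.search calls in the code above all follow one pattern: find a "name":"value" pair inside the inline script text and keep the value. Here is a minimal sketch of that step as a helper that returns None instead of raising when the field is missing. The helper name extract_field and the sample string are made up for illustration; they are not taken from the site.

```python
import re


def extract_field(script_text, name):
    """Return the value of a "name":"value" pair in script_text, or None."""
    pattern = r'"%s"\s*:\s*"(.*?)"' % re.escape(name)
    match = re.search(pattern, script_text)
    return match.group(1) if match else None


# Hypothetical inline script content, invented for illustration only.
sample = 'var config = {"metaSiteId":"abc-123","svSession":"xyz789"};'

print(extract_field(sample, "metaSiteId"))    # abc-123
print(extract_field(sample, "collectionId"))  # None
```

Checking for None before building the login payload gives a clearer error than an AttributeError on .group(1) if the page layout changes.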

Note: after writing the code, I discovered that the API is completely public and doesn't require any login session info.

So you can call it directly.

import requests
import json

r = requests.get(
    "https://api.wixrestaurants.com/v2/organizations/5716166580714419/full").json()

print(json.dumps(r, indent=4))
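
Since the API can be called without logging in, the CSV export only needs the parsed JSON. Below is a rough sketch of that export step on its own, assuming the response keeps the menu -> items -> title/description/price shape used in the first script. The function names menu_rows and write_menu_csv and the sample data are hypothetical and only there for illustration.

```python
import csv
import io


def menu_rows(data, locale="en_AU"):
    """Yield (name, description, price) for each item in data["menu"]["items"]."""
    for item in data.get("menu", {}).get("items", []):
        title = item.get("title", {}).get(locale, "N/A")
        description = item.get("description", {}).get(locale, "N/A").strip()
        price = item.get("price", "N/A")
        yield title, description, price


def write_menu_csv(data, fileobj):
    """Write a header row followed by one row per menu item."""
    writer = csv.writer(fileobj)
    writer.writerow(["Name", "Description", "Price"])
    writer.writerows(menu_rows(data))


# Hypothetical sample, shaped like the fields the first script reads.
sample = {"menu": {"items": [
    {"title": {"en_AU": "Rice 1kg"},
     "description": {"en_AU": " White rice "},
     "price": 350},
    {"title": {"en_AU": "Oats"}},  # no description or price
]}}

buf = io.StringIO()
write_menu_csv(sample, buf)
print(buf.getvalue())
```

With the real response, you would pass the dict returned by requests.get(...).json() and a file opened with newline="" and encoding="UTF-8" in place of the StringIO buffer.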
