I need to scrape data from a website that uses frames

Question

I'd appreciate some guidance on a project, because I'm lost and really don't know where to start.

I'm fairly new to Python, but I've already written a web scraping script that pulls information from a few websites, using lxml and XPath to get data through the HTML DOM.

But now the client has presented me with a challenge...

The website I have to get data from uses frames =( and I don't know how to handle that...

And to complicate things even more, the site requires login :(

Could someone help me with some information, like where I should start?

Is it possible to get data from a website that shows its data in frames?

Here's the web address: https://www.bulkshared.com/online-ordering

I want to point the script at the "Pantry" section, but the URL doesn't show the path =(

Which kind of script do you recommend? I want to use Python, but do I have to use BS? XPath? Selenium?

Could someone spare a little time to help me?

Thank you very much for your time, guys!

Answer 1

import csv
import re

import requests
from bs4 import BeautifulSoup


def Login(url):
    # One session keeps the cookies from the first page load through to the API call.
    with requests.Session() as req:
        r = req.get(url)
        soup = BeautifulSoup(r.content, 'html.parser')
        # The values needed for the login form are embedded in the first inline script.
        script = soup.find("script", type="text/javascript").text
        collectionId = re.search(r'collectionId":"(.*?)"', script).group(1)
        metaSiteId = re.search(r'metaSiteId":"(.*?)"', script).group(1)
        svSession = re.search(r'svSession":"(.*?)"', script).group(1)
        data = {
            'email': 'test@test.com',
            'password': 'test123',
            'collectionId': collectionId,
            'metaSiteId': metaSiteId,
            'appUrl': 'https://www.bulkshared.com/online-ordering',
            'svSession': svSession
        }
        # Log in, then fetch the JSON that the ordering frame loads its menu from.
        r = req.post(
            "https://www.bulkshared.com/_api/wix-sm-webapp/member/login", data=data)
        r = req.get(
            "https://api.wixrestaurants.com/v2/organizations/5716166580714419/full").json()
        return r


def Sorter():
    data = Login("https://www.bulkshared.com/")
    with open("result.csv", 'w', newline="", encoding="UTF-8") as f:
        writer = csv.writer(f)
        # The header must list the same three columns as each row written below.
        writer.writerow(["Name", "Description", "Price"])
        for item in data["menu"]["items"]:
            title = item["title"]["en_AU"]
            try:
                price = item["price"]
            except KeyError:
                price = "N/A"
            try:
                description = item["description"]["en_AU"].strip()
            except (KeyError, TypeError, AttributeError):
                # Some items have no description, or none for this locale.
                description = "N/A"
            writer.writerow([title, description, price])


Sorter()
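
To make the token extraction in Login easier to follow, here is a small offline sketch of the same regular-expression idea. The helper name `extract_tokens` and the sample script string are made up for illustration, and the pattern is slightly stricter than the one in the script; the real page's inline script may be laid out differently.

```python
import re

# Hypothetical sample of the inline script text; the real page content may differ.
SAMPLE_SCRIPT = (
    'var model = {"collectionId":"abc-123",'
    '"metaSiteId":"meta-456","svSession":"sess-789"};'
)


def extract_tokens(script_text, keys=("collectionId", "metaSiteId", "svSession")):
    """Return a dict of key -> value for each key found as "key":"value" in the text."""
    tokens = {}
    for key in keys:
        # Non-greedy group so the match stops at the first closing quote.
        match = re.search(r'"%s"\s*:\s*"(.*?)"' % re.escape(key), script_text)
        if match is None:
            raise ValueError(f"{key} not found in script text")
        tokens[key] = match.group(1)
    return tokens


if __name__ == "__main__":
    print(extract_tokens(SAMPLE_SCRIPT))
```

Raising an error when a key is missing gives a clearer failure than calling `.group(1)` on `None` if the page layout changes.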

Note: after writing the code, I discovered that the API is completely public and doesn't require any login session info.

So you can call it directly.

import requests
import json

r = requests.get(
    "https://api.wixrestaurants.com/v2/organizations/5716166580714419/full").json()

print(json.dumps(r, indent=4))
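
Since the endpoint is public, the CSV export from the first script can be done without the login step. Below is a minimal sketch, assuming the response keeps the shape the first script relies on (`menu` then `items`, with `title` and `description` keyed by `en_AU`). The helper names `menu_rows` and `write_csv` are hypothetical, and the sample dict is made up.

```python
import csv
import io


def menu_rows(data, locale="en_AU"):
    """Yield (title, description, price) per menu item, using "N/A" for gaps."""
    for item in data.get("menu", {}).get("items", []):
        title = (item.get("title") or {}).get(locale, "N/A")
        description = ((item.get("description") or {}).get(locale) or "N/A").strip()
        price = item.get("price", "N/A")
        yield title, description, price


def write_csv(data, fileobj):
    """Write a header plus one row per menu item to an open text file object."""
    writer = csv.writer(fileobj)
    writer.writerow(["Name", "Description", "Price"])
    writer.writerows(menu_rows(data))


if __name__ == "__main__":
    # Made-up sample in the assumed response shape; swap in the real JSON from the API.
    sample = {"menu": {"items": [
        {"title": {"en_AU": "Rice"}, "description": {"en_AU": " 1kg bag "}, "price": 350},
        {"title": {"en_AU": "Oats"}},
    ]}}
    buffer = io.StringIO()
    write_csv(sample, buffer)
    print(buffer.getvalue())
```

To write a real file, pass a handle opened with `open("result.csv", "w", newline="", encoding="UTF-8")` instead of the `StringIO` buffer.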

Answer 1
1, accepted, 2020-03-27 06:30:29