Web Scraping of Javascript table (with grid and list views) using Python - Beautiful Soup

I'm trying to parse data from a JSON table on this website.

url - https://boxes.mysubscriptionaddiction.com/subscription_boxes_for/food

I primarily need the name, ratings and descriptions of all the food subscription boxes listed. I'm facing a few challenges here. The first is that the table has 2 views: a grid view and a list view. How do we specify which table view we are referring to in our code? The second is that I am getting the following error:

ValueError - Timeout value connect was Timeout(connect=<object object at 0x000002767CECD5C0>, 
read=<object object at 0x000002767CECD5C0>, total=None), but it must be an int, float or None.

Not sure what this means.
My code:

from pandas.io.html import read_html
from selenium import webdriver
import json
import requests
import os
import sys
from bs4 import BeautifulSoup

# Raw string so the backslashes in the Windows path are not read as escapes
driver = webdriver.Firefox(executable_path=r'C:\Drivers\geckodriver.exe')

driver.get('https://boxes.mysubscriptionaddiction.com/subscription_boxes_for/food')

# Locate the table by its absolute XPath and take its inner HTML
table = driver.find_element_by_xpath('/html/body/div[3]/div/span/div[2]/div/div[1]/div[3]/div[3]/table')

table_html = table.get_attribute('innerHTML')

bs = BeautifulSoup(table_html, 'html.parser')

rows = bs.select('tbody tr')

print(bs)

Here is how to get the data you are looking for (data is a dict that contains the information):

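import json

import requests
from bs4 import BeautifulSoup

scrape_url = 'https://boxes.mysubscriptionaddiction.com/subscription_boxes_for/food'

# Fetch the page over plain HTTP; no browser or Selenium is involved.
r1 = requests.get(scrape_url)
page = r1.content
soup = BeautifulSoup(page, 'html.parser')
scripts = soup.find_all('script')

# The listing data is embedded as JSON in one of the <script> tags.
# Index 11 depends on the current page layout and may shift if the page changes.
data_str = scripts[11].contents[0].strip()
# strict=False allows control characters (e.g. raw newlines) inside JSON strings.
data = json.loads(data_str, strict=False)
print(data['itemListElement'])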
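
Picking the script tag by position (scripts[11]) will break if the page adds or removes a script. A possible alternative is to select the tag by its type attribute instead. The sketch below assumes the embedded data is a JSON-LD block, i.e. a script tag with type="application/ld+json"; the itemListElement key suggests this, but it is not confirmed above. It uses only the standard library so the parsing step can be tried without a network call. JsonLdCollector and extract_items are hypothetical names introduced for illustration.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the text of every <script type="application/ld+json"> tag."""

    def __init__(self):
        super().__init__()
        self._in_json_ld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        # Assumption: the listing data lives in a JSON-LD script tag.
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []

    def handle_data(self, data):
        # Script text may arrive in several chunks, so buffer it.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self.blocks.append("".join(self._buffer))
            self._in_json_ld = False


def extract_items(html):
    """Return itemListElement from the first JSON-LD block that contains it."""
    collector = JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        try:
            data = json.loads(block, strict=False)
        except json.JSONDecodeError:
            continue  # skip blocks that are not valid JSON
        if isinstance(data, dict) and "itemListElement" in data:
            return data["itemListElement"]
    return []
```

With the code above, this would be called as extract_items(r1.text). If you prefer to stay within BeautifulSoup, soup.find_all('script', type='application/ld+json') should select the same tags under the same assumption. The fields inside each list entry are not shown here, so inspect one entry before relying on particular keys for the name, rating and description.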
