Web-scraping problem: how to display data from 2 different sites in one HTML file
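I'm having trouble displaying data from 2 different sites in the same HTML file (in a single table).
I've tried a lot of things and looked for a solution, but nothing has helped. You can also link any Python/BeautifulSoup/web-scraping tutorials for my future "problems" :D
Thanks in advance.
Code: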
import pandas as pd
import requests
from bs4 import BeautifulSoup

# First page: search results for "react 270"
odpowiedz = requests.get(
    "https://www.nike.com/pl/w?q=react%20270&vst=react%20270")
soup = BeautifulSoup(odpowiedz.text, 'html.parser')
items = soup.find_all(
    class_='product-card css-1pclthi ncss-col-sm-6 ncss-col-lg-4 va-sm-t product-grid__card')
title = [item.find(class_='product-card__title').get_text()
         for item in items]
price = [item.find(class_='product-card__price').get_text()
         for item in items]
linki = [item.find(class_='product-card__link-overlay').attrs['href']
         for item in items]

# Second page: Air Max 97 listing
odpowiedz = requests.get(
    "https://www.nike.com/pl/w/air-max-97-buty-77f38zy7ok")
soup = BeautifulSoup(odpowiedz.text, 'html.parser')
items = soup.find_all(
    class_='product-card css-1pclthi ncss-col-sm-6 ncss-col-lg-4 va-sm-t product-grid__card')
title = [item.find(class_='product-card__title').get_text()
         for item in items]
price = [item.find(class_='product-card__price').get_text()
         for item in items]
linki = [item.find(class_='product-card__link-overlay').attrs['href']
         for item in items]

# Build the table and write it to an HTML file
wynik = pd.DataFrame(
    {
        'Model': title,
        'Cena': price,
        'Link': linki,
    })
print(wynik)
wynik.to_html('official.html')
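The program's output is the id, product name, price, and link (shoes, in this case) from the first site (Nike React). I want to add the data from the second site (Nike Air Max 97) to the same table, below the first results (Nike React).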
There is surely a better way to do this, but here is a quick band-aid solution:
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Accumulators shared by both pages; each append adds one list per page
title = []
price = []
linki = []

# First page: search results for "react 270"
odpowiedz = requests.get(
    "https://www.nike.com/pl/w?q=react%20270&vst=react%20270")
soup = BeautifulSoup(odpowiedz.text, 'html.parser')
items = soup.find_all(
    class_='product-card css-1pclthi ncss-col-sm-6 ncss-col-lg-4 va-sm-t product-grid__card')
title.append([item.find(class_='product-card__title').get_text()
              for item in items])
price.append([item.find(class_='product-card__price').get_text()
              for item in items])
linki.append([item.find(class_='product-card__link-overlay').attrs['href']
              for item in items])

# Second page: Air Max 97 listing
odpowiedz = requests.get(
    "https://www.nike.com/pl/w/air-max-97-buty-77f38zy7ok")
soup = BeautifulSoup(odpowiedz.text, 'html.parser')
items = soup.find_all(
    class_='product-card css-1pclthi ncss-col-sm-6 ncss-col-lg-4 va-sm-t product-grid__card')
title.append([item.find(class_='product-card__title').get_text()
              for item in items])
price.append([item.find(class_='product-card__price').get_text()
              for item in items])
linki.append([item.find(class_='product-card__link-overlay').attrs['href']
              for item in items])

# Flatten the per-page lists into one flat list per column
flat_titles = [titles for sublisttitle in title for titles in sublisttitle]
flat_prices = [prices for sublistprice in price for prices in sublistprice]
flat_links = [links for sublistlinks in linki for links in sublistlinks]

# Build the combined table and write it to an HTML file
wynik = pd.DataFrame(
    {
        'Model': flat_titles,
        'Cena': flat_prices,
        'Link': flat_links,
    })
print(wynik)
wynik.to_html('official.html')
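As for the "better way": one possible tidy-up is to move the repeated scraping steps into a helper and combine the per-page tables with pd.concat, so adding a third page is just one more URL in a list. This is only a sketch. The names parse_products, build_table and scrape are made up for illustration, the 10-second timeout is an arbitrary choice, and matching on the single class 'product-card' instead of the full class string is an assumption about the page markup that may not hold.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

COLUMNS = ['Model', 'Cena', 'Link']


def parse_products(html):
    """Return a DataFrame with one row per product card found in `html`."""
    soup = BeautifulSoup(html, 'html.parser')
    rows = []
    # Assumption: every product tile carries the class 'product-card'.
    for item in soup.find_all(class_='product-card'):
        title = item.find(class_='product-card__title')
        price = item.find(class_='product-card__price')
        link = item.find(class_='product-card__link-overlay')
        if title is None or price is None or link is None:
            continue  # skip cards that lack any of the three fields
        rows.append({
            'Model': title.get_text(),
            'Cena': price.get_text(),
            'Link': link.get('href'),
        })
    return pd.DataFrame(rows, columns=COLUMNS)


def build_table(pages):
    """Stack the product tables of several HTML pages into one DataFrame."""
    frames = [parse_products(html) for html in pages]
    if not frames:
        return pd.DataFrame(columns=COLUMNS)
    # ignore_index=True renumbers the rows 0..n-1 across all pages.
    return pd.concat(frames, ignore_index=True)


def scrape(urls):
    """Download each URL and return the combined product table."""
    pages = []
    for url in urls:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # fail loudly on HTTP errors
        pages.append(response.text)
    return build_table(pages)
```

With that in place, the script body shrinks to calling scrape() with a list of the two URLs from the question and then wynik.to_html('official.html') on the result. Keeping the parsing separate from the download also makes it possible to try parse_products on a saved HTML file without hitting the site.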