Python scraping and saving to an Excel file: TypeError: must be real number, not dict
I'm trying to scrape a website, https://wish2.ma/product-category/maison-cuisine, and save the data to an Excel file. The logic works: I can loop through the pages and extract the data I want, but I get stuck at line 70. Here is my code:
from requests_html import HTMLSession
import csv
import html
import os
from bs4 import BeautifulSoup
import requests
import win32com.client as win32

s = HTMLSession()
links = []
# Collect the product links from each listing page
for x in range(3, 4):
    print(x)
    url = f'https://wish2.ma/product-category/maison-cuisine/page/{x}'
    r = s.get(url)
    items = r.html.find('li.product-type-simple')
    for item in items:
        links.append(item.find('a', first=True).attrs['href'])

def get_productdata(link):
    r = s.get(link)
    #title = r.html.find('h1', first=True)
    price = r.html.find('span.woocommerce-Price-amount.amount bdi')[0].full_text
    price2 = r.html.find('span.woocommerce-Price-amount.amount bdi')[1].full_text
    tag = r.html.find('a[rel=tag]', first=True).full_text
    category = r.html.find('span.ast-woo-product-category')[0].full_text
    r = requests.get(link)
    soup = BeautifulSoup(r.content.decode('utf-8'), 'html.parser')
    title = soup.find('h1', {'class': 'product_title'})
    description = soup.find('div', {'class': 'woocommerce-tabs'}).decode_contents()
    print(title)
    #description = r.html.find('div.woocommerce-tabs')
    product = {
        'title': title.text,
        'price': price.strip(),
        'price2': price2.strip(),
        'tag': tag.strip(),
        'category': category.strip(),
        'description': description
    }
    return product

results = []
#links = get_links()
print(len(links))
ExcelApp = win32.dynamic.Dispatch('Excel.Application')
ExcelApp.Visible = True
for link in links:
    print(link)
    results.append(get_productdata(link))
    break  # only process the first link while testing
wb = ExcelApp.Workbooks.Add()
ws = wb.Worksheets(1)
header_labels = ['title', 'price', 'price2', 'tag', 'category', 'description']
# Write the header row
for indx, val in enumerate(header_labels):
    ws.Cells(1, indx + 1).Value = val
row_tracker = 2
column_size = len(header_labels)
for result in results:
    # The TypeError is raised here: result is a dict
    ws.Range(
        ws.Cells(row_tracker, 1),
        ws.Cells(row_tracker, column_size)
    ).value=result
    row_tracker += 1
wb.SaveAs(os.path.join(os.getcwd(),'hhhh.xlsx'), 51)
wb.close()
ExcelApp.Quit()
This is the error message I get when running the script:
Traceback (most recent call last):
  File "C:\Users\kamal\Desktop\amina\scrape_wish.py", line 69, in <module>
    ).value=result
  File "C:\Users\kamal\AppData\Local\Programs\Python\Python310\lib\site-packages\win32com\client\dynamic.py", line 698, in __setattr__
    self._oleobj_.Invoke(entry.dispid, 0, invoke_type, 0, value)
TypeError: must be real number, not dict
I can't understand it or work out how to solve it. Please help me.
Thanks to Michael, I found the issue in this part of the code:
product = {
    'title': title.text,
    'price': price.strip(),
    'price2': price2.strip(),
    'tag': tag.strip(),
    'category': category.strip(),
    'description': description
}
I was trying to paste an object (a dict) here instead of an array:
ws.Range(
    ws.Cells(row_tracker, 1),
    ws.Cells(row_tracker, column_size)
).value=result
So I made an array like this one:
product = [title.text, price.strip(), price2.strip(), tag.strip(), category.strip(), description]
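If the dict is still useful elsewhere in the script, the same idea can be sketched by flattening it into a list just before writing, using the header order to pick the values. This is a minimal sketch, not part of the original fix: the helper name `product_to_row` and the sample values are hypothetical, and `header_labels` is the list from the question.

```python
header_labels = ['title', 'price', 'price2', 'tag', 'category', 'description']

def product_to_row(product, labels=header_labels):
    """Return the dict's values as a flat list, ordered like the header row."""
    # Hypothetical helper: a missing key becomes an empty string so the row
    # always has exactly one value per column.
    return [product.get(label, '') for label in labels]

# Made-up sample data for illustration, not scraped from the site.
sample = {
    'title': 'Kettle',
    'price': '99.00',
    'price2': '79.00',
    'tag': 'kitchen',
    'category': 'Maison & Cuisine',
    'description': '<p>Example description</p>',
}

row = product_to_row(sample)
print(row)
```

In the write loop, `result` would then be replaced by `product_to_row(result)` in the range assignment, so that a list with one value per column is assigned instead of a dict.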
I also added a short sleep to give the Excel app time to save before closing:
time.sleep(2)
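A possible complement to the fixed pause is to wrap the save and shutdown steps in try/finally, so that the workbook is closed and Excel quits even if saving raises an error. This is only a sketch under assumptions: the function name `save_and_quit` is hypothetical, `app` and `wb` are assumed to be the `ExcelApp` and `wb` objects from the question, and whether this removes the need for `time.sleep(2)` on a given machine has not been verified here.

```python
import os

# File format code passed to SaveAs in the question (51 is the .xlsx format).
XL_OPEN_XML_WORKBOOK = 51

def save_and_quit(app, wb, filename):
    """Save the workbook next to the script's working directory, then clean up."""
    path = os.path.join(os.getcwd(), filename)
    try:
        wb.SaveAs(path, XL_OPEN_XML_WORKBOOK)
    finally:
        # Runs even if SaveAs raised, so no Excel process is left open.
        wb.Close(False)  # False: do not prompt to save changes again
        app.Quit()
    return path

# Usage with the objects from the question (Windows with Excel installed):
# save_and_quit(ExcelApp, wb, 'hhhh.xlsx')
```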