Need help exporting data to JSON file
I just got into coding, and I'm working in Python. Currently I'm building a web crawler. I need to save my data to a JSON file so I can import it into MongoDB.
import requests
import json
from bs4 import BeautifulSoup
url= ["http://www.alternate.nl/html/product/listing.html?filter_5=&filter_4=&filter_3=&filter_2=&filter_1=&size=500&lk=9435&tk=7&navId=11626#listingResult"]
amd = requests.get(url[0])
soupamd = BeautifulSoup(amd.content)
prodname = []
adinfo = []
formfactor = []
socket = []
grafisch = []
prijs = []
a_data = soupamd.find_all("div", {"class": "listRow"})
for item in a_data:
    try:
        prodname.insert(len(prodname),item.find_all("span", {"class": "name"})[0].text)
        adinfo.insert(len(adinfo), item.find_all("span", {"class": "additional"})[0].text)
        formfactor.insert(len(formfactor), item.find_all("span", {"class": "info"})[0].text)
        grafisch.insert(len(grafisch), item.find_all("span", {"class": "info"})[1].text)
        socket.insert(len(socket), item.find_all("span", {"class": "info"})[2].text)
        prijs.insert(len(prijs), item.find_all("span", {"class": "price right right10"})[0].text)
    except:
        pass
I'm stuck at this part. I want to export the data that I saved in the lists to a JSON file. This is what I have now:
file = open("mobos.json", "w")
for i = 0:
    try:
        output = {"productnaam": [prodname[i]],
                  "info" : [adinfo[i]],
                  "formfactor" : [formfactor[i]],
                  "grafisch" : [grafisch[i]],
                  "socket" : [socket[i]],
                  "prijs" : [prijs[i]]}
        i + 1
        json.dump(output, file)
        if i == 500:
            break
    except:
        pass
file.close()
So I want to create a dictionary format like this:
{"productname" : [prodname[0]], "info" : [adinfo[0]], "formfactor" : [formfactor[0]] .......}
{"productname" : [prodname[1]], "info" : [adinfo[1]], "formfactor" : [formfactor[1]] .......}
{"productname" : [prodname[2]], "info" : [adinfo[2]], "formfactor" : [formfactor[2]] .......} etc.
Create dictionaries to begin with, in one list, then save that one list to a JSON file so you have one valid JSON document:
soupamd = BeautifulSoup(amd.content)
products = []
for item in soupamd.select("div.listRow"):
    prodname = item.find("span", class_="name")
    adinfo = item.find("span", class_="additional")
    formfactor, grafisch, socket = item.find_all("span", class_="info")[:3]
    prijs = item.find("span", class_="price")
    products.append({
        'prodname': prodname.text.strip(),
        'adinfo': adinfo.text.strip(),
        'formfactor': formfactor.text.strip(),
        'grafisch': grafisch.text.strip(),
        'socket': socket.text.strip(),
        'prijs': prijs.text.strip(),
    })
with open("mobos.json", "w") as outfile:
    json.dump(products, outfile)
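To see what the single-list approach gives you, here is a minimal sketch that writes a list of dictionaries and reads it back. The two product rows are made-up placeholders, not data from the page, and the file goes to a temporary directory so the sketch does not touch your real mobos.json.

```python
import json
import os
import tempfile

# Hypothetical sample rows standing in for the scraped products.
products = [
    {"prodname": "Board A", "socket": "AM3+", "prijs": "59.90"},
    {"prodname": "Board B", "socket": "FM2+", "prijs": "74.90"},
]

# Write to a throwaway location instead of the working directory.
path = os.path.join(tempfile.mkdtemp(), "mobos.json")

# The whole list becomes a single JSON array in the file.
with open(path, "w", encoding="utf-8") as outfile:
    json.dump(products, outfile, indent=2)

# One json.load call recovers the entire list again.
with open(path, encoding="utf-8") as infile:
    loaded = json.load(infile)

print(len(loaded), loaded[0]["prodname"])
```

A file with one top-level array should also suit the MongoDB step: as far as I know, mongoimport accepts such a file when given its --jsonArray option, but check the documentation for your version.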
If you really want to produce separate JSON objects, one on each line, write newlines in between so you can at least find the objects again (parsing is going to be a beast otherwise):
with open("mobos.json", "w") as outfile:
    for product in products:
        # dump the single product, not the whole list
        json.dump(product, outfile)
        outfile.write('\n')
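Reading such a one-object-per-line file back is then a matter of parsing each line separately. The sketch below uses an in-memory buffer in place of a real file, and the sample rows are invented for illustration.

```python
import io
import json

# Hypothetical sample rows standing in for the scraped products.
products = [
    {"prodname": "Board A", "socket": "AM3+"},
    {"prodname": "Board B", "socket": "FM2+"},
]

# io.StringIO behaves like a text file opened for writing.
buffer = io.StringIO()
for product in products:
    json.dump(product, buffer)   # one object ...
    buffer.write("\n")           # ... per line

# Rewind and parse line by line; blank lines are skipped.
buffer.seek(0)
restored = [json.loads(line) for line in buffer if line.strip()]

print(restored == products)
```

Note that json.load on the whole file would fail here, because the file as a whole is not one JSON document; only the individual lines are.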
Because we now have one list of objects, looping over that list with for is far simpler.
Some other differences from your code:
- Use list.append() rather than list.insert(); there is no need for such verbose code when there is a standard method for the task.
- Use element.find() rather than element.find_all() if you are looking for only one match.
- I used str.strip() to remove the extra whitespace that usually is added in HTML documents; you could also add an extra ' '.join(textvalue.split()) to remove internal newlines and collapse whitespace, but this specific webpage doesn't seem to require that measure.
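The difference between the two whitespace treatments is easiest to see on a small string. The text below is an invented example, not something taken from the page.

```python
# Made-up text with leading/trailing and internal whitespace.
raw = "  Example   Board\n   X1 \t rev 2  "

# str.strip() only trims the two ends; inner runs are untouched.
stripped = raw.strip()

# split() with no arguments breaks on any whitespace run,
# and ' '.join() puts single spaces back between the pieces.
collapsed = " ".join(raw.split())

print(repr(stripped))
print(repr(collapsed))
```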
Since the OP wanted a JSON file with dictionary-like objects and did not specify that they should be in a list within the JSON, this code might work better:
with open("mobos.json", mode='wt') as outFile:
    for item in soupamd.select("div.listRow"):
        prodname = item.find("span", class_="name")
        adinfo = item.find("span", class_="additional")
        formfactor, grafisch, socket = item.find_all("span", class_="info")[:3]
        prijs = item.find("span", class_="price")
        tempDict = {
            'prodname': prodname.text.strip(),
            'adinfo': adinfo.text.strip(),
            'formfactor': formfactor.text.strip(),
            'grafisch': grafisch.text.strip(),
            'socket': socket.text.strip(),
            'prijs': prijs.text.strip(),
        }
        json.dump(tempDict, outFile)
        # json.dump does not end its output with a newline
        outFile.write('\n')
Note that json.dump does not write a newline after each object, so the loop writes one explicitly. Without it, all the objects end up run together on a single line and the file is hard to parse again.
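How consecutive json.dump calls behave is easy to check directly. The sketch below dumps two made-up dictionaries into an in-memory buffer, without writing anything in between, and then tries to parse the result.

```python
import io
import json

buffer = io.StringIO()

# Two dumps in a row, with nothing written between them.
json.dump({"a": 1}, buffer)
json.dump({"b": 2}, buffer)

# The objects sit directly next to each other, no separator.
run_together = buffer.getvalue()
print(repr(run_together))

# Parsing the combined text as one document is rejected.
try:
    json.loads(run_together)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False

print(parsed_ok)
```

This is why a separator, typically '\n', has to be written by hand when several objects share one file.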