Need help exporting data to JSON file

I just got into coding, and I'm learning Python. Currently I'm working on a web crawler. I need to save my data to a JSON file so I can import it into MongoDB.

import requests
import json
from bs4 import BeautifulSoup 

url= ["http://www.alternate.nl/html/product/listing.html?filter_5=&filter_4=&filter_3=&filter_2=&filter_1=&size=500&lk=9435&tk=7&navId=11626#listingResult"] 

amd = requests.get(url[0])
soupamd = BeautifulSoup(amd.content) 

prodname = [] 
adinfo = [] 
formfactor = []
socket = [] 
grafisch = []
prijs = []

a_data = soupamd.find_all("div", {"class": "listRow"}) 
for item in a_data: 
    try:
        prodname.insert(len(prodname),item.find_all("span", {"class": "name"})[0].text)
        adinfo.insert(len(adinfo), item.find_all("span", {"class": "additional"})[0].text)
        formfactor.insert(len(formfactor), item.find_all("span", {"class": "info"})[0].text)
        grafisch.insert(len(grafisch), item.find_all("span", {"class": "info"})[1].text)
        socket.insert(len(socket), item.find_all("span", {"class": "info"})[2].text)
        prijs.insert(len(prijs), item.find_all("span", {"class": "price right right10"})[0].text)
    except: 
        pass

I'm stuck at this part. I want to export the data that I saved in the lists to a JSON file. This is what I have now:

file = open("mobos.json", "w")

for  i = 0:  
    try: 
        output = {"productnaam": [prodname[i]],
        "info" : [adinfo[i]], 
        "formfactor" : [formfactor[i]],
        "grafisch" : [grafisch[i]],
        "socket" : [socket[i]], 
        "prijs" : [prijs[i]]} 
        i + 1
        json.dump(output, file)
        if i == 500: 
            break
    except: 
        pass 

file.close()

So I want to create a dictionary format like this:

{"productname" : [prodname[0]], "info" : [adinfo[0]], "formfactor" : [formfactor[0]] .......}
{"productname" : [prodname[1]], "info" : [adinfo[1]], "formfactor" : [formfactor[1]] .......}
{"productname" : [prodname[2]], "info" : [adinfo[2]], "formfactor" : [formfactor[2]] .......} etc.

Create dictionaries to begin with, collected in one list, then save that one list to a JSON file so you end up with a single valid JSON document:

soupamd = BeautifulSoup(amd.content) 
products = []

for item in soupamd.select("div.listRow"):
    prodname = item.find("span", class_="name")
    adinfo = item.find("span", class_="additional")
    formfactor, grafisch, socket = item.find_all("span", class_="info")[:3]
    prijs = item.find("span", class_="price")
    products.append({
        'prodname': prodname.text.strip(),
        'adinfo': adinfo.text.strip(),
        'formfactor': formfactor.text.strip(),
        'grafisch': grafisch.text.strip(),
        'socket': socket.text.strip(),
        'prijs': prijs.text.strip(),
    })

with open("mobos.json", "w") as outfile:
    json.dump(products, outfile)
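
As a quick sanity check, a file written this way can be read back with json.load, which should give the same list of dictionaries. The sketch below is only an illustration: the sample products are made up (not real data from the page) and a temporary directory stands in for the working directory.

```python
import json
import os
import tempfile

# Hypothetical sample records standing in for the scraped products.
products = [
    {"prodname": "Board A", "socket": "AM3+", "prijs": "€ 59,90"},
    {"prodname": "Board B", "socket": "FM2+", "prijs": "€ 74,90"},
]

# Write the whole list as one JSON document.
path = os.path.join(tempfile.mkdtemp(), "mobos.json")
with open(path, "w") as outfile:
    json.dump(products, outfile)

# Read it back: json.load returns the same list of dictionaries.
with open(path) as infile:
    loaded = json.load(infile)

print(len(loaded), loaded[0]["prodname"])
```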

If you really want to produce separate JSON objects, one on each line, write a newline after each one so you can at least tell the objects apart again (parsing is going to be a beast otherwise):

with open("mobos.json", "w") as outfile:
    for product in products:
        # dump the single product, not the whole list
        json.dump(product, outfile)
        outfile.write('\n')
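
Reading such a file back then comes down to parsing each line on its own with json.loads. The following is a minimal sketch under stated assumptions: the sample records are invented, and an in-memory io.StringIO buffer replaces the real file.

```python
import io
import json

# Hypothetical sample records standing in for the scraped products.
products = [
    {"prodname": "Board A", "socket": "AM3+"},
    {"prodname": "Board B", "socket": "FM2+"},
]

# Write one JSON object per line (an in-memory buffer replaces the file here).
buffer = io.StringIO()
for product in products:
    json.dump(product, buffer)
    buffer.write("\n")

# Parse the objects again, one line at a time, skipping blank lines.
buffer.seek(0)
loaded = [json.loads(line) for line in buffer if line.strip()]

print(loaded)
```

One object per line is also the layout that mongoimport reads by default; a file holding a single JSON array needs its --jsonArray option instead.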

Because we now have one list of objects, looping over that list with for is far simpler.

Some other differences from your code:

  • Use list.append() rather than list.insert(); there is no need for such verbose code when there is a standard method for the task.
  • If you are looking for just one match, use element.find() rather than element.find_all().
  • You really want to avoid blanket exception handling; you'll mask far more errors than you intend to. Catch specific exceptions only.
  • I used str.strip() to remove the extra whitespace that is usually present in HTML documents; you could also apply ' '.join(textvalue.split()) to remove internal newlines and collapse whitespace, but this specific webpage doesn't seem to require that measure.
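
The last two points can be sketched without touching the network. The helper names collapse_whitespace and build_record are hypothetical, and plain strings stand in for the text of the scraped spans; the idea is only to show catching one specific exception (ValueError from a failed unpacking) instead of a bare except, together with the whitespace-collapsing idiom.

```python
def collapse_whitespace(text):
    """Strip the ends and collapse internal runs of whitespace to one space."""
    return " ".join(text.split())


def build_record(name, infos):
    """Build one product dict; `infos` stands in for the text of the "info" spans."""
    try:
        # Unpacking raises ValueError when fewer than three values exist.
        formfactor, grafisch, socket = infos[:3]
    except ValueError:
        return None  # skip incomplete rows, but let unrelated errors surface
    return {
        "prodname": collapse_whitespace(name),
        "formfactor": collapse_whitespace(formfactor),
        "grafisch": collapse_whitespace(grafisch),
        "socket": collapse_whitespace(socket),
    }


# Made-up rows: the second one is missing two "info" values.
rows = [
    ("  Board\n   A ", ["ATX", "onboard", "AM3+"]),
    ("Board B", ["Micro-ATX"]),
]
records = [r for r in (build_record(n, i) for n, i in rows) if r is not None]
print(records)
```

Only the incomplete row is skipped here; any other kind of error would still raise and be visible, which is the point of not using a bare except.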

Since the OP wanted a JSON file with dictionary-like objects and did not specify that they should be in a list within the JSON, this code might work better:

outFile = open("mobos.json", mode='wt')
for item in soupamd.select("div.listRow"):
    prodname = item.find("span", class_="name")
    adinfo = item.find("span", class_="additional")
    formfactor, grafisch, socket = item.find_all("span", class_="info")[:3]
    prijs = item.find("span", class_="price")
    tempDict = {
        'prodname': prodname.text.strip(),
        'adinfo': adinfo.text.strip(),
        'formfactor': formfactor.text.strip(),
        'grafisch': grafisch.text.strip(),
        'socket': socket.text.strip(),
        'prijs': prijs.text.strip(),
    }
    json.dump(tempDict, outFile)
    outFile.write('\n')
outFile.close()

Note that json.dump does not add a newline on its own, so one is written after each object to keep every object on its own line.
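
What json.dump actually writes between two consecutive calls can be checked with an in-memory buffer. This is a small sketch with made-up records, not data from the page: json.dump emits only the serialized object itself, so the two objects end up directly adjacent unless a separator is written explicitly.

```python
import io
import json

# Dump two hypothetical records back to back, with nothing written in between.
buffer = io.StringIO()
json.dump({"prodname": "Board A"}, buffer)
json.dump({"prodname": "Board B"}, buffer)

written = buffer.getvalue()
print(repr(written))
```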
