簡體   English   中英

Python寫入抓取數據的json文件

[英]Python writing to json file of scraped data

我寫了一個網頁抓取腳本,效果很好。 我正在嘗試將抓取的數據寫入json文件,但失敗了。

這是我的片段:

def scrape_post_info(url):
    content = get_page_content(url)
    title, description, post_url = get_post_details(content, url)
    job_dict = {}
    job_dict['title'] = title
    job_dict['Description'] = description
    job_dict['url'] = post_url

    json_job = json.dumps(job_dict)
    with open('data.json', 'a') as f:
        json.dump(json_job, f)

if __name__ == '__main__':
    urls = ['url1', 'url2', 'url3', 'url4']
    for url in urls:
        scrape_post_info(url)

忽略我在函數內部調用的兩個函數,問題不在於它們

我的問題只是寫入json。

目前我正在獲取如下所示的抓取數據並且格式錯誤

data.json如下:

{
    "title": "this is title",
    "Description": " Fendi is an Italian luxury labelarin. ",
    "url": "https:/~"
}

{
    "title": " - Furrocious Elegant Style", 
    "Description": " the Italian luxare vast. ", 
    "url": "https://www.s"
}
    
{
    "title": "Rome, Fountains and Fendi Sunglasses",
    "Description": " Fendi started off as a store. ",
    "url": "https://www.~"
}
    
{
    "title": "Tipsnglasses",
    "Description": "Whether irregular orn season.", 
    "url": "https://www.sooic"
}

但它應該是這樣的:

[
{
    "title": "this is title",
    "Description": " Fendi is an Italian luxury labelarin. ",
    "url": "https:/~"
},

{
    "title": " - Furrocious Elegant Style", 
    "Description": " the Italian luxare vast. ", 
    "url": "https://www.s"
},
    
{
    "title": "Rome, Fountains and Fendi Sunglasses",
    "Description": " Fendi started off as a store. ",
    "url": "https://www.~"
},
    
{
    "title": "Tipsnglasses",
    "Description": "Whether irregular orn season.", 
    "url": "https://www.sooic"
},

]

我不明白為什么我沒有以正確的格式在 json 文件中獲取數據..

任何人都可以幫助我嗎?

您可以嘗試使用此代碼來解決您的問題。 您將獲得上述預期的確切文件,以下是代碼:

import json
def scrape_post_info(url, f):
    content = get_page_content(url)
    title, description, post_url = get_post_details(content, url)
    job_dict = {}
    job_dict['title'] = title
    job_dict['Description'] = description
    job_dict['url'] = post_url 

    json_job = json.dumps(job_dict)
    f.seek(0)
    txt = f.readline()
    if txt.endswith("}"):
        f.write(",")
    f.write(json_job)

if __name__ == '__main__':
    urls = ['url1', 'url2', 'url3', 'url4']
    with open('data.json', 'r+') as f:
        f.write("[")
        for url in urls:
            scrape_post_info(url,f)
        f.write("]")

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM