
Python: writing scraped data to a JSON file

I wrote a web scraping script and it is working well. I am trying to write the scraped data to a JSON file, but it isn't working.

This is my snippet:

import json

def scrape_post_info(url):
    content = get_page_content(url)
    title, description, post_url = get_post_details(content, url)
    job_dict = {}
    job_dict['title'] = title
    job_dict['Description'] = description
    job_dict['url'] = post_url

    json_job = json.dumps(job_dict)
    with open('data.json', 'a') as f:
        json.dump(json_job, f)

if __name__ == '__main__':
    urls = ['url1', 'url2', 'url3', 'url4']
    for url in urls:
        scrape_post_info(url)

Ignore the two functions I call inside the function; the problem is not with them.

My only problem is writing to JSON.

Currently the scraped data ends up in data.json in the wrong format, like this:

{
    "title": "this is title",
    "Description": " Fendi is an Italian luxury labelarin. ",
    "url": "https:/~"
}

{
    "title": " - Furrocious Elegant Style", 
    "Description": " the Italian luxare vast. ", 
    "url": "https://www.s"
}
    
{
    "title": "Rome, Fountains and Fendi Sunglasses",
    "Description": " Fendi started off as a store. ",
    "url": "https://www.~"
}
    
{
    "title": "Tipsnglasses",
    "Description": "Whether irregular orn season.", 
    "url": "https://www.sooic"
}

But it should look like this:

[
{
    "title": "this is title",
    "Description": " Fendi is an Italian luxury labelarin. ",
    "url": "https:/~"
},

{
    "title": " - Furrocious Elegant Style", 
    "Description": " the Italian luxare vast. ", 
    "url": "https://www.s"
},
    
{
    "title": "Rome, Fountains and Fendi Sunglasses",
    "Description": " Fendi started off as a store. ",
    "url": "https://www.~"
},
    
{
    "title": "Tipsnglasses",
    "Description": "Whether irregular orn season.", 
    "url": "https://www.sooic"
}

]

I don't understand why the data in the JSON file is not in the proper format.

Can anyone help me with this?
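A minimal sketch of why the file is not valid JSON: each call to scrape_post_info appends one standalone object, so data.json ends up holding several top-level JSON values back to back, and a JSON parser rejects that. The sketch also shows that calling json.dumps and then json.dump on its result encodes the dictionary twice (you get a quoted string, not an object). The sample records and the use of io.StringIO in place of a real file are assumptions made only for illustration.

```python
import io
import json

# Hypothetical sample records standing in for the scraped data.
records = [
    {"title": "first", "Description": "a", "url": "https://example.com/1"},
    {"title": "second", "Description": "b", "url": "https://example.com/2"},
]

# Mimic the question: one json.dump call per record, appended to the same "file".
buf = io.StringIO()
for record in records:
    json.dump(record, buf)

appended_text = buf.getvalue()  # '{...}{...}': two objects with nothing around them

# Two top-level objects back to back are not a single JSON document.
try:
    json.loads(appended_text)
    appended_is_valid = True
except json.JSONDecodeError:
    appended_is_valid = False

# json.dumps already returns a str; encoding that str again yields a JSON string
# literal, so decoding it once gives back text, not a dict.
double_encoded = json.dumps(json.dumps(records[0]))
decoded_once = json.loads(double_encoded)

print(appended_is_valid)
print(type(decoded_once).__name__)
```

In other words, the fix is not in how each object is formatted but in writing all of them as elements of one list.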

You can try the code below to solve your problem. It produces exactly the file you expected above:

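import json

def scrape_post_info(url, f):
    content = get_page_content(url)
    title, description, post_url = get_post_details(content, url)
    job_dict = {
        'title': title,
        'Description': description,
        'url': post_url,
    }

    json_job = json.dumps(job_dict)
    # json.dumps emits no newlines, so everything written so far is one line.
    f.seek(0)
    txt = f.readline()
    # If an object has already been written, separate the next one with a comma.
    # readline() has consumed the whole file here, so the writes below append.
    if txt.endswith("}"):
        f.write(",")
    f.write(json_job)

if __name__ == '__main__':
    urls = ['url1', 'url2', 'url3', 'url4']
    # 'w+' creates or truncates the file and still allows reading it back;
    # 'r+' fails if data.json does not exist and keeps any older, longer content.
    with open('data.json', 'w+') as f:
        f.write("[")
        for url in urls:
            scrape_post_info(url, f)
        f.write("]")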
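A simpler alternative to the seek/readline approach, sketched under assumptions: have the scraping function return the dictionary, collect the results in a Python list, and write the whole list with a single json.dump call, which always produces a valid JSON array. get_page_content and get_post_details are the question's own helpers; here they are replaced by a hypothetical stub, fake_post_details, and save_posts is likewise a made-up name, so the example runs on its own.

```python
import json

def fake_post_details(url):
    # Hypothetical stand-in for get_page_content + get_post_details from the question.
    return f"title for {url}", f"description for {url}", url

def scrape_post_info(url):
    # Return the record instead of writing it; the caller decides how to save it.
    title, description, post_url = fake_post_details(url)
    return {"title": title, "Description": description, "url": post_url}

def save_posts(urls, path="data.json"):
    jobs = [scrape_post_info(url) for url in urls]
    # Mode "w" creates or truncates the file, so reruns never append to stale data.
    with open(path, "w", encoding="utf-8") as f:
        # One dump of the whole list writes a single JSON array.
        json.dump(jobs, f, indent=4, ensure_ascii=False)
    return jobs

if __name__ == "__main__":
    save_posts(["url1", "url2", "url3", "url4"])
```

If records must be saved as each page is scraped (for example, so a crash does not lose earlier results), a common alternative is JSON Lines: append json.dumps(record) + "\n" per record and parse the file line by line. That format is not a single JSON array, though, so it does not match the expected output in the question.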

Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site or the original source. For questions, contact yoyou2525@163.com.