Python writing to json file of scraped data
I wrote a web scraping script and it works well. I am trying to write the scraped data to a JSON file, but it fails.
Here is my snippet:
def scrape_post_info(url):
    content = get_page_content(url)
    title, description, post_url = get_post_details(content, url)
    job_dict = {}
    job_dict['title'] = title
    job_dict['Description'] = description
    job_dict['url'] = post_url
    json_job = json.dumps(job_dict)
    with open('data.json', 'a') as f:
        json.dump(json_job, f)

if __name__ == '__main__':
    urls = ['url1', 'url2', 'url3', 'url4']
    for url in urls:
        scrape_post_info(url)
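Ignore the two functions I call inside the function; the problem is not with them.
My problem is only with writing to JSON.
Currently the scraped data comes out as shown below, in the wrong format.
data.json looks like this: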
{
"title": "this is title",
"Description": " Fendi is an Italian luxury labelarin. ",
"url": "https:/~"
}
{
"title": " - Furrocious Elegant Style",
"Description": " the Italian luxare vast. ",
"url": "https://www.s"
}
{
"title": "Rome, Fountains and Fendi Sunglasses",
"Description": " Fendi started off as a store. ",
"url": "https://www.~"
}
{
"title": "Tipsnglasses",
"Description": "Whether irregular orn season.",
"url": "https://www.sooic"
}
But it should look like this:
[
{
"title": "this is title",
"Description": " Fendi is an Italian luxury labelarin. ",
"url": "https:/~"
},
{
"title": " - Furrocious Elegant Style",
"Description": " the Italian luxare vast. ",
"url": "https://www.s"
},
{
"title": "Rome, Fountains and Fendi Sunglasses",
"Description": " Fendi started off as a store. ",
"url": "https://www.~"
},
{
"title": "Tipsnglasses",
"Description": "Whether irregular orn season.",
"url": "https://www.sooic"
}
]
I don't understand why I am not getting the data in the correct format in the JSON file.
Can anyone help me?
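You can try this code to solve your problem. It should write the objects as a single valid JSON array with the structure you expect above (on one line rather than pretty-printed). Here is the code: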
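import json

def scrape_post_info(url, f):
    content = get_page_content(url)
    title, description, post_url = get_post_details(content, url)
    job_dict = {}
    job_dict['title'] = title
    job_dict['Description'] = description
    job_dict['url'] = post_url
    json_job = json.dumps(job_dict)
    # Re-read what has been written so far. Everything is on one line,
    # so readline() also leaves the file position at the end.
    f.seek(0)
    txt = f.readline()
    # Add a separating comma if an object has already been written.
    if txt.endswith("}"):
        f.write(",")
    f.write(json_job)

if __name__ == '__main__':
    urls = ['url1', 'url2', 'url3', 'url4']
    # 'w+' creates or truncates data.json and still allows reading;
    # 'r+' raises FileNotFoundError if the file does not exist yet.
    with open('data.json', 'w+') as f:
        f.write("[")
        for url in urls:
            scrape_post_info(url, f)
        f.write("]")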
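If all the URLs are scraped in a single run, a simpler alternative may be to collect the dictionaries in a list and serialize that list once, so the brackets and commas are produced by the json module instead of by hand. The following is only a minimal sketch under that assumption: write_posts is a hypothetical helper name, and the body of scrape_post_info uses made-up placeholder values in place of get_page_content and get_post_details, which are not shown in the question.

```python
import json


def scrape_post_info(url):
    # Placeholder values standing in for the asker's get_page_content() and
    # get_post_details() calls, which are not shown in the question.
    title = f"title for {url}"
    description = f"description for {url}"
    post_url = url
    # Return the dict instead of writing it to the file straight away.
    return {"title": title, "Description": description, "url": post_url}


def write_posts(urls, path="data.json"):
    # Build the whole list in memory first.
    posts = [scrape_post_info(url) for url in urls]
    # Serialize the list once: json.dump emits the surrounding [ ] and the
    # commas between objects, so the file is a single valid JSON document.
    with open(path, "w", encoding="utf-8") as f:
        json.dump(posts, f, indent=2, ensure_ascii=False)
    return posts
```

Calling write_posts(['url1', 'url2', 'url3', 'url4']) would then overwrite data.json with one array of four objects. Note that mode 'w' replaces the file on every run; keeping results from earlier runs would require loading the existing list with json.load, extending it, and writing it back.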