
json.decoder.JSONDecodeError: Unterminated string

I am trying to parse some data extracted from Instagram so that I can visualize the social graph with Gephi.

For each account I have a .har file (./data/{person}/following.har) and a .json file (./data/{person}/profile.json).
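With this layout, any stray entry under ./data (a hidden file, or a folder missing one of the two files) would make the script fail before it gets to the parsing step. A minimal sketch of a pre-check, assuming the directory structure described above; `complete_person_dirs` is a hypothetical helper name, not part of the original script:

```python
from pathlib import Path


def complete_person_dirs(data_root: str) -> list[str]:
    """Return the names of subdirectories holding both expected files."""
    root = Path(data_root)
    return sorted(
        d.name
        for d in root.iterdir()
        if d.is_dir()
        and (d / "following.har").is_file()
        and (d / "profile.json").is_file()
    )
```

Iterating over `complete_person_dirs('./data')` instead of `os.listdir('./data')` would skip incomplete folders rather than raise on them.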

When I try to extract the data into a single JSON file (./out/people.json), I keep getting the error "json.decoder.JSONDecodeError: Unterminated string at: line 1067001 column 26 (char 87399905)".

The error seems to occur at data = json.loads(f.read())
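To see what the parser actually stopped on, one option is to catch the exception and print the text around the reported character offset. The following is a minimal sketch; `describe_json_error` and its `window` parameter are hypothetical names introduced for illustration, while `msg`, `lineno`, `colno` and `pos` are attributes of the standard `json.JSONDecodeError`:

```python
import json
from typing import Optional


def describe_json_error(text: str, window: int = 40) -> Optional[str]:
    """Return a short report on the first JSON syntax error, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        # err.pos is the character offset quoted in the error message
        start = max(err.pos - window, 0)
        end = min(err.pos + window, len(text))
        return (f"{err.msg} at line {err.lineno} column {err.colno} "
                f"(char {err.pos}); context: {text[start:end]!r}")
    return None
```

Calling it on the string read from following.har would show the 80 or so characters surrounding offset 87399905 without opening the file in an editor.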

I have looked into ftfy as a potential solution, but without success.

Here is the code without ftfy.

import base64
import json
import os
import re


def extract_profile(data) -> dict:
    # profile.json keeps the account details under graphql -> user
    return data['graphql']['user']


def extract_followings(data) -> list:
    friendship_api_url = r'https://i\.instagram\.com/api/v1/friendships/.*/following'

    # keep only the responses to the "following" API calls recorded in the HAR
    contents = [entry['response']['content'] for entry in data['log']['entries']
                if re.match(friendship_api_url, entry['request']['url'])]

    # extract json data
    users = []
    for content in contents:
        if 'text' not in content:
            continue
        text = content['text']

        if 'encoding' in content:
            if content['encoding'] != 'base64':
                continue  # unknown encoding: skipped, as in the original logic
            text = base64.b64decode(text).decode('utf-8')

        users.extend(json.loads(text)['users'])

    return users


def extract_person(person: str):
    # errors='ignore' silently drops bytes that cannot be decoded
    with open(f'./data/{person}/following.har', 'r', encoding='utf-8', errors='ignore') as f:
        data = json.load(f)
    followings = extract_followings(data)

    with open(f'./data/{person}/profile.json', 'r', encoding='utf-8') as f:
        data = json.load(f)
    profile = extract_profile(data)

    output = {
        'general': profile,
        'followings': followings
    }
    return output, profile['id']


def extract_all():
    persons = os.listdir('./data')

    users = {}
    for person in persons:
        output, person_id = extract_person(person)
        users[person_id] = output

    with open('./out/people.json', 'w', encoding='utf-8') as f:
        json.dump(users, f)


extract_all()

This is line 1067001, column 26: "url": "https://static.cdninstagram.com/rsrc.php/v3ikSh4/yq/l/en_US/ueR5Vvb4hPlbfkjYLCs4rGemD-jplRG0pz
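The quoted line has no closing quote, which is consistent with how Python's json module reports this error: the message is raised when the input ends while a string is still open. One possible explanation, not confirmed by anything in the question, is that the .har file was cut off during export. The sketch below reproduces the error on a deliberately truncated document; `unterminated_tail` is a hypothetical helper and the example URL is made up:

```python
import json


def unterminated_tail(text: str):
    """If parsing fails on an unterminated string, return the dangling text
    from that string's opening quote to the end of the input; else None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        if err.msg.startswith("Unterminated string"):
            return text[err.pos:]
    return None


# A tiny HAR-like document, then the same document cut off mid-string
complete = json.dumps(
    {"log": {"entries": [{"request": {"url": "https://example.invalid/a"}}]}}
)
truncated = complete[:-12]

print(unterminated_tail(complete))
print(unterminated_tail(truncated))
```

If the tail printed for the real following.har is just the partial URL shown above, the file itself ends there and would need to be exported again; ftfy would not be expected to help in that case, since it repairs text encoding rather than missing data.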
