
Key Error with json file when I try to extract specific values

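I want to build a DataFrame that has all of these elements as columns, in the following order:

From "playlists": "name", "collaborative", "pid", "modified_at", "num_tracks", "num_albums", "num_followers", "num_edits", "duration_ms", "num_artists"

From "tracks": "pos", "artist_name", "track_uri", "artist_uri", "track_name", "album_uri", "duration_ms", "album_name"

From "info": "generated_on", "slice", "version"

Part of the json file is shown below: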

{
    "info": {
        "generated_on": "2017-12-03 08:41:42.057563", 
        "slice": "0-999", 
        "version": "v1"
    }, 
    "playlists": [
        {
            "name": "Throwbacks", 
            "collaborative": "false", 
            "pid": 0, 
            "modified_at": 1493424000, 
            "num_tracks": 52, 
            "num_albums": 47, 
            "num_followers": 1, 
            "tracks": [
                {
                    "pos": 0, 
                    "artist_name": "Missy Elliott", 
                    "track_uri": "spotify:track:0UaMYEvWZi0ZqiDOoHU3YI", 
                    "artist_uri": "spotify:artist:2wIVse2owClT7go1WT98tk", 
                    "track_name": "Lose Control (feat. Ciara & Fat Man Scoop)", 
                    "album_uri": "spotify:album:6vV5UrXcfyQD1wu4Qo2I9K", 
                    "duration_ms": 226863, 
                    "album_name": "The Cookbook"
                }, 
                {
                    "pos": 1, 
                    "artist_name": "Britney Spears", 
                    "track_uri": "spotify:track:6I9VzXrHxO9rA9A5euc8Ak", 
                    "artist_uri": "spotify:artist:26dSoYclwsYLMAKD3tpOr4", 
                    "track_name": "Toxic", 
                    "album_uri": "spotify:album:0z7pVBGOD7HCIB7S8eLkLI", 
                    "duration_ms": 198800, 
                    "album_name": "In The Zone"
                }, 
            ], 
            "num_edits": 6, 
            "duration_ms": 11532414, 
            "num_artists": 37
        }, 

When I run the program, it gives the following error:

TypeError                                 Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_3208/3949258436.py in <module>
     16 
     17 
---> 18 data= pd.json_normalize(js['playlists'],  ['name', 'collaborative', 'pid', 'modified_at', 'num_tracks', 'num_albums',
     19                                                     'num_followers', 'tracks', 'num_edits',  'num_artists'], js['info'])
     20 

~\anaconda3\lib\site-packages\pandas\io\json_normalize.py in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
    293 
    294     meta_vals: DefaultDict = defaultdict(list)
--> 295     meta_keys = [sep.join(val) for val in _meta]
    296 
    297     def _recursive_extract(data, path, seen_meta, level=0):

~\anaconda3\lib\site-packages\pandas\io\json_normalize.py in <listcomp>(.0)
    293 
    294     meta_vals: DefaultDict = defaultdict(list)
--> 295     meta_keys = [sep.join(val) for val in _meta]
    296 
    297     def _recursive_extract(data, path, seen_meta, level=0):

TypeError: sequence item 0: expected str instance, dict found

Here is my code:

import json
import pandas as pd
import os



path = 'C:\\Users\\sotir\\Desktop\\machinedataset'

filenames = os.listdir(path)
for filename in sorted(filenames):
    if filename.startswith("mpd.slice.") and filename.endswith(".json"):
        fullpath = os.sep.join((path, filename))
        f = open(fullpath)
        js = json.load(f)
        f.close()


data= pd.json_normalize(js['playlists'],  ['name', 'collaborative', 'pid', 'modified_at', 'num_tracks', 'num_albums',
                                                    'num_followers', 'tracks', 'num_edits',  'num_artists'], js['info'])

To fix your immediate error, in your pd.json_normalize() call, change the last argument from:

['info']

to:

js['info']

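You will get more errors, but that is material for a new question.

Answer to the modified question:

According to https://pandas.pydata.org/pandas-docs/version/1.2.0/reference/api/pandas.json_normalize.html, the third parameter of pd.json_normalize() is meta=None.

Looking at that documentation page (which you definitely need to read), we see: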

pandas.json_normalize
pandas.json_normalize(data, record_path=None, meta=None, meta_prefix=None, record_prefix=None, errors='raise', sep='.', max_level=None)
Normalize semi-structured JSON data into a flat table.

Parameters:
data: dict or list of dicts
    Unserialized JSON objects.

record_path: str or list of str, default None
    Path in each object to list of records. If not passed, data will be assumed to be an array of records.

meta: list of paths (str or list of str), default None
    Fields to use as metadata for each record in resulting table.

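According to the documentation, meta is a list of paths, where each path is a string or a list of strings. You are passing a dict (js['info']), which causes the error. You need to read the documentation to understand how to call json_normalize().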
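As a rough illustration of what the documentation describes, here is a minimal sketch of a call where record_path names the nested list and meta is a list of plain string keys. The playlists variable is a cut-down stand-in for js['playlists'], trimmed from the excerpt in the question; it is an assumption made only to keep the example self-contained.

```python
import pandas as pd

# Cut-down stand-in for js['playlists'] (only a few keys kept for brevity).
playlists = [
    {
        "name": "Throwbacks",
        "pid": 0,
        "tracks": [
            {"pos": 0, "track_name": "Lose Control (feat. Ciara & Fat Man Scoop)"},
            {"pos": 1, "track_name": "Toxic"},
        ],
    }
]

# record_path: the nested list whose items become the rows.
# meta: parent-level keys (strings) repeated on every row of that parent.
data = pd.json_normalize(playlists, record_path="tracks", meta=["name", "pid"])
print(data)
```

Each track becomes one row, and the playlist-level values listed in meta are repeated next to it.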

I read the documentation and changed the code to:

import json
import pandas as pd
import os



path = 'C:\\Users\\sotir\\Desktop\\machinedataset'

filenames = os.listdir(path)
for filename in sorted(filenames):
    if filename.startswith("mpd.slice.") and filename.endswith(".json"):
        fullpath = os.sep.join((path, filename))
        f = open(fullpath)
        js = json.load(f)
        f.close()


data= pd.json_normalize(js['playlists'], meta=['name', 'collaborative', 'pid', 'modified_at','num_tracks', 'num_albums', 
                                              'num_followers', 'num_edits', 'duration_ms', 'num_artists'],record_path= ['tracks'])
print(data)

The problem is that when I run the code I get the following error:

ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_3208/658546858.py in <module>
     16 
     17 
---> 18 data= pd.json_normalize(js['playlists'], meta=['name', 'collaborative', 'pid', 'modified_at','num_tracks', 'num_albums', 
     19                                               'num_followers', 'num_edits', 'duration_ms', 'num_artists'],record_path= ['tracks'])
     20 print(data)

~\anaconda3\lib\site-packages\pandas\io\json_normalize.py in _json_normalize(data, record_path, meta, meta_prefix, record_prefix, errors, sep, max_level)
    347 
    348                 if k in result:
--> 349                     raise ValueError(
    350                         f"Conflicting metadata name {k}, need distinguishing prefix "
    351                     )

ValueError: Conflicting metadata name duration_ms, need distinguishing prefix

When I try to fix duration_ms by adding record_prefix='_', I can no longer access the fields of "tracks" under their original names. Can you help me solve this?
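One possible way around the duration_ms clash, offered as a sketch rather than a definitive answer: duration_ms exists both on the playlist and on each track, so one side has to be renamed. Passing meta_prefix instead of record_prefix renames only the playlist-level columns and leaves the track fields under their original names. The helper names flatten_slice and load_slices and the "playlist_" and "info_" prefixes are hypothetical choices made for this example. The sample dict is trimmed from the excerpt in the question. The loader also collects every file, since the loop in the question keeps only the last js.

```python
import json
import os

import pandas as pd

PLAYLIST_FIELDS = ["name", "collaborative", "pid", "modified_at", "num_tracks",
                   "num_albums", "num_followers", "num_edits", "duration_ms",
                   "num_artists"]
TRACK_FIELDS = ["pos", "artist_name", "track_uri", "artist_uri", "track_name",
                "album_uri", "duration_ms", "album_name"]
INFO_FIELDS = ["generated_on", "slice", "version"]


def flatten_slice(js, meta_prefix="playlist_"):
    """Return one row per track for a single parsed slice file."""
    # meta_prefix renames only the playlist-level columns, so the track-level
    # duration_ms no longer collides with the playlist-level one.
    df = pd.json_normalize(js["playlists"], record_path="tracks",
                           meta=PLAYLIST_FIELDS, meta_prefix=meta_prefix)
    # "info" is a single dict per file, so its values are broadcast as constants.
    for key in INFO_FIELDS:
        df["info_" + key] = js["info"][key]
    # json_normalize puts record columns first; reorder to playlist, track, info.
    ordered = ([meta_prefix + k for k in PLAYLIST_FIELDS] + TRACK_FIELDS
               + ["info_" + k for k in INFO_FIELDS])
    return df[ordered]


def load_slices(path):
    """Flatten every mpd.slice.*.json file under path into one DataFrame."""
    frames = []
    for filename in sorted(os.listdir(path)):
        if filename.startswith("mpd.slice.") and filename.endswith(".json"):
            with open(os.path.join(path, filename), encoding="utf-8") as f:
                frames.append(flatten_slice(json.load(f)))
    # pd.concat raises ValueError if no matching file was found.
    return pd.concat(frames, ignore_index=True)


# Sample trimmed from the excerpt in the question (two tracks only).
sample = {
    "info": {"generated_on": "2017-12-03 08:41:42.057563",
             "slice": "0-999", "version": "v1"},
    "playlists": [{
        "name": "Throwbacks", "collaborative": "false", "pid": 0,
        "modified_at": 1493424000, "num_tracks": 52, "num_albums": 47,
        "num_followers": 1,
        "tracks": [
            {"pos": 0, "artist_name": "Missy Elliott",
             "track_uri": "spotify:track:0UaMYEvWZi0ZqiDOoHU3YI",
             "artist_uri": "spotify:artist:2wIVse2owClT7go1WT98tk",
             "track_name": "Lose Control (feat. Ciara & Fat Man Scoop)",
             "album_uri": "spotify:album:6vV5UrXcfyQD1wu4Qo2I9K",
             "duration_ms": 226863, "album_name": "The Cookbook"},
            {"pos": 1, "artist_name": "Britney Spears",
             "track_uri": "spotify:track:6I9VzXrHxO9rA9A5euc8Ak",
             "artist_uri": "spotify:artist:26dSoYclwsYLMAKD3tpOr4",
             "track_name": "Toxic",
             "album_uri": "spotify:album:0z7pVBGOD7HCIB7S8eLkLI",
             "duration_ms": 198800, "album_name": "In The Zone"},
        ],
        "num_edits": 6, "duration_ms": 11532414, "num_artists": 37,
    }],
}

df = flatten_slice(sample)
print(df.columns.tolist())
```

With the real data, something like load_slices(path) would replace the loop in the question. If the playlist columns must carry their bare names, the track side could be prefixed through record_prefix instead, but two columns cannot sensibly both be called duration_ms.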
