Nested JSON and Pandas v2

I asked a question yesterday about how to turn a JSON file into a DataFrame, but I asked the wrong question:

Nested JSON and Pandas

I have a JSON file that looks like the one below. It has two levels of keys; the inner key sometimes repeats the outer key and sometimes does not.

{
"Abaddon the Despoiler": {
    "Abaddon the Despoiler": {
        "model_count": "1",
        "points_value": "220\u2022",
        "movement": "6\"",
        "weapon_skill": "2+",
        "ballistic_skill": "2+",
        "strength": "5",
        "toughness": "5",
        "wounds": "8",
        "attacks": "6",
        "leadership": "10",
        "save": "2+"
    }
},
"Chaos Space Marines": {
    "Chaos Space Marine": {
        "model_count": "4-19",
        "points_value": "14",
        "movement": "6\"",
        "weapon_skill": "3+",
        "ballistic_skill": "3+",
        "strength": "4",
        "toughness": "4",
        "wounds": "1",
        "attacks": "1",
        "leadership": "7",
        "save": "3+"
    },
    "Aspiring Champion": {
        "model_count": "1",
        "points_value": "14",
        "movement": "6\"",
        "weapon_skill": "3+",
        "ballistic_skill": "3+",
        "strength": "4",
        "toughness": "4",
        "wounds": "1",
        "attacks": "2",
        "leadership": "8",
        "save": "3+"
    }
}
}
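To make the two levels of keys concrete, here is a small sketch that parses an abbreviated copy of the JSON above with the standard json module and walks both levels. The string raw and the trimmed field list are assumptions made for illustration; the real file has all eleven stat fields per model.

```python
import json

# Abbreviated copy of the JSON above (assumption: most stat fields dropped).
raw = r'''
{
  "Abaddon the Despoiler": {
    "Abaddon the Despoiler": {"points_value": "220\u2022", "movement": "6\""}
  },
  "Chaos Space Marines": {
    "Chaos Space Marine": {"points_value": "14", "movement": "6\""},
    "Aspiring Champion": {"points_value": "14", "movement": "6\""}
  }
}
'''

d = json.loads(raw)

# Outer keys are units; inner keys are the models within each unit.
pairs = [(unit, model) for unit, models in d.items() for model in models]
for unit, model in pairs:
    print(unit, "->", model)

# JSON escapes are decoded on load: \u2022 becomes a bullet, \" a plain quote.
abaddon = d["Abaddon the Despoiler"]["Abaddon the Despoiler"]
print(abaddon["points_value"], abaddon["movement"])
```

This is also why the tables show 220• and 6" rather than the escaped forms.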

I would like to convert this to a DataFrame that looks like this:

unit                   model                  model_count  points_value  movement  weapon_skill  ballistic_skill  strength  toughness  wounds  attacks  leadership  save
Abaddon the Despoiler  Abaddon the Despoiler  1            220•          6"        2+            2+               5         5          8       6        10          2+
Chaos Space Marines    Chaos Space Marine     4-19         14            6"        3+            3+               4         4          1       1        7           3+
Chaos Space Marines    Aspiring Champion      1            14            6"        3+            3+               4         4          1       2        8           3+

@azro provided a useful answer to my question yesterday, but I had asked the wrong question. In the original question I wanted to skip the second level of keys, so that the result looked like this:

unit                   model_count  points_value  movement  weapon_skill  ballistic_skill  strength  toughness  wounds  attacks  leadership  save
Abaddon the Despoiler  1            220•          6"        2+            2+               5         5          8       6        10          2+
Chaos Lord             1            80            6"        2+            2+               4         4          5       4        9           3+

import pandas as pd

d = {'Abaddon the Despoiler': {'Abaddon the Despoiler': {'model_count': '1', 'points_value': '220•', 'movement': '6"', 'weapon_skill': '2+', 'ballistic_skill': '2+', 'strength': '5', 'toughness': '5', 'wounds': '8', 'attacks': '6', 'leadership': '10', 'save': '2+'}},
     'Chaos Lord':            {'Chaos Lord':            {'model_count': '1', 'points_value': '80', 'movement': '6"', 'weapon_skill': '2+', 'ballistic_skill': '2+', 'strength': '4', 'toughness': '4', 'wounds': '5', 'attacks': '4', 'leadership': '9', 'save': '3+'}}}

# values[key] only works when the inner key is identical to the outer key
data = [{'unit': key, **values[key]} for key, values in d.items()]
nycphil = pd.DataFrame(data)
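The snippet above looks up values[key], so it relies on the inner key being identical to the outer key. A minimal sketch of what happens when that assumption is applied to the two-level data in this question; the sample is trimmed to a single stat field for illustration, and the names rows and failed_units are hypothetical:

```python
# Trimmed sample: one unit whose inner key matches, one whose inner keys differ.
d = {
    "Abaddon the Despoiler": {"Abaddon the Despoiler": {"wounds": "8"}},
    "Chaos Space Marines": {
        "Chaos Space Marine": {"wounds": "1"},
        "Aspiring Champion": {"wounds": "1"},
    },
}

rows = []
failed_units = []
for key, values in d.items():
    try:
        # Same lookup as the earlier answer: inner key must equal outer key.
        rows.append({"unit": key, **values[key]})
    except KeyError:
        # "Chaos Space Marines" has no inner key with that exact name.
        failed_units.append(key)

print(rows)
print(failed_units)
```

Units with several models therefore need an explicit loop over the inner keys, not a lookup by the outer key.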

Use a nested list comprehension that adds the unit and model keys to each innermost dictionary, then pass the resulting list to the DataFrame constructor:

import pandas as pd

# One row per (unit, model) pair: outer key -> 'unit', inner key -> 'model'
L = [{'unit': k, 'model': k1, **v1} for k, v in d.items() for k1, v1 in v.items()]

df = pd.DataFrame(L)
print(df)
                    unit                  model model_count points_value  \
0  Abaddon the Despoiler  Abaddon the Despoiler           1         220•   
1    Chaos Space Marines     Chaos Space Marine        4-19           14   
2    Chaos Space Marines      Aspiring Champion           1           14   

  movement weapon_skill ballistic_skill strength toughness wounds attacks  \
0       6"           2+              2+        5         5      8       6   
1       6"           3+              3+        4         4      1       1   
2       6"           3+              3+        4         4      1       2   

  leadership save  
0         10   2+  
1          7   3+  
2          8   3+  
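As an alternative to the list comprehension (not part of the answer above), the same shape can be reached with pandas alone: build one small frame per unit and concatenate them, letting the dict keys become an index level. This is only a sketch on trimmed sample data; the names frames and alt are illustrative.

```python
import pandas as pd

# Trimmed sample of the nested structure (assumption: only two stat fields kept).
d = {
    "Abaddon the Despoiler": {
        "Abaddon the Despoiler": {"model_count": "1", "points_value": "220•"}
    },
    "Chaos Space Marines": {
        "Chaos Space Marine": {"model_count": "4-19", "points_value": "14"},
        "Aspiring Champion": {"model_count": "1", "points_value": "14"},
    },
}

# One frame per unit, indexed by model name.
frames = {
    unit: pd.DataFrame.from_dict(models, orient="index")
    for unit, models in d.items()
}

# Concatenating a dict uses its keys as the outer index level;
# reset_index turns both levels into ordinary columns.
alt = pd.concat(frames, names=["unit", "model"]).reset_index()
print(alt)
```

The comprehension is usually simpler; the concat form can be convenient if the unit and model are wanted as a MultiIndex, in which case the reset_index call can be dropped.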

EDIT: After testing against the full file, some nested values turned out to be error entries rather than dictionaries of stats. You can skip them, and the output is then:

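import json
import pandas as pd

with open('chaos-space-marines.json') as f:
    d = json.load(f)

L = []
for k, v in d.items():
    # Skip top-level values that are not dictionaries
    if isinstance(v, dict):
        for k1, v1 in v.items():
            # Skip inner values that are not dictionaries of stats
            if isinstance(v1, dict):
                L.append({'unit': k, 'model': k1, **v1})

df = pd.DataFrame(L)
print(df)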
                                   unit                               model  \
0                 Abaddon the Despoiler               Abaddon the Despoiler   
1                            Chaos Lord                          Chaos Lord   
2       Chaos Lord in Terminator Armour     Chaos Lord in Terminator Armour   
3                                Cypher                              Cypher   
4                         Daemon Prince                       Daemon Prince   
..                                  ...                                 ...   
122     Hellforged Spartan Assault Tank     Hellforged Spartan Assault Tank   
123  Hellforged Typhon Heavy Siege Tank  Hellforged Typhon Heavy Siege Tank   
124                       Kytan Ravager                       Kytan Ravager   
125                       Chaos Bastion                       Chaos Bastion   
126                     Noctilith Crown                     Noctilith Crown   

    model_count points_value movement weapon_skill ballistic_skill strength  \
0             1         220•       6"           2+              2+        5   
1             1           80       6"           2+              2+        4   
2             1           95       5"           2+              2+        4   
3             1          85•       7"           2+              2+        4   
4             1          150       8"           2+              2+        7   
..          ...          ...      ...          ...             ...      ...   
122           1          320        *            *               *        8   
123           1          720        *            *               *        8   
124           1          430        *           3+              3+        *   
125           1          150        -            -              5+        -   
126           1           85        -            -              4+        -   

    toughness wounds attacks leadership save  
0           5      8       6         10   2+  
1           4      5       4          9   3+  
2           4      6       4          9   2+  
3           4      5       4          9   3+  
4           6      8       4         10   3+  
..        ...    ...     ...        ...  ...  
122         8     20       4          9   2+  
123         9     22       7          9   2+  
124         8     22       *          9   3+  
125        10     20       0          6   4+  
126         8     14       -          -   3+  

[127 rows x 13 columns]
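The filtering loop can also be wrapped in a small reusable function and tried without the file. In this sketch the function name flatten_units and the sample data are hypothetical; in particular, the "error" entries are invented to mimic bad records, since the actual malformed values in the file are not shown here.

```python
import pandas as pd


def flatten_units(d):
    """Return one row per (unit, model) pair, skipping non-dict values."""
    rows = []
    for unit, models in d.items():
        if not isinstance(models, dict):
            continue  # top-level value is not a dict of models
        for model, stats in models.items():
            if isinstance(stats, dict):
                rows.append({"unit": unit, "model": model, **stats})
    return rows


# Hypothetical sample: one good unit and two invented bad records.
sample = {
    "Chaos Lord": {"Chaos Lord": {"wounds": "5", "save": "3+"}},
    "Broken Unit": {"error": "page not found"},
    "Another Broken Unit": "error",
}

df = pd.DataFrame(flatten_units(sample))
print(df)
```

Only the well-formed unit survives, so the resulting frame has a single row.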

EDIT:

You can also inspect the problematic values (the reason the original solution failed) by adding else branches:

import json
import pandas as pd

with open('chaos-space-marines.json') as f:
    d = json.load(f)

L = []
for k, v in d.items():
    if isinstance(v, dict):
        for k1, v1 in v.items():
            if isinstance(v1, dict):
                L.append({'unit': k, 'model': k1, **v1})
            else:
                # Inner value is not a dictionary of stats
                print('inner loop')
                print(v1)
    else:
        # Top-level value is not a dictionary of models
        print('outer loop')
        print(v)

df = pd.DataFrame(L)
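If printing gets noisy on a large file, a variation is to collect the skipped entries in a list so they can be reviewed afterwards. This is a sketch only: split_rows, skipped, and the sample data are hypothetical names and values chosen for illustration.

```python
def split_rows(d):
    """Separate well-formed (unit, model) rows from skipped entries."""
    rows, skipped = [], []
    for unit, models in d.items():
        if not isinstance(models, dict):
            # Outer-level problem: no model name is available.
            skipped.append((unit, None, models))
            continue
        for model, stats in models.items():
            if isinstance(stats, dict):
                rows.append({"unit": unit, "model": model, **stats})
            else:
                # Inner-level problem: keep the offending key and value.
                skipped.append((unit, model, stats))
    return rows, skipped


# Hypothetical sample with one inner-level and one outer-level bad record.
sample = {
    "Cypher": {"Cypher": {"wounds": "5"}},
    "Broken Unit": {"error": "page not found"},
    "Another Broken Unit": "error",
}

rows, skipped = split_rows(sample)
for entry in skipped:
    print(entry)
```

The rows list can be passed to pd.DataFrame exactly as before, while skipped records which unit and key each bad value came from.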
