简体   繁体   English

对熊猫数据框的深度嵌套JSON响应

[英]Deeply nested JSON response to pandas dataframe

I am new to python/pandas and I am having some issues with converting a nested JSON to pandas dataframe. 我是python / pandas的新手,将嵌套的JSON转换为pandas数据框时遇到一些问题。 I am sending a query to a database and getting a JSON string back. 我正在将查询发送到数据库并获取JSON字符串。

It's a deeply nested JSON string that contains several arrays. 这是一个深度嵌套的JSON字符串,其中包含多个数组。 The response from the database contains thousands of rows. 来自数据库的响应包含数千行。 Here is the general structure of one row in the JSON string: 这是JSON字符串中一行的一般结构:

{
  "ID": "123456",
  "profile": {
    "criteria": [
      {
        "type": "type1",
        "name": "name1",
        "value": "7",
        "properties": []
      },
      {
        "type": "type2",
        "name": "name2",
        "value": "6",
        "properties": [
          {
            "type": "MAX",
            "name": "",
            "value": "100"
          },
          {
            "type": "MIN",
            "name": "",
            "value": "5"
          }
        ]
      },
      {
        "type": "type3",
        "name": "name3",
        "value": "5",
        "properties": []
      }
    ]
  }
}  
{
  "ID": "456789",
  "profile": {
    "criteria": [
      {
        "type": "type4",
        "name": "name4",
        "value": "6",
        "properties": []
      }
    ]
  }
}

I want to flatten this JSON string using python pandas. 我想使用python pandas扁平化此JSON字符串。 I had problems using json_normalize since this is a deeply nested JSON string: 我使用json_normalize时遇到问题,因为这是一个深度嵌套的JSON字符串:

from cassandra.cluster import Cluster
import pandas as pd
from pandas.io.json import json_normalize

def pandas_factory(colnames, rows):
    return pd.DataFrame(rows, columns=colnames)

cluster = Cluster(['xxx.xx.x.xx'], port=yyyy)
session = cluster.connect('nnnn')

session.row_factory = pandas_factory

json_string = session.execute('select json ......')
df = json_string ._current_rows
df_normalized= json_normalize(df)
print(df_normalized)

When i run this code, i get a Key error: 当我运行此代码时,我得到一个关键错误:

KeyError: 0

I need help converting this JSON string to a dataframe with only some selected columns that looks something like this: (The rest of the data can be skipped) 我需要帮助将此JSON字符串转换为只有某些选定列的数据框,看起来像这样:(其余数据可以跳过)

ID        |   criteria   | type   |   name   |   value   |

123456          1          type1      name1        7      
123456          2          type2      name2        6  
123456          3          type3      name3        5    
456789          1          type4      name4        6

I tried to find similar problems on here but I can't seem to apply it to my JSON string. 我试图在这里找到类似的问题,但似乎无法将其应用于我的JSON字符串。

Any help is appreciated! 任何帮助表示赞赏! :) :)

EDIT: 编辑:

The json string that is retured is a query response object: ResultSet . 记录的json字符串是查询响应对象:ResultSet I think thats why I'm having some issues with using: 我认为这就是为什么我在使用时遇到一些问题:

json_string= session.execute('select json profile from visning')
temp = json.loads(json_string)

and getting the error: 并得到错误:

TypeError: the JSON object must be str, not 'ResultSet'

EDIT #2: 编辑#2:

Just to see what I'm working with, I printed the the result query by using: 只是为了查看我正在使用什么,我使用以下命令打印了结果查询:

for line in session.execute('select json.....'):
    print(line)

and got something like this: 并得到了这样的东西:

Row(json='{"ID": null, "profile": null}')
Row(json='{"ID": "123", "profile": {"criteria": [{"type": "type1", "name": "name1", "value": "10", "properties": []}, {"type": "type2", "name": "name2", "value": "50", "properties": []}, {"type": "type3", "name": "name3", "value": "40", "properties": []}]}}')
Row(json='{"ID": "456", "profile": {"criteria": []}}')
Row(json='{"ID": "789", "profile": {"criteria": [{"type": "type4", "name": "name4", "value": "5", "properties": []}]}}')
Row(json='{"ID": "987", "profile": {"criteria": [{"type": "type5", "name": "name5", "value": "70", "properties": []}, {"type": "type6", "name": "name6", "value": "60", "properties": []}, {"type": "type7", "name": "name7", "value": "2", "properties": []}, {"type": "type8", "name": "name8", "value": "7", "properties": []}]}}')

The issue I'm having is converting this structure to a json string that can be used in json.loads(): 我遇到的问题是将此结构转换为可在json.loads()中使用的json字符串:

json_string= session.execute('select json profile from visning')
json_list = list(json_string)
string= ''.join(list(map(str, json_list)))
temp = json.loads(string) <-- creates error json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

EDIT #3: 编辑#3:

As requested below in the comments, printing 根据下面注释中的要求,打印

for line in session.execute('select json.....'):
print((line.json))

gets the output: 获取输出:

{"ID": null, "profile": null}
{"ID": "123", "profile": {"criteria": [{"type": "type1", "name": "name1", "value": "10", "properties": []}, {"type": "type2", "name": "name2", "value": "50", "properties": []}, {"type": "type3", "name": "name3", "value": "40", "properties": []}]}}
{"ID": "456", "profile": {"criteria": []}}
{"ID": "789", "profile": {"criteria": [{"type": "type4", "name": "name4", "value": "5", "properties": []}]}}
{"ID": "987", "profile": {"criteria": [{"type": "type5", "name": "name5", "value": "70", "properties": []}, {"type": "type6", "name": "name6", "value": "60", "properties": []}, {"type": "type7", "name": "name7", "value": "2", "properties": []}, {"type": "type8", "name": "name8", "value": "7", "properties": []}]}}

A hardcoded example... 一个硬编码的例子...

import pandas as pd

temp = [{
  "ID": "123456",
  "profile": {
    "criteria": [
      {
        "type": "type1",
        "name": "name1",
        "value": "7",
        "properties": []
      },
      {
        "type": "type2",
        "name": "name2",
        "value": "6",
        "properties": [
          {
            "type": "MAX",
            "name": "",
            "value": "100"
          },
          {
            "type": "MIN",
            "name": "",
            "value": "5"
          }
        ]
      },
      {
        "type": "type3",
        "name": "name3",
        "value": "5",
        "properties": []
      }
    ]
  }
},
{
  "ID": "456789",
  "profile": {
    "criteria": [
      {
        "type": "type4",
        "name": "name4",
        "value": "6",
        "properties": []
      }
    ]
  }
}]

cols = ['ID', 'criteria', 'type', 'name', 'value']

rows = []
for data in temp:
    data_id = data['ID']
    criteria = data['profile']['criteria']
    for d in criteria:
        rows.append([data_id, criteria.index(d)+1, *list(d.values())[:-1]])

df = pd.DataFrame(rows, columns=cols)

This is by no means elegant. 这绝不是优雅。 It is more of a quick and dirty solution, as I don't know how the JSON data is exactly formatted - however, based on what you've provided, my code above will produce the desired DataFrame. 这更像是一种快速而肮脏的解决方案,因为我不知道JSON数据的确切格式-但是,根据您提供的内容,我上面的代码将生成所需的DataFrame。

       ID  criteria   type   name value
0  123456         1  type1  name1     7
1  123456         2  type2  name2     6
2  123456         3  type3  name3     5
3  456789         1  type4  name4     6

Additionally, if you need to 'load' the JSON data, you can use the json library like so: 此外,如果您需要“加载” JSON数据,则可以使用json库,如下所示:

import json

temp = json.loads(json_string)

# Or from a file...
with open('some_json.json') as json_file:
    temp = json.load(json_file)

Please note the difference between json.loads and json.load . 请注意json.loadsjson.load之间的区别。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM