
Parsing Nested JSON to one data file

I am trying to parse a nested JSON file.

I've stored the dataset here so that you can see exactly what I'm seeing if you want: https://mega.nz/file/YWNSRBjK#V9DpoY5LSp-VL8Mnu7NEfNf3FhDOCj9FHBiTQ4KHEa8

I am attempting to parse this using the pandas json_normalize function. Below is my code in its entirety.

import gzip   
import shutil
import json
import pandas as pd

with gzip.open('testjson.json.gz', 'rb') as f_in:
    with open('unzipped_json.json', 'wb') as f_out:
        shutil.copyfileobj(f_in, f_out)

# Use a context manager so the file handle is closed after loading
with open('unzipped_json.json') as f:
    data = json.load(f)
keys = data.keys()
keys_string = list(keys)
 
### In Network
in_network_df = pd.json_normalize(data['in_network'])

### Negotiated Rates
negotiated_rates_df = pd.json_normalize(data=data['in_network'],
                                        record_path=("negotiated_rates"))
negotiated_rates_df = negotiated_rates_df.explode('provider_references')
negotiated_rates_df = negotiated_rates_df.explode('negotiated_prices')

### Negotiated Prices
negotiated_prices_df = pd.json_normalize(data=data['in_network'],
                                         meta=[
                                             #['negotiated_rates','provider_references'],
                                            # ['negotiation_arrangement', 'name','billing_code_type','billing_code','description']
                                             ],
                                        record_path=['negotiated_rates','negotiated_prices'],
                                        errors='ignore')
negotiated_prices_df = negotiated_prices_df.explode('service_code')

### Provider References
provider_references_df = pd.json_normalize(data['provider_references'])
provider_references_test = provider_references_df.explode('provider_groups')

### Provider Groups
provider_groups = pd.json_normalize(data=data['provider_references'],
                                    meta=['provider_group_id'],
                                        record_path=("provider_groups"))
provider_groups = provider_groups.explode('npi')

I am specifically having trouble with the negotiated prices part of this JSON object. I am trying to add in some data from the parent objects, but it gives me an error. Specifically, this is what I would like to do:

negotiated_prices_df = pd.json_normalize(data=data['in_network'],
                                         meta=['provider_references'],
                                        record_path=['negotiated_rates','negotiated_prices'],
                                        errors='ignore')

When I try this, I get ValueError: operands could not be broadcast together with shape (74607,) (24869,)

Can anyone help me understand what is going on here?

Edit: Some more context in case you don't want to open my file. Here is one spot showing the problematic portion of the JSON. I can't seem to get provider_references to attach to any of the child objects.

"provider_references":[261, 398, 799],"negotiated_prices":[{"negotiated_type": "fee schedule","negotiated_rate": 296.00,"expiration_date": "2023-06-30","service_code": ["01", "02", "03", "04", "05", "06", "07", "08", "09", "10", "11", "12", "13",

I think the code that you want looks like this:

with open('unzipped_json.json') as f:
    data = json.load(f)

negotiated_rates_and_prices_df = pd.json_normalize(
    data["in_network"],
    record_path=["negotiated_rates", ["negotiated_prices"]],
    meta=[
        "negotiation_arrangement",
        "name",
        "billing_code_type",
        "billing_code_type_version",
        "billing_code",
        "description",
        ["negotiated_rates", "provider_references"],
    ],
)

That takes care of the in_network part of the JSON. The trick is in the meta argument: top-level fields go in as plain strings, while nested fields go in as a list giving the path in order of nesting (i.e. ["negotiated_rates", "provider_references"]). There's a similar example in the pandas documentation for json_normalize.
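To see the nested meta path in isolation, here is a minimal sketch on a made-up record with two negotiated rates. The values and the trimmed-down set of fields are assumptions for illustration, not taken from the real file; only the key names follow the question's JSON.

```python
import pandas as pd

# Toy data (made-up values) mirroring the shape of the question's JSON.
toy = {
    "in_network": [
        {
            "name": "nasal/sinus endoscopy",
            "billing_code": "31240",
            "negotiated_rates": [
                {
                    "provider_references": [261, 398, 799],
                    "negotiated_prices": [
                        {"negotiated_type": "fee schedule", "negotiated_rate": 296.0},
                        {"negotiated_type": "fee schedule", "negotiated_rate": 310.0},
                    ],
                },
                {
                    "provider_references": [191],
                    "negotiated_prices": [
                        {"negotiated_type": "negotiated", "negotiated_rate": 150.0},
                    ],
                },
            ],
        }
    ]
}

toy_prices_df = pd.json_normalize(
    toy["in_network"],
    # Walk in_network -> negotiated_rates -> negotiated_prices; one row per price.
    record_path=["negotiated_rates", "negotiated_prices"],
    meta=[
        "name",          # top-level field of each in_network item
        "billing_code",  # top-level field of each in_network item
        # Field that lives one level down, inside each negotiated_rates entry.
        ["negotiated_rates", "provider_references"],
    ],
)

print(toy_prices_df)
```

Each price row carries the provider_references list of the rate entry it came from, in a column named negotiated_rates.provider_references.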

Then for the other nested part of the JSON you can do this:

provider_references_df = pd.json_normalize(
    data["provider_references"], "provider_groups", "provider_group_id"
)

And that takes care of the whole thing.
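If the goal is a single table, one possible next step is to explode the provider_references lists and merge on provider_group_id. This is only a sketch: it assumes every value in provider_references corresponds to a provider_group_id, and the toy frame, the NPI values, and the names sketch_prices_df, groups_df and combined_df are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the normalized prices frame (made-up values).
sketch_prices_df = pd.DataFrame(
    {
        "negotiated_rate": [296.0, 150.0],
        "negotiated_rates.provider_references": [[261, 398], [191]],
    }
)

# Toy stand-in for data["provider_references"] (made-up NPI values).
toy_provider_references = [
    {"provider_group_id": 261, "provider_groups": [{"npi": [1111111111, 2222222222]}]},
    {"provider_group_id": 398, "provider_groups": [{"npi": [3333333333]}]},
    {"provider_group_id": 191, "provider_groups": [{"npi": [4444444444]}]},
]

# One row per NPI, tagged with the group id it belongs to.
groups_df = pd.json_normalize(
    toy_provider_references, "provider_groups", "provider_group_id"
).explode("npi", ignore_index=True)
groups_df["provider_group_id"] = groups_df["provider_group_id"].astype("int64")

# One row per (price, referenced group), with the column renamed to match.
exploded_df = sketch_prices_df.explode(
    "negotiated_rates.provider_references", ignore_index=True
).rename(columns={"negotiated_rates.provider_references": "provider_group_id"})
exploded_df["provider_group_id"] = exploded_df["provider_group_id"].astype("int64")

# A left merge keeps every price row even if a group id has no match.
combined_df = exploded_df.merge(groups_df, on="provider_group_id", how="left")

print(combined_df)
```

On the full dataset this multiplies rows quickly (prices × referenced groups × NPIs), so it may be worth filtering before merging.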

Is this what you are trying to achieve?

import pandas as pd
import json

with open("testjson.json", "r") as f:
    data = json.load(f)
for k, v in data.items():
    print(k)
negotiated_prices_df = pd.json_normalize(
    data['in_network'],
    record_path=['negotiated_rates', ['negotiated_prices']],
    meta=['negotiation_arrangement', 'name', 'billing_code_type',
          'billing_code_type_version', 'billing_code', 'description',
          ['negotiated_rates', 'provider_references']],
    errors='ignore',
).explode('service_code', ignore_index=True)

print(negotiated_prices_df)

Result printed in terminal:

    negotiated_type negotiated_rate expiration_date service_code    billing_class   negotiation_arrangement name    billing_code_type   billing_code_type_version   billing_code    description negotiated_rates.provider_references
0   fee schedule    296.00  2023-06-30  01  professional    ffs nasal/s CPT 2022    31240   Nasal/sinus endoscopy surg  [261, 398, 799]
1   fee schedule    296.00  2023-06-30  02  professional    ffs nasal/s CPT 2022    31240   Nasal/sinus endoscopy surg  [261, 398, 799]
2   fee schedule    296.00  2023-06-30  03  professional    ffs nasal/s CPT 2022    31240   Nasal/sinus endoscopy surg  [261, 398, 799]
3   fee schedule    296.00  2023-06-30  04  professional    ffs nasal/s CPT 2022    31240   Nasal/sinus endoscopy surg  [261, 398, 799]
4   fee schedule    296.00  2023-06-30  05  professional    ffs nasal/s CPT 2022    31240   Nasal/sinus endoscopy surg  [261, 398, 799]
... ... ... ... ... ... ... ... ... ... ... ... ...
687789  negotiated  15461.36    2023-06-30  NaN institutional   ffs CARDIAC CATHETERIZATION FOR OTHER NON-CORONARY CONDITIONS   APR-DRG 39.1    192 CARDIAC CATHETERIZATION FOR OTHER NON-CORONARY CONDITIONS   [191]
687790  negotiated  11953.00    2023-06-30  NaN institutional   ffs CARDIAC CATHETERIZATION FOR OTHER NON-CORONARY CONDITIONS   APR-DRG 39.1    192 CARDIAC CATHETERIZATION FOR OTHER NON-CORONARY CONDITIONS   [521, 688]
687791  negotiated  12622.15    2023-06-30  NaN institutional   ffs CARDIAC CATHETERIZATION FOR OTHER NON-CORONARY CONDITIONS   APR-DRG 39.1    192 CARDIAC CATHETERIZATION FOR OTHER NON-CORONARY CONDITIONS   [1003, 1045, 11, 1174, 133, 149, 177, 186, 251, 27, 564, 649, 683, 697, 705, 764, 827, 836, 837, 87, 937, 938]
687792  negotiated  11864.00    2023-06-30  NaN institutional   ffs CARDIAC CATHETERIZATION FOR OTHER NON-CORONARY CONDITIONS   APR-DRG 39.1    192 CARDIAC CATHETERIZATION FOR OTHER NON-CORONARY CONDITIONS   [1176, 319, 974]
687793  negotiated  11229.02    2023-06-30  NaN institutional   ffs CARDIAC CATHETERIZATION FOR OTHER NON-CORONARY CONDITIONS   APR-DRG 39.1    192 CARDIAC CATHETERIZATION FOR OTHER NON-CORONARY CONDITIONS   [371, 523]
