Parse Nested JSON in Python to Remove Special Characters in Columns

Here is my JSON file:

{
    "highest_table": {
        "items": [{
                "key": "Human 1",
                "columns": {
                    "Na$me": "Tom",
                    "Description(ms/2)": "Table Number One on the Top",
                    "A&ge": "24",
                    "Ge_nder": "M"
                }
            },
            {
                "key": "Human 2",
                "columns": {
                    "Na$me": "John",
                    "Description(ms/2)": "Table Number One on the Top",
                    "A&ge": "23",
                    "Ge_nder": "M"
                }
                }
        ]
    }
}

The goal is to remove all special characters from the column names (or, if that is easier, every special character anywhere in the .json file) and return a .json file. My initial thought was to load it into pandas, remove the special characters from the column headings, and convert it back to a .json file.

This is what I have tried so far. Both attempts print only a single line.

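import json
import pandas as pd

data_file = r"C:\characters.json"

# Use a separate name for the file handle so the path is not overwritten
with open(data_file) as f:
    data = json.load(f)

# pandas >= 1.0; older versions used: from pandas.io.json import json_normalize
df = pd.json_normalize(data)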

---

import pandas as pd

data_file = r"C:\characters.json"

df = pd.read_json(data_file)

How can I extract the columns, remove the special characters, and write them back to a .json file?

A bit quick and dirty: you'll have to provide a complete implementation of fixkey, but this should fix your problem.

import json

def fixkey(key):
    # toy implementation
    #print("fixing {}".format(key))
    return key.replace("&", "").replace("$", "")

def normalize(data):
    #print("normalizing {}".format(data))
    if isinstance(data, dict):
        data = {fixkey(key): normalize(value) for key, value in data.items()}
    elif isinstance(data, list):
        data = [normalize(item) for item in data]
    return data

jsdata = """
{
    "highest_table": {
        "items": [{
                "key": "Human 1",
                "columns": {
                    "Na$me": "Tom",
                    "Description(ms/2)": "Table Number One on the Top",
                    "A&ge": "24",
                    "Ge_nder": "M"
                }
            },
            {
                "key": "Human 2",
                "columns": {
                    "Na$me": "John",
                    "Description(ms/2)": "Table Number One on the Top",
                    "A&ge": "23",
                    "Ge_nder": "M"
                }
                }
        ]
    }
}
"""


data = json.loads(jsdata)

data = normalize(data)

result = json.dumps(data, indent=2)
print(result)
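
One possible way to fill in fixkey is a regular expression that keeps only letters and digits. This is only a sketch, not part of the original answer: the pattern `[^0-9A-Za-z]` is an assumption about what counts as "special" (it also drops underscores, parentheses and slashes), and normalize is repeated here so the block runs on its own.

```python
import json
import re

# Assumption: anything that is not an ASCII letter or digit is "special".
SPECIAL = re.compile(r"[^0-9A-Za-z]")

def fixkey(key):
    """Return key with every non-alphanumeric character removed."""
    return SPECIAL.sub("", key)

def normalize(data):
    """Recursively rebuild dicts and lists, cleaning every dict key."""
    if isinstance(data, dict):
        return {fixkey(key): normalize(value) for key, value in data.items()}
    if isinstance(data, list):
        return [normalize(item) for item in data]
    return data  # strings, numbers, booleans and None are left untouched

sample = {
    "columns": {
        "Na$me": "Tom",
        "Description(ms/2)": "Table Number One on the Top",
        "A&ge": "24",
        "Ge_nder": "M",
    }
}
cleaned = normalize(sample)
print(json.dumps(cleaned, indent=2))
```

With this pattern "Na$me" becomes "Name" and "Description(ms/2)" becomes "Descriptionms2". Two caveats: normalize cleans every key in the document, so "highest_table" would become "highesttable", and two keys that clean to the same name would silently overwrite each other.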

Frankly, this is ugly, but I haven't been able to find a more generic approach. It is very specific to your particular JSON (the problem really needs solving in the API).

import json


response = """{
    "highest_table": {
        "items": [{
                "key": "Human 1",
                "columns": {
                    "Na$me": "Tom",
                    "Description(ms/2)": "Table Number One on the Top",
                    "A&ge": "24",
                    "Ge_nder": "M"
                }
            },
            {
                "key": "Human 2",
                "columns": {
                    "Na$me": "John",
                    "Description(ms/2)": "Table Number One on the Top",
                    "A&ge": "23",
                    "Ge_nder": "M"
                }
                }
        ]
    }
}"""

def fix_json(resp):

    output = {'highest_table': {'items': []}}
    for item in resp['highest_table']['items']:
        inner_dict = item['columns']
        fixed_values = {'Name': inner_dict['Na$me'],
                        'Description(ms/2)': inner_dict['Description(ms/2)'],
                        'Age': inner_dict['A&ge'],
                        'Gender': inner_dict['Ge_nder']
                        }
        new_inner = {'key': item['key'], 'columns': fixed_values}
        output['highest_table']['items'].append(new_inner)
    return output



response = json.loads(response)
fixed = fix_json(response)
print(json.dumps(fixed, indent=2))
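
The question also asks for a .json file as the result, so the cleaned data still has to be written back to disk. A minimal sketch of that round trip follows, under a few assumptions: the function names clean_columns and clean_json_file are hypothetical, only the keys inside each "columns" dict are cleaned (so "highest_table" keeps its underscore), "special" means anything other than an ASCII letter or digit, and a temporary directory stands in for the real path.

```python
import json
import os
import re
import tempfile

# Assumption: anything that is not an ASCII letter or digit is "special".
SPECIAL = re.compile(r"[^0-9A-Za-z]")

def clean_columns(data):
    """Clean only the keys of each item's "columns" dict, in place."""
    for item in data["highest_table"]["items"]:
        item["columns"] = {
            SPECIAL.sub("", key): value for key, value in item["columns"].items()
        }
    return data

def clean_json_file(src_path, dst_path):
    """Load src_path, clean the column names, and write the result to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        data = json.load(src)
    cleaned = clean_columns(data)
    with open(dst_path, "w", encoding="utf-8") as dst:
        json.dump(cleaned, dst, indent=2)
    return cleaned

# Demo: write a small input file, clean it, and read the output back.
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "characters.json")
    dst_path = os.path.join(tmp, "characters_clean.json")
    original = {
        "highest_table": {
            "items": [{"key": "Human 1", "columns": {"Na$me": "Tom", "A&ge": "24"}}]
        }
    }
    with open(src_path, "w", encoding="utf-8") as f:
        json.dump(original, f)
    clean_json_file(src_path, dst_path)
    with open(dst_path, encoding="utf-8") as f:
        round_tripped = json.load(f)

print(round_tripped)
```

Writing to a separate output path keeps the original file intact, which makes it easier to compare the two if a key is cleaned in an unexpected way.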
