Parse Nested JSON in Python to Remove Special Characters in Columns
Here is my JSON file:
{
    "highest_table": {
        "items": [
            {
                "key": "Human 1",
                "columns": {
                    "Na$me": "Tom",
                    "Description(ms/2)": "Table Number One on the Top",
                    "A&ge": "24",
                    "Ge_nder": "M"
                }
            },
            {
                "key": "Human 2",
                "columns": {
                    "Na$me": "John",
                    "Description(ms/2)": "Table Number One on the Top",
                    "A&ge": "23",
                    "Ge_nder": "M"
                }
            }
        ]
    }
}
The goal is to remove all special characters from the column names (or, if that is easier, every special character anywhere in the .json file) and write the result back out as a .json file. My initial thought was to load it into pandas, remove the special characters from the column headings, and convert it back to a .json file.
This is what I have tried so far. Both attempts print only a single line.
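import json
import pandas as pd

# First attempt: load the file with json, then flatten it.
# (pd.json_normalize is the current name of pandas.io.json.json_normalize.)
data_file = r"C:\characters.json"
with open(data_file) as f:
    data = json.load(f)
df = pd.json_normalize(data)
print(df)

# Second attempt: let pandas read the file directly.
data_file = r"C:\characters.json"
df = pd.read_json(data_file)
print(df)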
How can I extract the columns, remove the special characters, and put them back into a .json file?
A bit quick and dirty: you'll have to provide a complete implementation for fixkey, but this should fix your problem.
import json

def fixkey(key):
    # toy implementation
    #print("fixing {}".format(key))
    return key.replace("&", "").replace("$", "")

def normalize(data):
    #print("normalizing {}".format(data))
    if isinstance(data, dict):
        data = {fixkey(key): normalize(value) for key, value in data.items()}
    elif isinstance(data, list):
        data = [normalize(item) for item in data]
    return data

jsdata = """
{
"highest_table": {
"items": [{
"key": "Human 1",
"columns": {
"Na$me": "Tom",
"Description(ms/2)": "Table Number One on the Top",
"A&ge": "24",
"Ge_nder": "M"
}
},
{
"key": "Human 2",
"columns": {
"Na$me": "John",
"Description(ms/2)": "Table Number One on the Top",
"A&ge": "23",
"Ge_nder": "M"
}
}
]
}
}
"""
data = json.loads(jsdata)
data = normalize(data)
result = json.dumps(data, indent=2)
print(result)
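The toy fixkey above only strips "&" and "$". A fuller version could be sketched with a regular expression, under the assumption that "special character" means anything other than ASCII letters and digits (so "_", "(", "/" and ")" are removed too). That definition is a guess at what the question wants, not something the question states; adjust the pattern if underscores or other characters should survive.

```python
import json
import re

# Assumption: only ASCII letters and digits are allowed in a key.
SPECIAL = re.compile(r"[^A-Za-z0-9]")

def fixkey(key):
    # Drop every character that is not an ASCII letter or digit.
    return SPECIAL.sub("", key)

def normalize(data):
    # Walk dicts and lists recursively; only dict keys are rewritten,
    # string values are left untouched.
    if isinstance(data, dict):
        return {fixkey(key): normalize(value) for key, value in data.items()}
    if isinstance(data, list):
        return [normalize(item) for item in data]
    return data

sample = json.loads('{"columns": {"Na$me": "Tom", "A&ge": "24", "Ge_nder": "M"}}')
print(json.dumps(normalize(sample), indent=2))
```

Note that this rewrites every key at every level, not just those under "columns", and that two keys which clean to the same name (for example "A&ge" and "A$ge") would silently overwrite each other in the resulting dict.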
Frankly, this is ugly, but I haven't been able to find a more generic approach. It is very specific to your particular JSON (the problem really needs solving in the API).
import json
response = """{
"highest_table": {
"items": [{
"key": "Human 1",
"columns": {
"Na$me": "Tom",
"Description(ms/2)": "Table Number One on the Top",
"A&ge": "24",
"Ge_nder": "M"
}
},
{
"key": "Human 2",
"columns": {
"Na$me": "John",
"Description(ms/2)": "Table Number One on the Top",
"A&ge": "23",
"Ge_nder": "M"
}
}
]
}
}"""

def fix_json(resp):
    output = {'highest_table': {'items': []}}
    for item in resp['highest_table']['items']:
        inner_dict = item['columns']
        fixed_values = {'Name': inner_dict['Na$me'],
                        'Description(ms/2)': inner_dict['Description(ms/2)'],
                        'Age': inner_dict['A&ge'],
                        'Gender': inner_dict['Ge_nder']
                        }
        new_inner = {'key': item['key'], 'columns': fixed_values}
        output['highest_table']['items'].append(new_inner)
    return output

response = json.loads(response)
fixed = fix_json(response)
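If the column names are not known in advance, the same structure-specific idea can be sketched without hard-coding them. The version below still assumes the highest_table / items / columns layout from the question, but cleans whatever keys it finds inside each "columns" dict. The names clean_columns and save_json, and the rule "keep only ASCII letters and digits", are illustrative assumptions. Unlike fix_json above, this rule also strips the parentheses and slash from "Description(ms/2)".

```python
import json
import re

def clean_columns(resp):
    # Rebuild the response, cleaning only the keys inside each "columns" dict.
    items = []
    for item in resp["highest_table"]["items"]:
        columns = {
            re.sub(r"[^A-Za-z0-9]", "", name): value
            for name, value in item["columns"].items()
        }
        items.append({"key": item["key"], "columns": columns})
    return {"highest_table": {"items": items}}

def save_json(data, path):
    # Write the cleaned structure back out as a .json file.
    with open(path, "w") as f:
        json.dump(data, f, indent=2)
```

With the variables from the answer above, usage would look like save_json(clean_columns(response), "characters_clean.json"), where the output filename is a placeholder.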