Uploading json files to create a single pandas dataframe

I have to load a large number of json files from a folder on my PC. Each file represents one row of the final dataframe I have to build. Moreover, only 4 keys from each json file (which contains nested dictionaries) should appear in the dataframe.

Here is a working script that already does exactly what I need, but only for one json file:

import json
import pandas as pd

with open('2020-03-02-10-43-08-9148.json') as inf:
    j = json.load(inf)
    log_cols = ["spotName", "CurrentCurve", "VoltageCurve"]
    data = [j['Message']['WeldLog'][col] for col in log_cols] + [j["TimeStamp"]]
    col_names = ["SpotName", "CurrentCurve", "VoltageCurve", "TimeStamp"]
    df = pd.DataFrame([data], columns=col_names)

Then I naively tried to extend the script to all the json files in my folder, but it doesn't work. When I print the dataframe, it contains just one row (a single json file)!

import pandas as pd
import json
import io
import glob

for json_file in glob.glob("*.json"):  # Assuming that json files and .py file are in the same directory
    with open(json_file) as inf:
        j = json.load(inf)
        log_cols = ["spotName", "CurrentCurve", "VoltageCurve"]
        data = [j['Message']['WeldLog'][col] for col in log_cols] + [j["TimeStamp"]]
        col_names = ["SpotName", "CurrentCurve", "VoltageCurve", "TimeStamp"]  # Names of the df columns

df = pd.DataFrame([data], columns=col_names)

Lastly, in case it helps, here is an example of the json file structure:

{
    "Name": "WeldLog",
    "WeldTimer": "SCC005R01",
    "TimeStamp": "2019-11-07T12:29:01",
    "OutputFormat": "JSON",
    "Message": {
        "WeldLog": {
            "dateTime": "2019-10-23T18:30:31.8",
            "iActual1": 0.00,
            "iActual2": 7.98,
            "iActual3": 0.00,
            "partIdentString": "",
            "pha1": 0.00,
            "pha2": 32.24,
            "pha3": 0.00,
            "progNo": 49,
            "spotName": "60090_0_00",
            "timerName": "SCC005R01",
            "currentActualValue": 7.97,
            "currentFactor": 0,
            "currentReferenceValue": 7.82,
            "iDemand1": 3.00,
            "iDemand2": 7.80,
            "iDemand3": 3.00,
            "electrodeNo": 1,
            "iDemandStd": 9.00,
            "energyActualValue": 5159.189,
            "energyRefValue": 5301.22,
            "contactWaitTime": null,
            "monitorMode": 1,
            "monitorState": 0,
            "powerActualValue": 12514.94,
            "powerRefValue": 13288.9,
            "powerState": 0,
            "resistanceActualValue": 192,
            "resistanceRefValue": 209,
            "protRecord_ID": 670196.0,
            "uipActualValue": 86,
            "uipRefValue": 0,
            "uirExpulsionTime": 0,
            "voltageActualValue": 1.57,
            "voltageRefValue": 1.70,
            "wear": 1.00,
            "tipDressCounter": 26,
            "weldSpotCustDataP16_1": 0,
            "weldSpotCustDataP16_2": 0,
            "weldSpotCustDataP16_3": 0,
            "weldSpotCustDataP16_4": 0,
            "CurrentCurve": null,
            "VoltageCurve": [1, 2, 3, 4, 8],
            "ForceCurve": [6, 7, 8, 3, 6, 9, 6]
        }
    }
}

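The reason is that data is overwritten on every iteration of the loop, so by the time the dataframe is built after the loop, only the values from the last file remain. You need to collect the rows in a list that is created once, outside the loop, and append to it for each file.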
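The difference between overwriting and accumulating can be seen without any files at all. The following is only a minimal sketch: the names rows_from_files, overwritten and accumulated, and the sample values, are made up for illustration and are not part of the original script.

```python
# Hypothetical sample rows standing in for the values read from each json file.
rows_from_files = [["spot_a", 1], ["spot_b", 2], ["spot_c", 3]]

# Overwriting: the name `data` is rebound on every pass,
# so after the loop it only holds the last row.
for row in rows_from_files:
    data = row
overwritten = [data]

# Accumulating: the list is created once, before the loop,
# and grows by one row per pass.
accumulated = []
for row in rows_from_files:
    accumulated.append(row)

print(len(overwritten))   # 1
print(len(accumulated))   # 3
```

Passing overwritten to pd.DataFrame would give a single row, which matches what the question describes, while accumulated would give one row per file.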

import glob
import json

import pandas as pd

data = []  # one entry per json file; created once, outside the loop
log_cols = ["spotName", "CurrentCurve", "VoltageCurve"]

for json_file in glob.glob("*.json"):  # Assuming that json files and .py file are in the same directory
    with open(json_file) as inf:
        j = json.load(inf)
    data.append([j['Message']['WeldLog'][col] for col in log_cols] + [j["TimeStamp"]])

col_names = ["SpotName", "CurrentCurve", "VoltageCurve", "TimeStamp"]  # Names of the df columns

df = pd.DataFrame(data, columns=col_names)
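If the same loading step is needed in more than one place, it could be wrapped in a small function. This is just one possible sketch, not the only way to do it: the function name load_weld_logs, its folder parameter, and the constants LOG_COLS and COL_NAMES are hypothetical names introduced here, and it assumes every file has the same structure as the example in the question.

```python
import json
from pathlib import Path

import pandas as pd

# Keys read from j["Message"]["WeldLog"], as in the snippets above.
LOG_COLS = ["spotName", "CurrentCurve", "VoltageCurve"]
COL_NAMES = ["SpotName", "CurrentCurve", "VoltageCurve", "TimeStamp"]


def load_weld_logs(folder="."):
    """Build one dataframe with one row per *.json file in `folder` (hypothetical helper)."""
    rows = []
    # sorted() makes the row order deterministic; glob order depends on the filesystem.
    for path in sorted(Path(folder).glob("*.json")):
        with path.open() as inf:
            j = json.load(inf)
        weld_log = j["Message"]["WeldLog"]
        rows.append([weld_log[col] for col in LOG_COLS] + [j["TimeStamp"]])
    return pd.DataFrame(rows, columns=COL_NAMES)
```

Calling load_weld_logs() with no argument reads the current directory, like the script above. A file that lacks one of the expected keys would raise a KeyError, so files with a different layout would need to be filtered out or handled separately.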
