How to split a JSON file into multiple files using part of the date

I have a large JSON file holding weather data. Each entry has a timestamp. I want to split the JSON into multiple files, using the month as the filename, e.g. 2021-10.json.

The data looks like this:

[
{
    'dt': 1633651200,
    'temp': 11.09,
    'feels_like': 10.66,
    'pressure': 1030,
    'humidity': 92,
    'dew_point': 9.84,
    'uvi': 0,
    'clouds': 98,
    'visibility': 10000,
    'wind_speed': 2.05,
    'wind_deg': 26,
    'wind_gust': 3.26,
    'weather': [
        {'id': 804, 'main': 'Clouds', 'description': 'overcast clouds', 'icon': '04n'}
    ]
},
{
    'dt': 1633654800,
    'temp': 10.27,
    'feels_like': 9.75,
    'pressure': 1030,
    'humidity': 92,
    'dew_point': 9.03,
    'uvi': 0,
    'clouds': 100,
    'visibility': 10000,
    'wind_speed': 2.32,
    'wind_deg': 54,
    'wind_gust': 4.73,
    'weather': [
        {'id': 804, 'main': 'Clouds', 'description': 'overcast clouds', 'icon': '04n'}
    ]
},...

The first thing I did was convert the timestamp into a date. My code so far looks like this:

import json
from datetime import datetime

from rich import print as rprint  # assumption: rprint is rich's print function

with open('data.json', 'r', encoding='utf8') as f:
    # Read the file and parse it into a dictionary
    d = json.load(f)

x = d['hourly']
rprint(x)
for json_obj in x:
    timestamp = json_obj['dt']
    dt_obj = datetime.fromtimestamp(timestamp).strftime('%Y-%m-%d %H:%M:%S')
    json_obj['dt'] = dt_obj
    rprint(dt_obj)
    # Writes one file per entry, named after the full timestamp
    filename = str(json_obj['dt']) + '.json'
    with open(filename, 'w') as out_json_file:
        json.dump(json_obj, out_json_file, indent=4)

Does anybody have an idea how I can put all the entries for one day into one JSON file?

Thank you in advance

You could build a temporary container that groups the entries belonging to the same month, like this:

import json
from collections import defaultdict
from datetime import datetime

# ...open and parse json file into hourly_data (e.g. d['hourly'])

month_data = defaultdict(list)

for json_obj in hourly_data:
    timestamp = json_obj["dt"]
    # more useful to keep the datetime object as an object, not a string yet
    dt_obj = datetime.fromtimestamp(timestamp)

    json_obj["dt"] = dt_obj.strftime('%Y-%m-%d %H:%M:%S')
    # if you haven't used `defaultdict` before, it allows skipping some
    # boilerplate code when creating dict entries that may not exist
    month_data[dt_obj.strftime("%Y-%m")].append(json_obj)

# month: '2021-10' (key), json_data: list of hourly/day data (value)
for month, json_data in month_data.items():
    with open(f"{month}.json", "w") as json_outfile:
        json.dump(json_data, json_outfile, indent=4)
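
One caveat about the grouping key: datetime.fromtimestamp(timestamp) without a tz argument converts to the local time zone of the machine running the script, so an entry near a month boundary can land in a different file depending on where the code runs. The sketch below illustrates this with a hypothetical timestamp (not from the question's data) and an arbitrary UTC-5 offset; which zone is right for the weather data is an assumption you would need to settle yourself.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical timestamp chosen for illustration: midnight UTC on 1 Nov 2021
ts = 1635724800

# Key when the timestamp is interpreted in UTC
utc_key = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m")

# Key when the same instant is interpreted five hours west of UTC
west = timezone(timedelta(hours=-5))
west_key = datetime.fromtimestamp(ts, tz=west).strftime("%Y-%m")

print(utc_key, west_key)
```

Passing an explicit tz keeps the output files the same regardless of the machine's settings.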

Edit:

I see you asked for each day to be a separate file, while your title and the 2021-10.json example pointed at one file per month. The example above should extend to one JSON file per day by changing the key format. Let me know if you figure it out!
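
As a rough sketch of that per-day variant: the only real change is the strftime format used as the grouping key. The function name split_by_period, its fmt and out_dir parameters, and the choice of UTC are assumptions made for this illustration, not part of the original answer. Unlike the code above, it also leaves the dt field as the numeric timestamp.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone
from pathlib import Path


def split_by_period(records, fmt="%Y-%m-%d", out_dir="."):
    """Group records by strftime(fmt) of their 'dt' field, one file per group."""
    groups = defaultdict(list)
    for rec in records:
        # Assumption: timestamps are interpreted in UTC so results are reproducible
        moment = datetime.fromtimestamp(rec["dt"], tz=timezone.utc)
        groups[moment.strftime(fmt)].append(rec)

    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    for key, items in groups.items():
        # One file per key, e.g. 2021-10-08.json
        with open(out_path / f"{key}.json", "w", encoding="utf8") as fh:
            json.dump(items, fh, indent=4)

    # Return the keys that were written, sorted for convenience
    return sorted(groups)
```

Calling split_by_period(d['hourly']) would write one file per day, and split_by_period(d['hourly'], fmt="%Y-%m") would give the per-month files again.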

Python docs for defaultdict
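
For anyone who has not used defaultdict, here is a small side-by-side sketch of the boilerplate it saves. The month/temperature pairs are illustrative sample values only.

```python
from collections import defaultdict

# Illustrative (month, temperature) pairs
pairs = [("2021-10", 11.09), ("2021-10", 10.27), ("2021-11", 8.5)]

# Plain dict: the key has to be created before the first append
plain = {}
for month, temp in pairs:
    if month not in plain:
        plain[month] = []
    plain[month].append(temp)

# defaultdict(list): a missing key starts out as an empty list on first access
grouped = defaultdict(list)
for month, temp in pairs:
    grouped[month].append(temp)

print(dict(grouped) == plain)
```

Both loops build the same mapping; the defaultdict version just drops the membership check.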
