Easiest way to store data from txt file?

I have a txt file with data from various series. It looks like this:

2017.07.16
Games
7x01
60
1
2017.07.23
Games
7x02
60
1
2017.07.30
Games
7x03
60
1
...

Every five lines describe one episode of a series (the file contains several series). For the first five lines, that means:

Date of air: 2017.07.16
Title: Games
SeasonXepisode: 7x01
Length: 60
Seen it or not: 1

I want to store all of this data in a dictionary, but I couldn't figure out how.

One of the approaches I tried that didn't work:

series = []
with open("lista.txt", "rt", encoding="utf-8") as file:
    lines = file.readlines()

count = 0

while count < len(lines):
    serie = {"air_date": lines[count],
             "title": lines[count + 1],
             "season_episode": lines[count + 2],
             "length": lines[count + 3],
             "seen": lines[count + 4]}
    series.append(serie)
    count += 1

This also doesn't work:

while count < len(lines):
    count += 1
    air_date = lines[count]
    title = lines[count + 1]
    season_episode = lines[count + 2]
    length = lines[count + 3]
    seen = lines[count + 4]
    series.append([air_date, title, season_episode, length, seen])

This also doesn't work:

lg = len(lines)
    for line in range(lg):
        serie = {"air_date": lines[0],
                   "title": lines[1],
                   "season_episode": lines[3],
                   "length": lines[4]}
        series.append(serie)

I tried the Java way as well:

air_date = []
title = []
season_episode = []
length = []
seen = []

line = ""
count = 0

for lines in file:
    count += 1
    air_date[count - 1] = file.readline()
    title[count - 1] = file.readline()
    season_episode[count - 1] = file.readline()
    length[count - 1] = file.readline()
    seen[count - 1] = file.readline()

I'm very new to Python, and coming from Java and JavaScript I find it strange that iterating this way doesn't work. Any ideas?

Suppose the sample below is your data, stored in a file named data.txt:

2017.07.16
Games
7x01
60
1
2017.07.23
Games
7x02
60
1
2017.07.30
Games
7x03
60
1

Then you might want to try this:

import json

with open("data.txt") as f:
    data = [l.strip() for l in f.readlines()]
    chunked = [data[i:i+5] for i in range(0, len(data), 5)]
    your_list_of_dicts = [
        {
            "air_date": i[0],
            "title": i[1],
            "season_episode": i[2],
            "length": i[3],
            "seen": i[4],
        } for i in chunked
    ]

    print(json.dumps(your_list_of_dicts, indent=2))

Output:

[
  {
    "air_date": "2017.07.16",
    "title": "Games",
    "season_episode": "7x01",
    "length": "60",
    "seen": "1"
  },
  {
    "air_date": "2017.07.23",
    "title": "Games",
    "season_episode": "7x02",
    "length": "60",
    "seen": "1"
  },
  {
    "air_date": "2017.07.30",
    "title": "Games",
    "season_episode": "7x03",
    "length": "60",
    "seen": "1"
  }
]
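For comparison, the first while loop from the question can also be made to work with two small changes: advance the counter by five lines per record instead of one, and strip the trailing newline from each line. The following is only a minimal sketch; the helper name parse_records and the in-memory sample list (standing in for file.readlines()) are assumptions made for illustration.

```python
def parse_records(lines, size=5):
    """Group a flat list of lines into one dict per episode (hypothetical helper)."""
    lines = [line.strip() for line in lines]
    series = []
    count = 0
    # Stop before a trailing, incomplete record so count + 4 never overruns the list.
    while count + size <= len(lines):
        series.append({
            "air_date": lines[count],
            "title": lines[count + 1],
            "season_episode": lines[count + 2],
            "length": lines[count + 3],
            "seen": lines[count + 4],
        })
        count += size  # jump to the start of the next record, not to the next line
    return series


# Sample lines standing in for file.readlines(); the newlines are included on purpose.
sample = ["2017.07.16\n", "Games\n", "7x01\n", "60\n", "1\n",
          "2017.07.23\n", "Games\n", "7x02\n", "60\n", "1\n"]
records = parse_records(sample)
print(records[1]["season_episode"])  # 7x02
```

With count += 1, the original loop starts a new record at every line and eventually indexes past the end of the list, which raises an IndexError.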

Write a generator function that splits an iterator into chunks of n elements, using itertools.islice:

from itertools import islice

def chunk(iterable, n):
    iterable = iter(iterable)
    ch = list(islice(iterable, 0, n))
    while ch:
        yield ch
        ch = list(islice(iterable, 0, n))
        
keys = ["Date of air", "Title", "SeasonXepisode", "Length", "Seen it or not"]

with open('input.txt') as fh:
    database = {}
    for ch in chunk(fh, 5):
        ch = map(str.strip, ch)
        epi = dict(zip(keys, ch))
        database[f"{epi['Title']}_{epi['SeasonXepisode']}"] = epi

print(database)

Output (pretty-printed for readability):

{'Games_7x01': {'Date of air': '2017.07.16',
                'Length': '60',
                'SeasonXepisode': '7x01',
                'Seen it or not': '1',
                'Title': 'Games'},
 'Games_7x02': {'Date of air': '2017.07.23',
                'Length': '60',
                'SeasonXepisode': '7x02',
                'Seen it or not': '1',
                'Title': 'Games'},
 'Games_7x03': {'Date of air': '2017.07.30',
                'Length': '60',
                'SeasonXepisode': '7x03',
                'Seen it or not': '1',
                'Title': 'Games'}}

You could go a little further with collections.defaultdict, but since your main problem is accessing five successive elements, how you structure the dict is up to you:

from collections import defaultdict

database = defaultdict(dict)

with open('input.txt') as fh:
    for ch in chunk(fh, 5):
        ch = map(str.strip, ch)
        epi = dict(zip(keys, ch))
        database[f"{epi['Title']}"][f"{epi['SeasonXepisode']}"] = epi
print(database)

Output (pretty-printed for readability):

defaultdict(<class 'dict'>,
            {'Games': {'7x01': {'Date of air': '2017.07.16',
                                'Length': '60',
                                'SeasonXepisode': '7x01',
                                'Seen it or not': '1',
                                'Title': 'Games'},
                       '7x02': {'Date of air': '2017.07.23',
                                'Length': '60',
                                'SeasonXepisode': '7x02',
                                'Seen it or not': '1',
                                'Title': 'Games'},
                       '7x03': {'Date of air': '2017.07.30',
                                'Length': '60',
                                'SeasonXepisode': '7x03',
                                'Seen it or not': '1',
                                'Title': 'Games'}}})
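The nested layout makes lookups by title and episode straightforward. Below is a small sketch of how that might look; the two hand-written records are sample data standing in for the parsed file, not output from the code above.

```python
from collections import defaultdict

# Two hand-written records standing in for the parsed file (sample data only).
episodes = [
    {"Title": "Games", "SeasonXepisode": "7x01", "Length": "60"},
    {"Title": "Games", "SeasonXepisode": "7x02", "Length": "60"},
]

database = defaultdict(dict)
for epi in episodes:
    # The inner dict for a new title is created automatically on first access.
    database[epi["Title"]][epi["SeasonXepisode"]] = epi

print(database["Games"]["7x02"]["Length"])  # 60
print(sorted(database["Games"]))            # ['7x01', '7x02']
```

One thing to keep in mind: indexing a defaultdict with a missing title, as in database["Other"], silently creates an empty entry. Use "Other" in database or database.get("Other") to check for a title without adding it.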

Another option is to read all lines into a list and build one dict at every fifth index:

with open("lista.txt", "r") as f:
    data = [elem.strip() for elem in f.readlines()]  # strip the trailing '\n' and surrounding whitespace from each line


data_json = []

for i in range(0, len(data), 5):  # i is the start index of each 5-line record
    data_json.append(
        {
            "air_date": data[i],
            "title": data[i + 1],
            "season_episode": data[i + 2],
            "length": data[i + 3],
            "seen": data[i + 4],
        }
    )

print(data_json)

Output:

[{'air_date': '2017.07.16', 'title': 'Games', 'season_episode': '7x01', 'length': '60', 'seen': '1'}, {'air_date': '2017.07.23', 'title': 'Games', 'season_episode': '7x02', 'length': '60', 'seen': '1'}, {'air_date': '2017.07.30', 'title': 'Games', 'season_episode': '7x03', 'length': '60', 'seen': '1'}]
