
Extract data from lines of a text file

I need to extract data from lines of a text file. The data is name and scoring information formatted like this:

Shyvana - 12/4/5 - Loss - 2012-11-22
Fizz - 12/4/5 - Win - 2012-11-22
Miss Fortune - 12/4/3 - Win - 2012-11-22

This file is generated by another part of my little Python program, where I ask the user for the name, look up the name they enter in a list of names to ensure it's valid, and then ask for kills, deaths, assists, and whether they won or lost. Then I ask for confirmation, write that data to the file on a new line, and append the date at the end as shown. The code that prepares that data:

data = "%s - %s/%s/%s - %s - %s\n" % (
        champname, kills, deaths, assists, winloss, timestamp)

Basically I want to read that data back in another part of the program and display it to the user and do calculations with it like averages over time for a particular name.

I'm new to Python and not very experienced with programming in general, so most of the string splitting and formatting examples I find are too cryptic for me to adapt to what I need here. Could anyone help? I could format the written data differently so token finding would be simpler, but I want it to be simple directly in the file.

The following will read everything into a dictionary keyed by player name. The value associated with each player is itself a dictionary acting as a record with named fields associated with the items converted to a format suitable for further processing.

info = {}
with open('scoring_info.txt') as input_file:
    for line in input_file:
        player, stats, outcome, date = (
            item.strip() for item in line.split('-', 3))
        stats = dict(zip(('kills', 'deaths', 'assists'),
                          map(int, stats.split('/'))))
        date = tuple(map(int, date.split('-')))
        info[player] = dict(zip(('stats', 'outcome', 'date'),
                                (stats, outcome, date)))

print('info:')
for player, record in info.items():
    print('  player %r:' % player)
    for field, value in record.items():
        print('    %s: %s' % (field, value))

# sample usage
player = 'Fizz'
print('\n%s had %s kills in the game' % (player, info[player]['stats']['kills']))

Output:

info:
  player 'Shyvana':
    date: (2012, 11, 22)
    outcome: Loss
    stats: {'assists': 5, 'kills': 12, 'deaths': 4}
  player 'Miss Fortune':
    date: (2012, 11, 22)
    outcome: Win
    stats: {'assists': 3, 'kills': 12, 'deaths': 4}
  player 'Fizz':
    date: (2012, 11, 22)
    outcome: Win
    stats: {'assists': 5, 'kills': 12, 'deaths': 4}

Fizz had 12 kills in the game
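
One thing to keep in mind: because info is keyed by player name, a later line for the same champion replaces the earlier one. For averages over several games, one variation is to keep a list of games per champion. This is only a minimal sketch assuming the same line format; sample_lines stands in for the file, and the names games and average are made up for illustration:

```python
from collections import defaultdict

# Stand-in for the lines of scoring_info.txt (hypothetical sample data).
sample_lines = [
    "Shyvana - 12/4/5 - Loss - 2012-11-22\n",
    "Fizz - 12/4/5 - Win - 2012-11-22\n",
    "Fizz - 6/2/9 - Loss - 2012-11-23\n",
]

games = defaultdict(list)  # champion name -> list of game records
for line in sample_lines:
    player, stats, outcome, date = (
        item.strip() for item in line.split('-', 3))
    kills, deaths, assists = map(int, stats.split('/'))
    games[player].append({'kills': kills, 'deaths': deaths,
                          'assists': assists, 'outcome': outcome,
                          'date': date})

def average(player, field):
    """Mean of one numeric field over all recorded games of a player."""
    records = games[player]
    return sum(r[field] for r in records) / float(len(records))

print('Fizz averages %.1f kills over %d games' % (
    average('Fizz', 'kills'), len(games['Fizz'])))
```

The function assumes the player has at least one recorded game; an unknown name would need a check before dividing.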

Alternatively, rather than holding most of the data in dictionaries, which can make nested-field access a little awkward (as in info[player]['stats']['kills']), you could use a slightly more advanced "generic" class to hold them, which lets you write info2[player].stats.kills instead.

To illustrate, here's almost the same thing using a class I've named Struct because it's somewhat like the C language's struct data type:

class Struct(object):
    """ Generic container object """
    def __init__(self, **kwds): # keyword args define attribute names and values
        self.__dict__.update(**kwds)

info2 = {}
with open('scoring_info.txt') as input_file:
    for line in input_file:
        player, stats, outcome, date = (
            item.strip() for item in line.split('-', 3))
        stats = dict(zip(('kills', 'deaths', 'assists'),
                          map(int, stats.split('/'))))
        victory = (outcome.lower() == 'win') # change to boolean T/F
        date = dict(zip(('year','month','day'), map(int, date.split('-'))))
        info2[player] = Struct(champ_name=player, stats=Struct(**stats),
                               victory=victory, date=Struct(**date))
print('info2:')
for rec in info2.values():
    print('  player %r:' % rec.champ_name)
    print('    stats: kills=%s, deaths=%s, assists=%s' % (
          rec.stats.kills, rec.stats.deaths, rec.stats.assists))
    print('    victorious: %s' % rec.victory)
    print('    date: %d-%02d-%02d' % (rec.date.year, rec.date.month, rec.date.day))

# sample usage
player = 'Fizz'
print('\n%s had %s kills in the game' % (player, info2[player].stats.kills))

Output:

info2:
  player 'Shyvana':
    stats: kills=12, deaths=4, assists=5
    victorious: False
    date: 2012-11-22
  player 'Miss Fortune':
    stats: kills=12, deaths=4, assists=3
    victorious: True
    date: 2012-11-22
  player 'Fizz':
    stats: kills=12, deaths=4, assists=5
    victorious: True
    date: 2012-11-22

Fizz had 12 kills in the game
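
The standard library's collections.namedtuple gives similar attribute access without a hand-written class. This is a rough sketch rather than part of the approach above; the names Stats and Game and the sample line are assumptions for illustration:

```python
from collections import namedtuple

# Hypothetical record types; field names mirror the Struct example.
Stats = namedtuple('Stats', 'kills deaths assists')
Game = namedtuple('Game', 'champ_name stats victory date')

line = "Miss Fortune - 12/4/3 - Win - 2012-11-22\n"  # sample line

player, stats, outcome, date = (
    item.strip() for item in line.split('-', 3))
game = Game(champ_name=player,
            stats=Stats(*map(int, stats.split('/'))),
            victory=(outcome.lower() == 'win'),
            date=tuple(map(int, date.split('-'))))

print('%s had %s assists' % (game.champ_name, game.stats.assists))
```

Unlike Struct instances, namedtuples are immutable, which is usually fine for records read from a log file.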

There are two ways to read the data out from your textfile example.

First method

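You can use Python's csv module and specify '-' as your delimiter. The fields will still need their surrounding spaces stripped, and the dashes inside the date are split as well.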

See http://www.doughellmann.com/PyMOTW/csv/
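
A rough sketch of the csv approach, assuming the line format from the question. The csv module only accepts a one-character delimiter, so the sketch strips the spaces itself and rejoins the trailing fields into the date. The names sample and rows are made up, and io.StringIO stands in for the real file:

```python
import csv
import io

# In-memory stand-in for the text file (hypothetical sample data).
sample = io.StringIO(
    "Shyvana - 12/4/5 - Loss - 2012-11-22\n"
    "Miss Fortune - 12/4/3 - Win - 2012-11-22\n")

rows = []
for fields in csv.reader(sample, delimiter='-'):
    # The date's own dashes produce extra fields, so the first three
    # fields are the record and everything after is rejoined as the date.
    name, score, result = (f.strip() for f in fields[:3])
    date = '-'.join(f.strip() for f in fields[3:])
    rows.append((name, score, result, date))

for row in rows:
    print(row)
```

This also shows why csv is a slightly awkward fit here: a champion name containing a dash would shift the fields.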

Second method

Alternatively, if you don't want to use this csv module, you can simply use the split method after you have read each line in your file as a string.

with open('myTextFile.txt', "r") as f:
    for line in f:
        # words is a list of strings from one line, delimited by " - ".
        words = line.strip().split(" - ")

So in your example above, champname will actually be the first item in the words list, which is words[0]. Splitting on " - " (with the spaces) rather than "-" keeps the date in one piece.

You want to use split(' - ') to get the parts, then perhaps again to get the numbers:

for line in yourfile:
    data = line.strip().split(' - ')
    nums = [int(x) for x in data[1].split('/')]

That should get you all the stuff you need in data and nums. Alternatively, you can use the re module and write a regular expression for it. This doesn't seem complex enough for that, though.
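
For comparison, here is what the regular-expression route might look like. It is only a sketch: the pattern assumes the result is always exactly Win or Loss and the date is always YYYY-MM-DD, and LINE_RE and parse are made-up names:

```python
import re

# One named group per field; the name is matched lazily up to the first " - ".
LINE_RE = re.compile(
    r'^(?P<name>.+?) - (?P<kills>\d+)/(?P<deaths>\d+)/(?P<assists>\d+)'
    r' - (?P<result>Win|Loss) - (?P<date>\d{4}-\d{2}-\d{2})$')

def parse(line):
    """Return a dict of fields, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    for key in ('kills', 'deaths', 'assists'):
        record[key] = int(record[key])
    return record

print(parse("Miss Fortune - 12/4/3 - Win - 2012-11-22\n"))
print(parse("not a valid line"))
```

The main thing the regex buys over split is validation: a malformed line gives None instead of an unpacking error.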

# Iterates over the lines in the file.
for line in open('data_file.txt'):
    # Strips the trailing newline, then splits the line into four elements
    # separated by ' - '. Each element is then unpacked to the correct
    # variable name.
    champname, score, winloss, timestamp = line.strip().split(' - ')

    # Since 'score' holds the string with the three values joined,
    # we need to split them again, this time using a slash as separator.
    # This results in a list of strings, so we apply the 'int' function
    # to each of them to convert to integer. This list of integers is
    # then unpacked into the kills, deaths and assists variables
    kills, deaths, assists = map(int, score.split('/'))

    # Now you are free to use the variables for whatever you want. Since
    # kills, deaths and assists are integers, you can add, multiply and
    # average them easily.

First, you break the line into data fragments

>>> name, score, result, date = "Fizz - 12/4/5 - Win - 2012-11-22".split(' - ')
>>> name
'Fizz'
>>> score
'12/4/5'
>>> result
'Win'
>>> date
'2012-11-22'

Second, parse your score

>>> k,d,a = map(int, score.split('/'))
>>> k,d,a
(12, 4, 5)

And finally, convert the date string into a date object

>>> from datetime import datetime
>>> datetime.strptime(date, '%Y-%m-%d').date()
datetime.date(2012, 11, 22)

Now you have all your parts parsed and normalized to data types.
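
Putting the three steps together, a small helper could return one record per line, and real date objects make it easy to restrict calculations to a time range. This is a sketch under the assumption that the file always follows the format in the question; parse_line, sample_lines, recent and wins are hypothetical names:

```python
from datetime import date, datetime

def parse_line(line):
    """Turn one 'name - k/d/a - result - date' line into a dict."""
    name, score, result, day = line.strip().split(' - ')
    kills, deaths, assists = map(int, score.split('/'))
    return {'name': name, 'kills': kills, 'deaths': deaths,
            'assists': assists, 'won': result == 'Win',
            # %m is the month; %M would be parsed as minutes.
            'date': datetime.strptime(day, '%Y-%m-%d').date()}

# Stand-in for the file contents (hypothetical sample data).
sample_lines = [
    "Shyvana - 12/4/5 - Loss - 2012-11-20\n",
    "Fizz - 12/4/5 - Win - 2012-11-22\n",
    "Miss Fortune - 12/4/3 - Win - 2012-11-23\n",
]

games = [parse_line(line) for line in sample_lines]

# Date objects compare chronologically, so filtering by day is direct.
recent = [g for g in games if g['date'] >= date(2012, 11, 22)]
wins = sum(1 for g in recent if g['won'])
print('%d wins in %d games since 2012-11-22' % (wins, len(recent)))
```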
