
Extract data from lines of a text file

I need to extract data from lines of a text file. The data is name and scoring information formatted like this:

Shyvana - 12/4/5 - Loss - 2012-11-22
Fizz - 12/4/5 - Win - 2012-11-22
Miss Fortune - 12/4/3 - Win - 2012-11-22

This file is generated by another part of my little Python program, where I ask the user for a name, look it up in a list of names to make sure it's valid, and then ask for kills, deaths, assists, and whether they won or lost. Then I ask for confirmation, write that data to the file on a new line, and append the date at the end. The code that prepares that data:

data = "%s - %s/%s/%s - %s - %s\n" % (
        champname, kills, deaths, assists, winloss, timestamp)

Basically, I want to read that data back in another part of the program, display it to the user, and do calculations with it, such as averages over time for a particular name.

I'm new to Python and not very experienced with programming in general, so most of the string splitting and formatting examples I find are too cryptic for me to adapt to what I need here. Could anyone help? I could format the written data differently so token finding would be simpler, but I want it to be simple directly in the file.

The following will read everything into a dictionary keyed by player name. The value associated with each player is itself a dictionary that acts as a record, with named fields holding the items converted to a format suitable for further processing.

info = {}
with open('scoring_info.txt') as input_file:
    for line in input_file:
        player, stats, outcome, date = (
            item.strip() for item in line.split('-', 3))
        stats = dict(zip(('kills', 'deaths', 'assists'),
                          map(int, stats.split('/'))))
        date = tuple(map(int, date.split('-')))
        info[player] = dict(zip(('stats', 'outcome', 'date'),
                                (stats, outcome, date)))

print('info:')
for player, record in info.items():
    print('  player %r:' % player)
    for field, value in record.items():
        print('    %s: %s' % (field, value))

# sample usage
player = 'Fizz'
print('\n%s had %s kills in the game' % (player, info[player]['stats']['kills']))

Output:

info:
  player 'Shyvana':
    date: (2012, 11, 22)
    outcome: Loss
    stats: {'assists': 5, 'kills': 12, 'deaths': 4}
  player 'Miss Fortune':
    date: (2012, 11, 22)
    outcome: Win
    stats: {'assists': 3, 'kills': 12, 'deaths': 4}
  player 'Fizz':
    date: (2012, 11, 22)
    outcome: Win
    stats: {'assists': 5, 'kills': 12, 'deaths': 4}

Fizz had 12 kills in the game
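
One caveat with a dictionary keyed by name alone: a second game with the same champion overwrites the first, which gets in the way of averages over time. Here is a minimal sketch of one way around that, keeping a list of games per champion. The sample lines (including the made-up second Fizz game) and the names `games` and `average_kills` are illustrative assumptions, not part of the code above:

```python
from collections import defaultdict

# Sample lines in the question's format; the second Fizz game is invented
# purely for illustration.
lines = [
    "Shyvana - 12/4/5 - Loss - 2012-11-22",
    "Fizz - 12/4/5 - Win - 2012-11-22",
    "Fizz - 6/2/9 - Loss - 2012-11-23",
]

games = defaultdict(list)  # champion name -> list of game records
for line in lines:
    player, stats, outcome, date = (
        item.strip() for item in line.split('-', 3))
    kills, deaths, assists = map(int, stats.split('/'))
    games[player].append({'kills': kills, 'deaths': deaths,
                          'assists': assists, 'outcome': outcome,
                          'date': date})

def average_kills(player):
    """Mean kills over every recorded game for one champion."""
    records = games[player]
    return sum(r['kills'] for r in records) / len(records)

print(average_kills('Fizz'))  # prints 9.0 on Python 3
```

The same pattern should extend to deaths, assists, or a win rate by changing which field is summed.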

Alternatively, rather than holding most of the data in dictionaries, which can make nested-field access a little awkward, as in info[player]['stats']['kills'], you could use a slightly more advanced "generic" class to hold them, which will let you write info2[player].stats.kills instead.

To illustrate, here's almost the same thing using a class I've named Struct because it's somewhat like the C language's struct data type:

class Struct(object):
    """ Generic container object """
    def __init__(self, **kwds): # keyword args define attribute names and values
        self.__dict__.update(**kwds)

info2 = {}
with open('scoring_info.txt') as input_file:
    for line in input_file:
        player, stats, outcome, date = (
            item.strip() for item in line.split('-', 3))
        stats = dict(zip(('kills', 'deaths', 'assists'),
                          map(int, stats.split('/'))))
        victory = (outcome.lower() == 'win') # change to boolean T/F
        date = dict(zip(('year','month','day'), map(int, date.split('-'))))
        info2[player] = Struct(champ_name=player, stats=Struct(**stats),
                               victory=victory, date=Struct(**date))
print('info2:')
for rec in info2.values():
    print('  player %r:' % rec.champ_name)
    print('    stats: kills=%s, deaths=%s, assists=%s' % (
          rec.stats.kills, rec.stats.deaths, rec.stats.assists))
    print('    victorious: %s' % rec.victory)
    print('    date: %d-%02d-%02d' % (rec.date.year, rec.date.month, rec.date.day))

# sample usage
player = 'Fizz'
print('\n%s had %s kills in the game' % (player, info2[player].stats.kills))

Output:

info2:
  player 'Shyvana':
    stats: kills=12, deaths=4, assists=5
    victorious: False
    date: 2012-11-22
  player 'Miss Fortune':
    stats: kills=12, deaths=4, assists=3
    victorious: True
    date: 2012-11-22
  player 'Fizz':
    stats: kills=12, deaths=4, assists=5
    victorious: True
    date: 2012-11-22

Fizz had 12 kills in the game
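
On Python 3.3 and later, the standard library's `types.SimpleNamespace` behaves much like the Struct class, so a hand-written class may not be needed. A small sketch, using one hard-coded line in place of the file (the variable name `record` is just for illustration):

```python
from types import SimpleNamespace

line = "Fizz - 12/4/5 - Win - 2012-11-22"
player, stats, outcome, date = (
    item.strip() for item in line.split('-', 3))
kills, deaths, assists = map(int, stats.split('/'))

# Nested namespaces give the same dotted access as the Struct version.
record = SimpleNamespace(
    champ_name=player,
    stats=SimpleNamespace(kills=kills, deaths=deaths, assists=assists),
    victory=(outcome.lower() == 'win'),
)

print(record.stats.kills)  # 12
print(record.victory)      # True
```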

There are two ways to read the data out of your text file example.

First method

You can use Python's csv module and specify - as the delimiter.

See http://www.doughellmann.com/PyMOTW/csv/
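
A rough sketch of what the csv approach might look like. One thing to watch for is that the date itself contains hyphens, so with - as the delimiter it arrives as three separate fields and has to be rejoined. The in-memory sample (via `io.StringIO`) stands in for the real file, and the name `rows` is an assumption for illustration:

```python
import csv
import io

# In-memory stand-in for the real file, using lines from the question.
sample = io.StringIO(
    "Shyvana - 12/4/5 - Loss - 2012-11-22\n"
    "Fizz - 12/4/5 - Win - 2012-11-22\n"
)

rows = []
for fields in csv.reader(sample, delimiter='-'):
    fields = [f.strip() for f in fields]  # drop the spaces around each dash
    name, score, outcome = fields[:3]
    # The date was split on its own hyphens, so glue the pieces back together.
    date = '-'.join(fields[3:])
    rows.append((name, score, outcome, date))

print(rows[0])  # ('Shyvana', '12/4/5', 'Loss', '2012-11-22')
```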

Second method

Alternatively, if you don't want to use the csv module, you can simply use the split method after you have read each line in your file as a string.

with open('myTextFile.txt', "r") as f:   # the file is closed automatically
    for line in f:
        # words is a list of strings from the line, split on "-".
        words = line.split("-")

So in your example above, champname will be the first item in the words list, which is words[0] (call words[0].strip() to drop the space left next to the dash).

You want to use split(' - ') to get the parts, then perhaps split again to get the numbers:

for line in yourfile.readlines():
    data = line.split(' - ')
    nums = [int(x) for x in data[1].split('/')]

That should get you everything you need in data and nums. Alternatively, you can use the re module and write a regular expression for it. This doesn't seem complex enough to need that, though.
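
For completeness, here is a sketch of what the regular-expression route could look like. The pattern is an assumption based on the three sample lines in the question (for instance, it assumes the outcome is always Win or Loss), and `LINE_RE` is a made-up name:

```python
import re

# Assumed pattern for lines like "Name - K/D/A - Win|Loss - YYYY-MM-DD".
LINE_RE = re.compile(
    r'^(?P<name>.+?) - (?P<kills>\d+)/(?P<deaths>\d+)/(?P<assists>\d+)'
    r' - (?P<outcome>Win|Loss) - (?P<date>\d{4}-\d{2}-\d{2})$')

match = LINE_RE.match("Miss Fortune - 12/4/3 - Win - 2012-11-22")
if match:
    name = match.group('name')
    kills = int(match.group('kills'))
    print(name, kills)  # Miss Fortune 12
```

A line that does not fit the pattern makes `match` come back as None, which gives a simple way to skip or report malformed lines.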

# Iterates over the lines in the file.
for line in open('data_file.txt'):
    # Splits the line into four elements separated by ' - '. Each element is
    # then unpacked into the matching variable name. strip() drops the
    # trailing newline so it does not end up in timestamp.
    champname, score, winloss, timestamp = line.strip().split(' - ')

    # Since 'score' holds the string with the three values joined,
    # we need to split them again, this time using a slash as separator.
    # This results in a list of strings, so we apply the 'int' function
    # to each of them to convert to integer. This list of integers is
    # then unpacked into the kills, deaths and assists variables
    kills, deaths, assists = map(int, score.split('/'))

    # Now you are free to use these variables however you want. Since
    # kills, deaths and assists are integers, you can add, multiply and
    # average them easily.

First, you break the line into data fragments:

>>> name, score, result, date = "Fizz - 12/4/5 - Win - 2012-11-22".split(' - ')
>>> name
'Fizz'
>>> score
'12/4/5'
>>> result
'Win'
>>> date
'2012-11-22'

Second, parse your score:

>>> k,d,a = map(int, score.split('/'))
>>> k,d,a
(12, 4, 5)

And finally, convert the date string into a date object (note the lowercase %m for month; uppercase %M means minutes):

>>> from datetime import datetime
>>> datetime.strptime(date, '%Y-%m-%d').date()
datetime.date(2012, 11, 22)

Now you have all your parts parsed and normalized to proper data types.
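
The three steps can be folded into a single helper if that is more convenient. This is only a sketch: the function name `parse_line` and the dictionary keys are illustrative choices, not anything required by the format:

```python
from datetime import datetime

def parse_line(line):
    """Parse one 'Name - K/D/A - Result - YYYY-MM-DD' line into a dict."""
    name, score, result, date = line.strip().split(' - ')
    kills, deaths, assists = map(int, score.split('/'))
    return {
        'name': name,
        'kills': kills,
        'deaths': deaths,
        'assists': assists,
        'won': result == 'Win',
        'date': datetime.strptime(date, '%Y-%m-%d').date(),
    }

game = parse_line("Fizz - 12/4/5 - Win - 2012-11-22\n")
print(game['date'].year, game['kills'])  # 2012 12
```

Because the date is a real date object, records can be compared or sorted chronologically without further conversion.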
