
Parse text file in Python

I have a txt file and I want to learn how to parse it in Python.

txt file:

April 2011
05.05.2013 8:30 20:50

(here I can have different data)

How can I parse this file and store each piece of data in a separate variable?

Example output:

month = "April 2011"
mydate = "05.05.2013"
time_start = "8:30"
time_stop = "20:50"

The key here is to first analyze your input file format and what you want out of it. Let's consider your input data:

April 2011
05.05.2013 8:30 20:50

What do we have here?

The first line has the month and the year separated by a space. If you want "April 2011" together in a single Python label (variable), you can read the entire file with the readlines() method; the first item of the returned list will be "April 2011", followed by a trailing newline that strip() removes.
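
A minimal sketch of that first step, using the two sample lines above. Here io.StringIO stands in for the real file (an assumption, so the snippet runs on its own); with a file on disk you would call open() on its path instead.

```python
import io

# io.StringIO stands in for a real file (an assumption made so the
# snippet is self-contained); with a file on disk use open("file.txt").
sample = io.StringIO("April 2011\n05.05.2013 8:30 20:50\n")

lines = sample.readlines()   # one list item per line, newline included
month = lines[0].strip()     # strip() drops the trailing "\n"

print(month)
```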

On the next line, we have the date and two time "fields", each separated by a space. As per your output requirements, you want each of these in a separate Python label (variable), so just reading the second line is not sufficient. You will have to separate the "fields" from each other. The split() method will prove useful here. Here is an example:

>>> s = '05.05.2013 8:30 20:50'
>>> s.split()
['05.05.2013', '8:30', '20:50']

As you can see, the fields are now separate items of a list. You can easily assign them to separate labels (or variables).
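
One way to do that assignment is tuple unpacking. This is a small sketch that reuses the names from the example output in the question; it assumes the line always holds exactly three fields.

```python
s = '05.05.2013 8:30 20:50'

# split() with no argument splits on any run of whitespace,
# so the three fields unpack directly into three names.
mydate, time_start, time_stop = s.split()

print(mydate)
print(time_start)
print(time_stop)
```

If a line has more or fewer than three fields, the unpacking raises ValueError, which is a cheap way to notice malformed input early.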

Depending on what other data you have, take a similar approach: first analyze how you can get the data you need from each line of the file.

with open('file') as f:
    tmp = f.read()

# split the file contents into lines
tmp2 = tmp.split('\n')
month = tmp2[0]
# split the second line into its three fields
fields = tmp2[1].split(' ')
mydate = fields[0]
time_start = fields[1]
time_stop = fields[2]

with open("Input.txt") as inputFile:
    lines = [line for line in inputFile]
    # first line is the month; second line holds the three remaining fields
    month, mydate, time_start, time_stop = [lines[0].strip()] + lines[1].strip().split()
    print(month, mydate, time_start, time_stop)

Output

April 2011 05.05.2013 8:30 20:50

File a.txt:

April 2011
05.05.2013 8:30 20:50
May 2011
08.05.2013 8:32 21:51
June 2011
05.06.2013 9:30 23:50
September 2011
05.09.2013 18:30 20:50

Python code:

import itertools

my_list = list()
with open('a.txt') as f:
    # read the file two lines at a time: a month line, then a data line
    # (itertools.izip_longest in Python 2 is zip_longest in Python 3)
    for line1, line2 in itertools.zip_longest(*[f]*2):
        mydate, time_start, time_stop = line2.split()
        my_list.append({
            'month': line1.strip(),
            'mydate': mydate,
            'time_start': time_start,
            'time_stop': time_stop,
        })

print(my_list)
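
The loop above reads the file two lines at a time. As a rough sketch of how that pairing works, assuming Python 3 (where the function is named itertools.zip_longest) and with io.StringIO standing in for a.txt: passing the same file iterator twice makes every step pull two consecutive lines.

```python
import io
from itertools import zip_longest

# Stand-in for a.txt (an assumption so the snippet runs on its own).
f = io.StringIO(
    "April 2011\n"
    "05.05.2013 8:30 20:50\n"
    "May 2011\n"
    "08.05.2013 8:32 21:51\n"
)

# [f] * 2 is a list holding the SAME iterator twice, so each step of
# zip_longest consumes two consecutive lines: (line 1, line 2), ...
pairs = [(a.strip(), b.strip()) for a, b in zip_longest(*[f] * 2, fillvalue="")]

print(pairs)
```

If the file has an odd number of lines, the last pair is padded with the fill value (None by default), so the line2.split() call in the loop above would fail on that final pair.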
