How do I read specific lines from a text file in Python?

Question

I have a text file that contains a lot of data. I want to read it and write a new text file, but the new file should leave out some parts of the original.

For example, the text file contains:

------------------------
Age: 39
Gender: Female
Smoking: Yes
remarks: something about the person
-----------------------
Age: 52
Gender: Male
Smoking: Yes
remarks: something about the person
-----------------------

How do I write only the age and gender to the new file, keeping the dashes that divide each entry, so that the new text file looks like this:

-----------------------
Age: 39
Gender: Female
-----------------------
Age: 52
Gender: Male
-----------------------

I've seen some code and a few other questions, but none of them deal with just removing specific lines.

Answer 1

with open('path/to/infile') as infile, open('path/to/outfile', 'w') as outfile:
    for line in infile:
        if line.startswith(("Age", "Gender", "----")):
            outfile.write(line)

Alternatively, with grep:

grep -ioP '^-.*$|^Age:.*$|^Gender:.*$' path/to/infile.txt > path/to/outfile.txt
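The Python loop in this answer can also be wrapped in a small function so it is easy to try on in-memory data. This is only a sketch: the function name `filter_lines` and the `prefixes` parameter are made up for illustration, and `io.StringIO` stands in for the real input and output files.

```python
import io

def filter_lines(infile, outfile, prefixes=("Age", "Gender", "----")):
    """Copy to outfile only the lines of infile that start with one of prefixes."""
    for line in infile:
        # str.startswith accepts a tuple and returns True if any prefix matches.
        if line.startswith(prefixes):
            outfile.write(line)

dashes = "-" * 23  # separator line, as in the question's sample data
sample = (
    dashes + "\n"
    "Age: 39\n"
    "Gender: Female\n"
    "Smoking: Yes\n"
    "remarks: something about the person\n"
    + dashes + "\n"
)

source = io.StringIO(sample)   # stands in for open('path/to/infile')
result = io.StringIO()         # stands in for open('path/to/outfile', 'w')
filter_lines(source, result)
print(result.getvalue(), end="")
```

Because the file is read line by line, this approach keeps memory use low even when the input is large.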

Answer 2

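import re

# Read the whole file as text so the str regex below can be applied to it.
with open('filename.txt') as f:
    file = f.read()

# Each match is an (age, gender) tuple.
a = re.findall(r'Age: (\d+)\nGender: (Male|Female)', file)

print("-----------------------")
for n in a:
    print('Age: ' + n[0] + '\nGender: ' + n[1])
    print("-----------------------")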

You can be even lazier and capture the dashes in the regex too:

# (?:.*\n){3} skips the rest of the Gender line and the two lines after it.
a = re.findall(r'Age: (\d+)\nGender: (Male|Female)(?:.*\n){3}(-*)', file)

for n in a:
    print("Age: " + n[0] + "\nGender: " + n[1] + "\n" + n[2])
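The snippets in this answer print the result, while the question asks for a new file. A minimal sketch of that last step follows, assuming the same record layout as the question. The names `age_gender_report`, `SEPARATOR`, and `outfile.txt` are hypothetical, and the output goes to a temporary directory so that nothing on disk is overwritten.

```python
import os
import re
import tempfile

SEPARATOR = "-" * 23  # assumed separator line, as in the question's sample

def age_gender_report(text):
    """Build the reduced text: a separator, then Age, Gender, separator per record."""
    records = re.findall(r"Age: (\d+)\nGender: (Male|Female)", text)
    lines = [SEPARATOR]
    for age, gender in records:
        lines.append("Age: " + age)
        lines.append("Gender: " + gender)
        lines.append(SEPARATOR)
    return "\n".join(lines) + "\n"

sample = (
    SEPARATOR + "\n"
    "Age: 39\nGender: Female\nSmoking: Yes\nremarks: something about the person\n"
    + SEPARATOR + "\n"
    "Age: 52\nGender: Male\nSmoking: Yes\nremarks: something about the person\n"
    + SEPARATOR + "\n"
)

report = age_gender_report(sample)

# Write the report to a file inside a throwaway directory, then read it back.
with tempfile.TemporaryDirectory() as tmpdir:
    out_path = os.path.join(tmpdir, "outfile.txt")
    with open(out_path, "w") as outfile:
        outfile.write(report)
    with open(out_path) as check:
        written = check.read()

print(written, end="")
```

Note that the regex only matches records whose Gender value is exactly Male or Female; any other record is silently skipped.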

Answer 1
5 2014-07-17 16:32:09

Answer 2
0 2014-07-17 16:51:51
