
How to read specific lines from a text file in Python?

I have a text file that contains a lot of data. I want to read it and write a new text file that leaves out some parts of the original.

For example, the text file contains:

------------------------
Age: 39
Gender: Female
Smoking: Yes
remarks: something about the person
-----------------------
Age: 52
Gender: Male
Smoking: Yes
remarks: something about the person
-----------------------

How do I get the new file to include only the age and gender, so that it looks like this (including the dashes that divide each entry)?

-----------------------
Age: 39
Gender: Female
-----------------------
Age: 52
Gender: Male
-----------------------

I've seen a few code snippets and other questions, but none of them just remove specific lines.

with open('path/to/infile') as infile, open('path/to/outfile', 'w') as outfile:
    for line in infile:
        if line.startswith(("Age", "Gender", "----")):
            outfile.write(line)
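
The filter above relies on str.startswith accepting a tuple of prefixes. As a minimal sketch that can be tried without touching the disk, the same loop is wrapped below in a function and run on in-memory text; the name keep_lines, the prefixes parameter, and the sample data are illustrative assumptions, not part of the original answer.

```python
import io

def keep_lines(infile, outfile, prefixes=("Age", "Gender", "----")):
    """Copy only the lines that start with one of the given prefixes.

    keep_lines and its parameters are hypothetical names for this sketch.
    """
    for line in infile:
        # str.startswith accepts a tuple and is True if any prefix matches
        if line.startswith(prefixes):
            outfile.write(line)

# Hypothetical sample record, modelled on the question's data
sep = "-" * 23 + "\n"
sample = (
    sep
    + "Age: 39\n"
    + "Gender: Female\n"
    + "Smoking: Yes\n"
    + "remarks: something about the person\n"
    + sep
)

result = io.StringIO()
keep_lines(io.StringIO(sample), result)
print(result.getvalue(), end="")
```

Because the function only needs objects that can be iterated and written to, the file handles from the with statement above can be passed in the same way.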

Alternatively, with grep:

grep -ioP '^-.*$|^Age:.*$|^Gender:.*$' path/to/infile.txt > path/to/outfile.txt

import re

# Read the whole file as text so the str pattern below can match it
with open('filename.txt') as f:
    file = f.read()

# Capture the age and gender of each record
a = re.findall(r'Age: (\d+)\nGender: (Male|Female)', file)

print("-----------------------")
for n in a:
    print('Age: ' + n[0] + '\nGender: ' + n[1])
    print("-----------------------")

You can be even lazier and grab the dashes in the regex too:

a = re.findall(r'Age: (\d+)\nGender: (Male|Female)(?:.*\n){3}(\-*)', file)

for n in a:
    print("Age: " + n[0] + "\nGender: " + n[1] + "\n" + n[2])
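
To try the regex approach end to end without a file, here is a minimal self-contained sketch. The pattern is the one from the answer above; the function name extract and the sample text are assumptions made for illustration. Note that the pattern only captures the separator that follows each record, so the opening line of dashes does not appear in the result, and records whose gender is neither Male nor Female are skipped.

```python
import re

# Pattern from the answer above: age, gender, skip the rest of the record,
# then capture the dashes that close it.
PATTERN = re.compile(r'Age: (\d+)\nGender: (Male|Female)(?:.*\n){3}(-*)')

def extract(text):
    """Return the age, gender, and closing separator of each record.

    extract is a hypothetical helper name used only in this sketch.
    """
    parts = []
    for age, gender, dashes in PATTERN.findall(text):
        parts.append(f"Age: {age}\nGender: {gender}\n{dashes}\n")
    return "".join(parts)

# Hypothetical sample, modelled on the question's data
dashes = "-" * 23
sample = (
    f"{dashes}\n"
    "Age: 39\nGender: Female\nSmoking: Yes\n"
    "remarks: something about the person\n"
    f"{dashes}\n"
    "Age: 52\nGender: Male\nSmoking: Yes\n"
    "remarks: something about the person\n"
    f"{dashes}\n"
)

output = extract(sample)
print(output, end="")
```

To produce the new file the question asks for, the returned string can be written out with open('path/to/outfile', 'w') instead of printed.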

