Searching a text file and grabbing all lines that do not include ## in Python

I am trying to write a Python script to read in a large text file of modeling results, grab the useful data, and save it as a new array. The text file is written so that every line that is not useful starts with ##. I need a way to search through the file and grab all the lines that do not include the ##. I am used to using grep -v in this situation and piping the output to a file, but I want to do it in Python!

Thanks a lot.

-Tyler

I would use something like this:

fh = open(r"C:\Path\To\File.txt", "r")

raw_text = fh.readlines()
fh.close()  # release the file handle once the lines are in memory
clean_text = []

for line in raw_text:
    if not line.startswith("##"):
        clean_text.append(line)

Or you could strip the trailing newline and carriage-return characters at the same time with a small modification:

for line in raw_text:
    if not line.startswith("##"):
        clean_text.append(line.rstrip("\r\n"))
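
As a self-contained sketch of the same loop, the filtering and stripping could be wrapped in a small function. The function name `drop_comment_lines`, its `marker` parameter, and the sample lines are assumptions for illustration, not part of the original answer:

```python
def drop_comment_lines(raw_text, marker="##"):
    """Return the lines that do not start with `marker`, minus their line endings."""
    clean_text = []
    for line in raw_text:
        if not line.startswith(marker):
            # Remove trailing carriage returns and newlines only.
            clean_text.append(line.rstrip("\r\n"))
    return clean_text


# Hypothetical sample standing in for fh.readlines() on the model output.
raw_text = ["## header\r\n", "1.0 2.0 3.0\r\n", "## note\n", "4.0 5.0 6.0\n"]
print(drop_comment_lines(raw_text))
```

Because the function accepts any iterable of strings, it should work equally well on the list returned by readlines() or on the open file object itself.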

You would be left with a list object that contains one line of required text per element. You could split each line into individual words using str.split(), which would give you a nested list (one list of words per original line) that you could easily index, assuming your text is whitespace-separated of course. With the lines split this way,

clean_text[4][7]

would return the 8th word of the 5th line.
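
A minimal sketch of that splitting step, assuming whitespace-separated data. The sample lines and the name `words` are made up for illustration:

```python
# Hypothetical cleaned lines; in practice this would be the clean_text list built above.
clean_text = ["alpha beta gamma", "1.0 2.0 3.0", "x y z"]

# Split every line on whitespace, giving one list of words per line.
words = [line.split() for line in clean_text]

# Index by line first, then by word (both zero-based).
print(words[1][2])  # second line, third word
```

Note that the words are still strings at this point; numeric data would need an explicit conversion such as float() before any arithmetic.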

Hope this helps.

[Edit: corrected indentation in loop]

My suggestion would be to do the following:

listoflines = []

with open("file.txt", "r") as f:  # "file.txt" is a placeholder path, "r" = read
    for line in f:
        if line[:2] != "##":  # compare the first two characters
            listoflines.append(line)

print(listoflines)

If you're feeling brave, you can also do the following inside the with block (credit goes to Alex Thornton):

listoflines = [l for l in f if not l.startswith('##')]
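
The one-liner reads from `f`, so it has to run while the file is still open. A runnable sketch under that assumption, using a temporary file with made-up contents so the example does not depend on any real path:

```python
import os
import tempfile

# Write a small, made-up sample file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("## header\n1 2 3\n## note\n4 5 6\n")

try:
    with open(tmp.name, "r") as f:
        # The comprehension must run while f is still open.
        listoflines = [l for l in f if not l.startswith("##")]
finally:
    # Clean up the temporary sample file.
    os.remove(tmp.name)

print(listoflines)
```

The kept lines still carry their trailing newline here; adding `.rstrip("\r\n")` to the expression before `for` would remove it.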

The other answer is great as well, especially for teaching the .startswith method, but I think this is the more Pythonic way, and it also has the advantage of automatically closing the file as soon as you're done with it.
