Search a text file and get all lines that do not contain ## in Python

Question

I am trying to write a Python script that reads a large text file of modeling results, grabs the useful data, and saves it as a new array. The text file is output so that every line that is not useful starts with ##. I need a way to search through the file and grab all the lines that do not include the ##. I am used to using grep -v in this situation and piping to a file, but I want to do it in Python!

Thanks a lot.

-Tyler

Answer 1

I would use something like this:

fh = open(r"C:\Path\To\File.txt", "r")

raw_text = fh.readlines()
fh.close()  # close the file once its lines are in memory
clean_text = []

for line in raw_text:
    if not line.startswith("##"):
        clean_text.append(line)

Or, with a small modification, you could strip the trailing newline and carriage return characters at the same time:

for line in raw_text:
    if not line.startswith("##"):
        clean_text.append(line.rstrip("\r\n"))

You would be left with a list object that contains one line of required text per element. You could then split each line into individual words using str.split(), which turns every element into its own list of words. The result is a nested list that you can easily index (assuming your text is separated by whitespace, of course).

clean_text[4][7]

would return the 8th word of the 5th line.
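
To make the splitting step concrete, here is a minimal sketch. The sample lines are made up for illustration and are not taken from any real model output; only the names raw_text and clean_text come from the code above.

```python
# Hypothetical sample data standing in for the lines read from the file.
raw_text = [
    "## header produced by the model\n",
    "1.0 2.0 3.0\n",
    "## another comment line\n",
    "4.0 5.0 6.0\n",
]

# Keep only lines that do not start with "##", then split each on whitespace.
# str.split() with no arguments also discards the trailing newline.
clean_text = [line.split() for line in raw_text if not line.startswith("##")]

# clean_text is now a nested list: one inner list of words per kept line.
second_line_third_word = clean_text[1][2]
print(second_line_third_word)
```

Note that the values are still strings at this point; converting them with float() would be a separate step if numbers are needed.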

Hope this helps.

[Edit: corrected indentation in loop]

Answer 2

My suggestion would be to do the following:

listoflines = []

with open("your_file.txt", "r") as f:  # "your_file.txt" is a placeholder path, "r" = read
    for line in f:
        if line[:2] != "##":  # compare the first two characters of the line
            listoflines.append(line)

print(listoflines)

If you're feeling brave, you can also use a list comprehension (credit goes to Alex Thornton):

# This line replaces the for loop inside the "with" block above, where f is the open file
listoflines = [l for l in f if not l.startswith('##')]

The other answer is great as well, especially for teaching the .startswith method, but I think this is the more Pythonic way, and it also has the advantage of automatically closing the file as soon as you're done with it.
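
Since the question mentions piping the grep -v output to a file, here is a rough sketch that combines the filtering with writing the kept lines out. The function name filter_lines and the file names results.txt and filtered.txt are hypothetical, and the demo writes its own throwaway input so it can run anywhere.

```python
import os
import tempfile


def filter_lines(src_path, dst_path, marker="##"):
    """Copy lines that do not start with marker from src_path to dst_path.

    Returns the kept lines (without line endings) as a list.
    """
    kept = []
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            if not line.startswith(marker):
                dst.write(line)
                kept.append(line.rstrip("\r\n"))
    return kept


# Demo on a throwaway file; both file names are assumptions for illustration.
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, "results.txt")
out_path = os.path.join(tmp_dir, "filtered.txt")

with open(in_path, "w") as f:
    f.write("## comment\n1 2 3\n## more\n4 5 6\n")

kept_lines = filter_lines(in_path, out_path)
print(kept_lines)
```

One difference from grep is worth keeping in mind: grep -v '##' drops any line that contains ## anywhere, while startswith only drops lines that begin with it. To match grep exactly, the condition could be changed to `marker not in line`.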
