Extract lines from text files using Python

I have over 100 .out files, which are output files from a statistical package called MPlus. Each file can be opened with any text editor and contains several hundred lines of text, of which only a few interest me. Those lines look like this:

 I        ON
    K1                -0.247      0.321     -0.769      0.442
    K2                 0.161      0.232      0.696      0.486

 S        ON
    K1                 0.035      0.143      0.247      0.805
    K2                -0.123      0.154     -0.799      0.424

 Q        ON
    K1                 0.083      0.325      0.255      0.798
    K2                 0.039      0.229      0.169      0.866

 I        ON
    LABTOTF1           0.014      0.018      0.787      0.431
    LABTOTG2           0.011      0.017      0.626      0.532
    UGLABTOT           0.001      0.004      0.272      0.786
    UMLABTOT           0.098      0.147      0.664      0.507

 S        ON
    LABTOTF1          -0.008      0.019     -0.406      0.684
    LABTOTF2           0.000      0.013     -0.018      0.986
    UGLABTOT          -0.001      0.003     -0.209      0.835
    UMLABTOT          -0.063      0.115     -0.548      0.584

 Q        ON
    LABTOTF1          -0.013      0.025     -0.532      0.595
    LABTOTF2          -0.014      0.023     -0.596      0.551
    UGLABTOT           0.007      0.006      1.131      0.258
    UMLABTOT          -0.489      0.171     -2.859      0.004

The numbers, the variable names (K1, K2, LABTOTF1, etc.), and the number of variables all change from file to file, but the headers I ON, S ON, and Q ON are present in every file.

I would like to extract these lines from all of the output files and write them to a single output file using a Python script.

So far, my approach uses nested for loops, which is neither efficient nor effective because the number of lines changes from file to file.

My first terrible 'test' attempt at getting just the I ON line and its values (K1 and K2) uses the following code:

file = open("./my_folder/my_file.out","r")
lines = [line for line in file]
file.close()
collector = []
for i in range(0,len(lines)):
    if lines[i] == '\n':
        continue
    elif "I        ON\n" in lines[i]:
        collector.append(lines[i])
        collector.append(lines[i+1])
        collector.append(lines[i+2])
        i += 4
        continue

What is the most efficient and Pythonic way of extracting these lines from a text file?

EDIT: The lines I am interested in are the 'header' as well as the lines that contain the variables and values. For example, if I wanted the I ON section, I would like to pull the following lines from the previous example:

I        ON
    K1                -0.247      0.321     -0.769      0.442
    K2                 0.161      0.232      0.696      0.486

Assuming this is the file structure:

out_lines = []
for line in lines:
    if len(line.strip().split()) == 2:
        out_lines.append(line)
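
The snippet above keeps every line that splits into exactly two tokens, which captures headers such as I ON but not the rows beneath them. If the rows are wanted too, one possible variation is sketched below. It assumes that each section starts with a line like " I        ON" and ends at the next blank line; the names HEADER and extract_sections are made up for this illustration.

```python
import re

# Assumed header format: optional leading spaces, one of I/S/Q, spaces, "ON".
HEADER = re.compile(r"^\s*[ISQ]\s+ON\s*$")

def extract_sections(lines):
    """Return each header line plus the non-blank rows that follow it."""
    out_lines = []
    in_section = False
    for line in lines:
        if HEADER.match(line):
            in_section = True           # a new section starts here
            out_lines.append(line)
        elif in_section and line.strip():
            out_lines.append(line)      # data row inside the current section
        else:
            in_section = False          # blank or unrelated line ends the section
    return out_lines

sample = [
    "MODEL RESULTS\n",
    "\n",
    " I        ON\n",
    "    K1                -0.247      0.321     -0.769      0.442\n",
    "    K2                 0.161      0.232      0.696      0.486\n",
    "\n",
    " Means\n",
    "    X                  1.000      0.000    999.000    999.000\n",
]
print("".join(extract_sections(sample)), end="")
```

With the sample above, only the I ON header and its two rows are printed. Other two-token lines elsewhere in the file are ignored because the pattern is anchored to the I/S/Q headers.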

You could use regular expressions if you want to search for exact key structures. The code below handles only one '.out' file and produces one file for each heading type in your test data above.

import re
file_path = 'E:\\' # the path to the folder with the .out file
file_name = 'test.out'

# for multiple files, wrap the section below in a loop.
with open(file_path + file_name, 'r') as f:
    line_keys = f.readline()
    while line_keys:  # If it is not empty
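        key_search = re.search(r' ?([ISQ])\s*ON', line_keys)  # search for the key pattern (raw string avoids an invalid-escape warning)
        if key_search is not None:  # If a match is found
            file_output = key_search.group(1) + '.txt'  # name the file after the matched letter, with or without a leading space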
            with open(file_path + file_output, 'a') as f_out:
                f_out.write(line_keys)  # If you repeatedly want the heading of each section
                while True:  # Read the subsequent lines
                    lines_data = f.readline()
                    if lines_data == "\n":
                        break
                    if lines_data == "":
                        break
                    f_out.write(lines_data)
                f_out.write('\n')  # to separate the different sections by a blank line
        line_keys = f.readline()
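
For the many-files case mentioned in the comment above, a rough sketch of the surrounding loop might look like this. It writes everything to one combined file, as the question asks, rather than one file per heading. The function name collect_sections, the output name combined.txt, and the "# filename" label lines are assumptions made for illustration, and the header pattern assumes sections end at a blank line.

```python
import re
from pathlib import Path

# Assumed header format: optional leading spaces, one of I/S/Q, spaces, "ON".
HEADER = re.compile(r"^\s*[ISQ]\s+ON\s*$")

def collect_sections(folder, combined_name="combined.txt"):
    """Copy every I/S/Q ON section from each .out file in `folder` into one file."""
    folder = Path(folder)
    combined = folder / combined_name
    with open(combined, "w") as f_out:
        for path in sorted(folder.glob("*.out")):
            f_out.write(f"# {path.name}\n")  # label lines by source file
            in_section = False
            with open(path) as f_in:
                for line in f_in:
                    if HEADER.match(line):
                        in_section = True
                        f_out.write(line)
                    elif in_section and line.strip():
                        f_out.write(line)    # data row of the current section
                    elif in_section:
                        f_out.write("\n")    # blank line closes the section
                        in_section = False
    return combined
```

Calling collect_sections("./my_folder") would then process every .out file in that folder in name order. Reading line by line keeps memory use small even with hundreds of files.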
