
How to extract specific lines of data from a big text file

I have a text file that contains something like this:

# Comment
# Comment
# Comment
# Comment

# Comment
  Comment
# Comment

**#Raw SIFs at Crack Propagation Step: 0**
# Vertex, X,              Y,              Z,              K_I,            K_II,
  0      , 2.100000e+00   , 2.000000e+00   , -1.000000e-04  , 0.000000e+00   , 0.000000e+00   ,
  1      , 2.100000e+00   , 2.000000e+00   , 1.699733e-01   , 8.727065e+00   , -8.696262e-04  ,
  2      , 2.100000e+00   , 2.000000e+00   , 3.367067e-01   , 8.907810e+00   , -2.548819e-04  ,

**# MLS SIFs at Crack Propagation Step: 0**

# MLS approximation: 
# Sample, t,             NA,             NA,              K_I,            K_II,
# Crack front stretch: 0
  0      , 0.000000e+00   , 0.000000e+00   , 0.000000e+00   , 8.446880e+00   , -1.360875e-03  ,
  1      , 5.670333e-02   , 0.000000e+00   , 0.000000e+00   , 8.554168e+00   , -1.156931e-03  ,
  2      , 1.134067e-01   , 0.000000e+00   , 0.000000e+00   , 8.648241e+00   , -9.755573e-04  ,

# more comments
  more comments
# more comments

**# Raw SIFs at Crack Propagation Step: 1**
# Vertex, X,              Y,              Z,              K_I,            K_II,
  0      , 2.186139e+00   , 2.000000e+00   , -1.688418e-03  , 0.000000e+00   , 0.000000e+00   ,
  1      , 2.192003e+00   , 2.000000e+00   , 1.646902e-01   , 9.571022e+00   , 4.770358e-03   ,
  2      , 2.196234e+00   , 2.000000e+00   , 3.319183e-01   , 9.693934e+00   , -9.634989e-03  ,

**# MLS SIFs at Crack Propagation Step: 1**

# MLS approximation: 
# Sample, t,             NA,             NA,              K_I,            K_II,
# Crack front stretch: 0
   0      , 0.000000e+00   , 0.000000e+00   , 0.000000e+00   , 9.402031e+00   , 2.097959e-02   ,
   1      , 5.546786e-02   , 0.000000e+00   , 0.000000e+00   , 9.467541e+00   , 1.443546e-02   ,
   2      , 1.109357e-01   , 0.000000e+00   , 0.000000e+00   , 9.525021e+00   , 8.554051e-03   ,

As you can see, the lines without the # symbol contain the data I would like to plot. I've only shown a short portion of steps 0 and 1, but there are around 20 steps in this file, and each step has two types of data: Raw SIFs and MLS SIFs. For each section of data, I would like to plot line graphs of vertex (1st column) versus K_I (5th column), and vertex (1st column) versus K_II (6th column).

So, in the end, I would like a single graph of vertex vs K_I for the Raw SIFs, with the 20 steps drawn as 20 curves, and another graph of vertex vs K_II for the Raw SIFs. Similarly, I would like a single graph of vertex vs K_I for the MLS SIFs with 20 curves, and another graph of vertex vs K_II for the MLS SIFs.

So far, I have created a separate text file that contains just one section of the original file. For the "Raw SIFs at Crack Propagation Step: 0" section, the code I have written uses numpy.loadtxt() to read the file:

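import numpy

# numfile.txt holds only the "Raw SIFs at Crack Propagation Step: 0" section;
# skiprows=2 skips its title line and its column-name line. A plain ","
# delimiter is more robust than " , " (spaces around each number are ignored).
with open("numfile.txt") as RawStep0:
    Vertex, K_I, K_II = numpy.loadtxt(RawStep0, usecols=(0, 4, 5), dtype=float,
                                      delimiter=",", skiprows=2, unpack=True)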

My output:

Vertex = array([ 0., 1., 2.])

K_I = array([0. , 8.727065, 8.90781])
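numpy.loadtxt() does not need a file on disk: it also accepts a list of strings (any iterable of lines), so a section can be handed to it directly once its lines have been picked out of the big file. A minimal sketch, assuming the lines of one section are already in a Python list; `section` is a hypothetical name and its contents are copied from the sample above.

```python
import numpy as np

# Lines of one section, copied from the sample above (title and column names included)
section = [
    "**#Raw SIFs at Crack Propagation Step: 0**",
    "# Vertex, X,              Y,              Z,              K_I,            K_II,",
    "  0      , 2.100000e+00   , 2.000000e+00   , -1.000000e-04  , 0.000000e+00   , 0.000000e+00   ,",
    "  1      , 2.100000e+00   , 2.000000e+00   , 1.699733e-01   , 8.727065e+00   , -8.696262e-04  ,",
    "  2      , 2.100000e+00   , 2.000000e+00   , 3.367067e-01   , 8.907810e+00   , -2.548819e-04  ,",
]

# Drop the two header lines and pass the remaining strings straight to loadtxt;
# the trailing comma yields an empty 7th field, which usecols simply ignores.
vertex, k_i, k_ii = np.loadtxt(section[2:], delimiter=",", usecols=(0, 4, 5), unpack=True)
print(vertex)  # [0. 1. 2.]
```

The remaining problem is only to find where each section starts and ends; once its lines are known, no intermediate file is needed.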

How do I write code that works on the original file, without creating a separate file for each section? How can I skip all the rows with the # symbol and create the arrays I need for plotting?
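The section titles in the sample are wrapped in `**`, which may be Markdown bold from the post rather than part of the file, and stray text lines such as `  more comments` appear between sections. Matching the title text itself and keeping only rows that start with an integer index copes with both. A minimal single-pass sketch, assuming every title contains "Raw SIFs at Crack Propagation Step: N" or "MLS SIFs at Crack Propagation Step: N"; `read_sections`, `HEADER` and `sample` are hypothetical names.

```python
import re
import numpy as np

# Matches e.g. "**#Raw SIFs at Crack Propagation Step: 0**" or
# "# MLS SIFs at Crack Propagation Step: 1", with or without the ** markers.
HEADER = re.compile(r"(Raw|MLS) SIFs at Crack Propagation Step:\s*(\d+)")

def read_sections(lines):
    """Return {(kind, step): array of [index, K_I, K_II] rows} for every section."""
    sections = {}
    current = None  # key of the section being filled; None before the first title
    for line in lines:
        match = HEADER.search(line)
        if match:
            current = (match.group(1), int(match.group(2)))
            sections[current] = []
            continue
        fields = line.split(",")
        # Keep only rows that start with an integer index and have at least six fields;
        # blank lines, '#' comments and stray text like "  more comments" are skipped.
        if current is None or len(fields) < 6 or not fields[0].strip().isdigit():
            continue
        sections[current].append([int(fields[0]), float(fields[4]), float(fields[5])])
    return {key: np.array(rows, dtype=float) for key, rows in sections.items()}

# Small excerpt of the file above. In practice:
#     with open("data") as f:
#         sections = read_sections(f)
sample = """\
# Comment
**#Raw SIFs at Crack Propagation Step: 0**
# Vertex, X,              Y,              Z,              K_I,            K_II,
  0      , 2.100000e+00   , 2.000000e+00   , -1.000000e-04  , 0.000000e+00   , 0.000000e+00   ,
  1      , 2.100000e+00   , 2.000000e+00   , 1.699733e-01   , 8.727065e+00   , -8.696262e-04  ,

**# MLS SIFs at Crack Propagation Step: 0**
# Sample, t,             NA,             NA,              K_I,            K_II,
  0      , 0.000000e+00   , 0.000000e+00   , 0.000000e+00   , 8.446880e+00   , -1.360875e-03  ,
  more comments
""".splitlines()

sections = read_sections(sample)
vertex, k_i, k_ii = sections[("Raw", 0)].T  # columns ready for plotting
print(sorted(sections))  # [('MLS', 0), ('Raw', 0)]
```

Keying the dictionary by `(kind, step)` makes it easy to loop over all Raw or all MLS steps later, in step order, with `sorted(k for k in sections if k[0] == "Raw")`.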

Try:

from pprint import pprint

# helper function to parse one data block; it consumes lines from the shared list
def parse_SIF(lines):
    SIF = []
    while lines:
        line = lines.pop(0).lstrip()
        # skip blank lines and comment lines
        if line == '' or line.startswith('#'):
            continue
        # the next section title ends this block: put it back for the caller
        if line.startswith('**#'):
            lines.insert(0, line)
            break
        data = line.split(',')
        # pick only columns 0, 4, 5 and
        # convert to the appropriate numeric format
        # and append to the list for the current SIF and step;
        # stray text lines without '#' (e.g. "  more comments") are skipped
        try:
            SIF.append([int(data[0]), float(data[4]), float(data[5])])
        except (ValueError, IndexError):
            pass
    return SIF

# your global data structure - nested lists
raw = []
mls = []

# read whole file into one list - ok if your data is not large
with open('data') as fptr:
    lines = fptr.readlines()

# global parse routine - call helper function to parse data blocks
while lines:
    line = lines.pop(0)
    if line.startswith('**#'):
        if 'Raw SIFs at Crack Propagation Step:' in line:
            raw.append(parse_SIF(lines))
        elif 'MLS SIFs at Crack Propagation Step:' in line:
            mls.append(parse_SIF(lines))

# show results for your example data
for raw_step, mls_step in zip(raw, mls):
    print('raw:')
    pprint(raw_step)
    print('mls:')
    pprint(mls_step)

which yields:

raw:
[[0, 0.0, 0.0], [1, 8.727065, -0.0008696262], [2, 8.90781, -0.0002548819]]
mls:
[[0, 8.44688, -0.001360875],
 [1, 8.554168, -0.001156931],
 [2, 8.648241, -0.0009755573]]
raw:
[[0, 0.0, 0.0], [1, 9.571022, 0.004770358], [2, 9.693934, -0.009634989]]
mls:
[[0, 9.402031, 0.02097959],
 [1, 9.467541, 0.01443546],
 [2, 9.525021, 0.008554051]]
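To get the graphs described in the question, each step's list of rows can be turned into a NumPy array and drawn as one curve, with all steps sharing one axes. A minimal sketch, assuming matplotlib is installed; `plot_steps`, the PNG file names and the "Sample" x-axis label for the MLS data (taken from its column header) are hypothetical choices, and the values are copied from the output above. With 20 steps the lists simply get longer and each figure gets 20 curves.

```python
import matplotlib
matplotlib.use("Agg")  # draw into image files; no display needed
import matplotlib.pyplot as plt
import numpy as np

# Nested lists in the shape built by the parser above: one list of
# [index, K_I, K_II] rows per propagation step (values copied from the output above).
raw = [
    [[0, 0.0, 0.0], [1, 8.727065, -0.0008696262], [2, 8.90781, -0.0002548819]],
    [[0, 0.0, 0.0], [1, 9.571022, 0.004770358], [2, 9.693934, -0.009634989]],
]
mls = [
    [[0, 8.44688, -0.001360875], [1, 8.554168, -0.001156931], [2, 8.648241, -0.0009755573]],
    [[0, 9.402031, 0.02097959], [1, 9.467541, 0.01443546], [2, 9.525021, 0.008554051]],
]

def plot_steps(steps, column, xlabel, ylabel, filename):
    """Draw one curve per step (column 0 vs `column`) on a single axes, save it,
    and return the number of curves drawn."""
    fig, ax = plt.subplots()
    for step, rows in enumerate(steps):
        data = np.asarray(rows, dtype=float)
        ax.plot(data[:, 0], data[:, column], marker="o", label=f"step {step}")
    ax.set_xlabel(xlabel)
    ax.set_ylabel(ylabel)
    ax.legend()
    fig.savefig(filename)
    n_curves = len(ax.lines)
    plt.close(fig)
    return n_curves

# Four figures: Raw and MLS SIFs, each against K_I (column 1) and K_II (column 2)
saved = {}
for name, steps, xlabel in (("raw", raw, "Vertex"), ("mls", mls, "Sample")):
    for column, ylabel in ((1, "K_I"), (2, "K_II")):
        filename = f"{name}_{ylabel}.png"
        saved[filename] = plot_steps(steps, column, xlabel, ylabel, filename)
```

Closing each figure after saving keeps memory bounded when many figures are produced in one run.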

This is more of a general hint: have you considered using a more appropriate file format? For your use case I would recommend the HDF5 file format. There are very nice Python bindings for it: http://code.google.com/p/h5py/

HDF5 supports organizing data into sections (groups), and the Python bindings also support slicing and NumPy arrays. I think this would make things much easier for you.
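A minimal sketch of that idea with h5py, assuming h5py and NumPy are installed. The file name `sifs.h5` and the `raw/step_0`, `mls/step_0` layout are hypothetical choices, and the values are copied from the parsed output above.

```python
import h5py
import numpy as np

# Rows of [index, K_I, K_II] for step 0, copied from the parsed output above
raw_step0 = np.array([[0, 0.0, 0.0], [1, 8.727065, -0.0008696262], [2, 8.90781, -0.0002548819]])
mls_step0 = np.array([[0, 8.44688, -0.001360875], [1, 8.554168, -0.001156931],
                      [2, 8.648241, -0.0009755573]])

with h5py.File("sifs.h5", "w") as f:
    # Intermediate groups ("raw", "mls") are created automatically from the path
    f.create_dataset("raw/step_0", data=raw_step0)
    f.create_dataset("mls/step_0", data=mls_step0)

with h5py.File("sifs.h5", "r") as f:
    # Slicing a dataset reads only the requested column into a NumPy array
    k_i = f["raw/step_0"][:, 1]
    steps = list(f["raw"].keys())

print(steps)  # ['step_0']
```

Each step becomes its own dataset, so selecting "all Raw steps" is just iterating over the `raw` group instead of scanning a text file for comment lines.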
