Find the line containing a specific string and read the text file after that line

Question

I have a text file (~20 MB) that I want to extract some information from. The part I am interested in looks like this:

   Generate :
 MESH :     Cartesian
   1.00000   0.00000   0.00000
   0.00000   0.84680   0.00000
   0.00000   0.00000   0.80724
 MESH : 4 unique points
               x           y           z        Weight
    1      0.000000    0.000000    0.000000     0.3906
    2      0.125000    0.000000    0.000000     0.7812
    3      0.250000    0.000000    0.000000     0.7812
    4      0.375000    0.000000    0.000000     0.7812

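I want to save the x, y, z columns that follow the second occurrence of the string "MESH" into arrays. I tried using regular expressions, but my solution stores the results as a list, and accessing the values later becomes overly complicated. Here is my attempt: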

import re

line_number = 0
mesh_list = []
Qp = []
with open('out.test','r') as f:
    for line in f:
        line_number +=1
        if 'MESH' in line:
            mesh_list.append([(line_number),line.rstrip()])

point_info = mesh_list[1]
output_line = point_info[0]             ## Line number where MESH appears the second time.
point_list = point_info[1].split()
num_of_points = int(point_list[2])      ## Get number of unique points ('MESH : 4 unique points' -> index 2).

with open('out.test','r') as f:
    for i, line in enumerate(f):
        if output_line+1 <= i <= output_line+num_of_points:
            Qp.append([line])

print(Qp)

At this point, 'Qp' holds all the lines I need, but how can I separate the x, y, z columns out of this block? Would it be easier with pandas?

Answer 1

You can use pd.read_csv with custom skiprows= and sep= parameters:

import re
import pandas as pd

r = re.compile(r"MESH : \d+ unique points")

line_counter = 0
with open("your_file.txt", "r") as f_in:
    for l in f_in:
        line_counter += 1
        if r.search(l):
            break

df = pd.read_csv("your_file.txt", skiprows=line_counter, sep=r"\s+")
print(df)

Prints:

       x    y    z  Weight
1  0.000  0.0  0.0  0.3906
2  0.125  0.0  0.0  0.7812
3  0.250  0.0  0.0  0.7812
4  0.375  0.0  0.0  0.7812
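
Since the question asks for arrays rather than a dataframe, the columns can then be pulled out with to_numpy(). The following is only a minimal sketch: the table string is a stand-in copied from the question so the snippet runs without the original file, and the variable names are illustrative.

```python
import io
import pandas as pd

# Stand-in for the block that follows the "unique points" line
# (copied from the question; a real run would read the file instead).
table = """\
               x           y           z        Weight
    1      0.000000    0.000000    0.000000     0.3906
    2      0.125000    0.000000    0.000000     0.7812
    3      0.250000    0.000000    0.000000     0.7812
    4      0.375000    0.000000    0.000000     0.7812
"""

# The header has one field fewer than the data rows, so pandas
# uses the first column (the point number) as the index.
df = pd.read_csv(io.StringIO(table), sep=r"\s+")

xyz = df[["x", "y", "z"]].to_numpy()  # 2-D array, one row per point
x = df["x"].to_numpy()                # 1-D array for a single column
```

Here xyz[:, 0] is the same data as x, so either form can be used depending on whether one combined array or separate arrays is more convenient.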

Answer 2

To get x, y, z (along with the line number and Weight) out of Qp, which is a list of one-element lists, as tuples (converting the tuples to lists is trivial), you can try:

>>> Qp
[['    1      0.000000    0.000000    0.000000     0.3906\n'], ['    2      0.125000    0.000000    0.000000     0.7812\n'], ['    3      0.250000    0.000000    0.000000     0.7812\n'], ['    4      0.375000    0.000000    0.000000     0.7812\n']]

>>> lno, x, y, z, Weight  = zip(*(line[0].split() for line in Qp))
>>> lno
('1', '2', '3', '4')
>>> x
('0.000000', '0.125000', '0.250000', '0.375000')
>>> y
('0.000000', '0.000000', '0.000000', '0.000000')
>>> z
('0.000000', '0.000000', '0.000000', '0.000000')
>>> Weight
('0.3906', '0.7812', '0.7812', '0.7812')

For floats instead of strs:

>>> lno, x, y, z, Weight = zip(*((float(a) for a in line[0].split()) for line in Qp))
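
The same unpacking can also be done without building Qp first, reading the file in a single pass. This is a sketch using only the standard library. It assumes the layout shown in the question (a "MESH : N unique points" line, one header row, then N data rows), and read_points is a hypothetical helper name, not something from the original post.

```python
from itertools import islice

def read_points(path):
    """Find the 'MESH : N unique points' line and parse the N rows after the header."""
    with open(path, "r") as f:
        for line in f:
            if "MESH" in line and "unique points" in line:
                # 'MESH : 4 unique points'.split() -> ['MESH', ':', '4', ...]
                n_points = int(line.split()[2])
                break
        else:
            raise ValueError("no 'MESH : N unique points' line found")
        next(f)  # skip the 'x y z Weight' header row
        # Read exactly n_points rows and convert every field to float.
        rows = [[float(v) for v in row.split()] for row in islice(f, n_points)]
    # Transpose rows into columns: point number, x, y, z, weight.
    lno, x, y, z, weight = (list(col) for col in zip(*rows))
    return x, y, z, weight
```

Calling x, y, z, weight = read_points('out.test') would then give four lists of floats. The unpacking assumes at least one data row is present.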

To get x, y, z (and Weight) as columns of a dataframe:

>>> import pandas as pd
>>> import re
>>>
>>> with open('out.test','r') as f:
...     for i, line in enumerate(f,1):
...         m = re.search(r'MESH : (\d+) unique points', line)
...         if m:
...             break
...
>>> i
6
>>> m.group(1)
'4'
>>> df = pd.read_csv('out.test', skiprows=i, nrows=int(m.group(1)), sep=r"\s+")
>>> df
       x    y    z  Weight
1  0.000  0.0  0.0  0.3906
2  0.125  0.0  0.0  0.7812
3  0.250  0.0  0.0  0.7812
4  0.375  0.0  0.0  0.7812
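
If only a NumPy array is needed, pandas can be skipped altogether. The sketch below is one possible variant, not part of the original answer: it reuses the same regex search and then hands the row offsets to numpy.loadtxt. The helper name load_xyz is hypothetical, and the code assumes exactly one header row sits between the "unique points" line and the data.

```python
import re
import numpy as np

def load_xyz(path):
    """Return the x, y, z columns after the 'unique points' line as an (N, 3) array."""
    pattern = re.compile(r"MESH : (\d+) unique points")
    with open(path, "r") as f:
        for line_no, line in enumerate(f, 1):
            m = pattern.search(line)
            if m:
                break
        else:
            raise ValueError("no 'MESH : N unique points' line found")
    return np.loadtxt(
        path,
        skiprows=line_no + 1,        # everything through the marker line, plus the header row
        max_rows=int(m.group(1)),    # stop after the N data rows
        usecols=(1, 2, 3),           # drop the point number (col 0) and Weight (col 4)
        ndmin=2,                     # keep a 2-D shape even when N == 1
    )
```

With the sample from the question, load_xyz('out.test') would return a 4 x 3 float array whose columns are x, y and z.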

2 answers

Answer 1
1 accepted 2021-05-01 21:53:56

Answer 2
1 2021-05-01 22:07:57