Finding the line of a specific string and reading the text file after that line
I have a text file (~20 MB) from which I want to extract some information. The info I am interested in looks like this:
Generate :
MESH : Cartesian
1.00000 0.00000 0.00000
0.00000 0.84680 0.00000
0.00000 0.00000 0.80724
MESH : 4 unique points
x y z Weight
1 0.000000 0.000000 0.000000 0.3906
2 0.125000 0.000000 0.000000 0.7812
3 0.250000 0.000000 0.000000 0.7812
4 0.375000 0.000000 0.000000 0.7812
I want to save the x, y, z columns to an array after the second occurrence of the string 'MESH'. I tried using regex, but my solution saves the result as a list, which makes it too complicated to access these values later. Here is my attempt:
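import re

line_number = 0
mesh_list = []
Qp = []

# First pass: record every line that contains 'MESH' with its 1-based line number.
with open('out.test', 'r') as f:
    for line in f:
        line_number += 1
        if 'MESH' in line:
            mesh_list.append([line_number, line.rstrip()])

point_info = mesh_list[1]
output_line = point_info[0]  ## Line number where MESH appears the second time.
point_list = point_info[1].split()
# 'MESH : 4 unique points'.split() gives ['MESH', ':', '4', 'unique', 'points'],
# so the count is at index 2 (index 1 is the ':').
num_of_points = int(point_list[2])  ## Get number of unique points.

# Second pass: i is 0-based here, so output_line + 1 is the first data row
# (the header row 'x y z Weight' sits at index output_line).
with open('out.test', 'r') as f:
    for i, line in enumerate(f):
        if output_line + 1 <= i <= output_line + num_of_points:
            Qp.append([line])

print(Qp)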
At this point, Qp has all the lines I need, but how can I separate the x, y, z columns from this chunk? Would it be easier with pandas?
You can use pd.read_csv with custom skiprows= and sep= parameters:
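import re
import pandas as pd

r = re.compile(r"MESH : \d+ unique points")

# Count lines up to and including the 'MESH : N unique points' line.
line_counter = 0
with open("your_file.txt", "r") as f_in:
    for line in f_in:
        line_counter += 1
        if r.search(line):
            break

# Skip everything through that line; the next line becomes the header.
df = pd.read_csv("your_file.txt", skiprows=line_counter, sep=r"\s+")
print(df)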
Prints:
x y z Weight
1 0.000 0.0 0.0 0.3906
2 0.125 0.0 0.0 0.7812
3 0.250 0.0 0.0 0.7812
4 0.375 0.0 0.0 0.7812
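If pandas is not needed, the same search-then-read idea can be sketched in a single pass using only the standard library. This is a minimal sketch, not a definitive implementation: the SAMPLE text and the helper name read_mesh_points are made up for illustration, and it assumes the table always has the five whitespace-separated fields shown in the question.

```python
import io
import re
from itertools import islice

# Hypothetical sample that mirrors the layout shown in the question.
SAMPLE = """\
Generate :
MESH : Cartesian
1.00000 0.00000 0.00000
0.00000 0.84680 0.00000
0.00000 0.00000 0.80724
MESH : 4 unique points
x y z Weight
1 0.000000 0.000000 0.000000 0.3906
2 0.125000 0.000000 0.000000 0.7812
3 0.250000 0.000000 0.000000 0.7812
4 0.375000 0.000000 0.000000 0.7812
"""

POINTS_RE = re.compile(r"MESH : (\d+) unique points")


def read_mesh_points(lines):
    """Return (x, y, z) as lists of floats from an iterable of text lines."""
    lines = iter(lines)
    # Advance until the 'MESH : N unique points' line is found.
    for line in lines:
        m = POINTS_RE.search(line)
        if m:
            break
    else:
        raise ValueError("no 'MESH : N unique points' line found")

    n_points = int(m.group(1))
    next(lines)  # skip the 'x y z Weight' header line

    x, y, z = [], [], []
    # Read exactly n_points rows; anything after the table is left untouched.
    for row in islice(lines, n_points):
        _index, xi, yi, zi, _weight = row.split()
        x.append(float(xi))
        y.append(float(yi))
        z.append(float(zi))
    return x, y, z


# With a real file: `with open("out.test") as f: x, y, z = read_mesh_points(f)`
x, y, z = read_mesh_points(io.StringIO(SAMPLE))
print(x, y, z)
```

Because the file object is consumed lazily, the ~20 MB file is read only up to the end of the table and never held in memory as a whole.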
To get x, y, z (plus the line number and Weight) from Qp, which is a list of one-element lists, as tuples (converting the tuples to lists is trivial), you can try:
>>> Qp
[[' 1 0.000000 0.000000 0.000000 0.3906\n'], [' 2 0.125000 0.000000 0.000000 0.7812\n'], [' 3 0.250000 0.000000 0.000000 0.7812\n'], [' 4 0.375000 0.000000 0.000000 0.7812\n']]
>>> lno, x, y, z, Weight = zip(*(line[0].split() for line in Qp))
>>> lno
('1', '2', '3', '4')
>>> x
('0.000000', '0.125000', '0.250000', '0.375000')
>>> y
('0.000000', '0.000000', '0.000000', '0.000000')
>>> z
('0.000000', '0.000000', '0.000000', '0.000000')
>>> Weight
('0.3906', '0.7812', '0.7812', '0.7812')
For floats instead of strs:
>>> lno, x, y, z, Weight = zip(*((float(a) for a in line[0].split()) for line in Qp))
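Note that the one-liner above applies float to every field, so the row numbers also become floats (1.0, 2.0, ...). A small sketch of one way to keep them as integers follows; the Qp literal is copied from the session above, and the lowercase name weight is just an illustrative choice.

```python
# Qp as produced by the question's code: a list of one-element lists.
Qp = [
    [' 1 0.000000 0.000000 0.000000 0.3906\n'],
    [' 2 0.125000 0.000000 0.000000 0.7812\n'],
    [' 3 0.250000 0.000000 0.000000 0.7812\n'],
    [' 4 0.375000 0.000000 0.000000 0.7812\n'],
]

# Split each row once; row[0] is the single string inside each inner list.
fields = [row[0].split() for row in Qp]

# The first field is the row number, so keep it as an int.
lno = [int(f[0]) for f in fields]

# Convert the remaining four fields to float, then transpose rows into columns.
x, y, z, weight = (
    list(col) for col in zip(*([float(v) for v in f[1:]] for f in fields))
)

print(lno)
print(x)
```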
To get x, y, z (and Weight) as columns of a dataframe:
>>> import pandas as pd
>>> import re
>>>
>>> with open('out.test','r') as f:
...     for i, line in enumerate(f,1):
...         m = re.search(r'MESH : (\d+) unique points', line)
...         if m:
...             break
...
>>> i
6
>>> m.group(1)
'4'
>>> # nrows counts data rows only (the header row is not included)
>>> df = pd.read_csv('out.test', skiprows=i, nrows=int(m.group(1)), sep=r"\s+")
>>> df
x y z Weight
1 0.000 0.0 0.0 0.3906
2 0.125 0.0 0.0 0.7812
3 0.250 0.0 0.0 0.7812
4 0.375 0.0 0.0 0.7812
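Since the question asks for an array, the x, y, z columns of such a dataframe can be turned into a NumPy array with DataFrame.to_numpy(). The following is a self-contained sketch under stated assumptions: the SAMPLE text (including the trailing END line) is invented for illustration and fed through io.StringIO instead of a real file, and pandas is assumed to be installed.

```python
import io
import re

import pandas as pd

# Hypothetical sample mirroring the question's layout; the final 'END' line
# stands in for whatever follows the table in the real file.
SAMPLE = """\
Generate :
MESH : Cartesian
1.00000 0.00000 0.00000
0.00000 0.84680 0.00000
0.00000 0.00000 0.80724
MESH : 4 unique points
x y z Weight
1 0.000000 0.000000 0.000000 0.3906
2 0.125000 0.000000 0.000000 0.7812
3 0.250000 0.000000 0.000000 0.7812
4 0.375000 0.000000 0.000000 0.7812
END
"""

# Locate the 'MESH : N unique points' line (1-based) and capture N.
for i, line in enumerate(SAMPLE.splitlines(), 1):
    m = re.search(r"MESH : (\d+) unique points", line)
    if m:
        break
else:
    raise ValueError("no 'MESH : N unique points' line found")

n_points = int(m.group(1))

# skiprows drops everything through the MESH line; nrows limits the read to
# the N data rows, so the trailing 'END' line is never parsed.
df = pd.read_csv(io.StringIO(SAMPLE), skiprows=i, nrows=n_points, sep=r"\s+")

# The header has four names but each row has five fields, so pandas uses the
# first field (the row number) as the index.
xyz = df[["x", "y", "z"]].to_numpy()  # shape (N, 3), one row per point
print(xyz)
```

Individual columns are then available as xyz[:, 0], xyz[:, 1] and xyz[:, 2].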