How to extract a specific part of a line of text in Python
I have a huge file that I split into a series of lines with `text.splitlines()`. From these lines I need to extract the information corresponding to the keyword "ref-p". What I did is:
```python
for index, line in enumerate(tpr_linee):
    ref = "ref-p"
    a = []
    if ref in line:
        a.append(line)
        print(a)
```
What I obtained is:
```
[' ref-p (3x3):']
[' ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}']
[' ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}']
[' ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}']
```
Now I need to move the three series of numbers into a dictionary of the form:
`{'ref-p': [[number, number, number], [number, number, number], etc]}`
Also, in the larger dataset the 3x3 array may have a different shape in different files.
So my main goal is to find a way to extract all the numbers corresponding to `ref-p`, taking only the numbers and ignoring the first appearance of the `ref-p` key (the ` ref-p (3x3):` header line).
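One possible approach, sketched here under stated assumptions: every data row looks like `ref-p[ i]={ x, y, z}` and the header line (` ref-p (3x3):`) has no `[i]={...}` part. A regular expression then picks out only the data rows, so the header is skipped automatically and any number of rows or columns is accepted. The helper name `extract_rows` is hypothetical. Because it takes any iterable of lines, it can be given an open file object directly instead of the list returned by `text.splitlines()`, which avoids holding every line of a huge file in memory.

```python
import io
import re


def extract_rows(lines, key="ref-p"):
    """Hypothetical helper: collect rows written as `key[ i]={ a, b, ... }`.

    Assumption: the row index is inside [...] and the values are inside {...}.
    The header line (e.g. " ref-p (3x3):") does not match and is skipped.
    """
    pattern = re.compile(
        r"^\s*" + re.escape(key) + r"\[\s*\d+\s*\]\s*=\s*\{(?P<values>[^}]*)\}"
    )
    rows = []
    for line in lines:
        match = pattern.match(line)
        if match:
            # Split the text between the braces on commas; float() ignores the spaces.
            rows.append([float(v) for v in match.group("values").split(",")])
    return {key: rows}


# A file-like object stands in for the real (huge) file here.
sample = io.StringIO(
    " ref-p (3x3):\n"
    " ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}\n"
    " ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}\n"
    " ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}\n"
)
print(extract_rows(sample))
# {'ref-p': [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
```

With a real file, `with open(path) as f: extract_rows(f)` reads it lazily, line by line.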
I have edited the first part of your code so that the list `a` contains all the strings to be analysed (in your version, `a = []` is inside the loop, so the list is reset on every line).
Then I split each string on the "=" (equals) sign and strip the curly braces "{" and "}" to keep only the string of numbers.
When converted to float, the numbers are simply 0.0 and 1.0. Try this:
```python
a = []
for index, line in enumerate(tpr_linee):
    if 'ref-p' in line:
        a.append(line)
print(a)
```

```python
# The same lines, hard-coded so this part runs on its own
a = [' ref-p (3x3):',
     ' ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}',
     ' ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}',
     ' ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}']

result = {'ref-p': []}
for strg in a:
    if '=' in strg:  # skips the ' ref-p (3x3):' header line
        # keep the part after '=', drop the braces, split into number strings
        num_list = strg.split('=')[-1].strip('{').strip('}').split(',')
        print(num_list)
        result['ref-p'].append([float(e.strip()) for e in num_list])
print(result)
```
Output:
```
[' 1.00000e+00', ' 0.00000e+00', ' 0.00000e+00']
[' 0.00000e+00', ' 1.00000e+00', ' 0.00000e+00']
[' 0.00000e+00', ' 0.00000e+00', ' 1.00000e+00']
{'ref-p': [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
```
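Since the shape can differ between files, one possible extension of the answer above is to read the declared shape from the header line and check the parsed rows against it. This is a sketch under an assumption about the layout: the header always has the form `ref-p (RxC):` and each data row has exactly one "=" followed by values in braces. The function name `parse_ref_block` is hypothetical.

```python
import re


def parse_ref_block(lines, key="ref-p"):
    """Hypothetical helper: parse ' key (RxC):' plus its '={ ... }' rows.

    Assumption: the header has the form 'key (RxC):' and every data row
    contains one '=' followed by comma-separated values in curly braces.
    """
    header_re = re.compile(r"^\s*" + re.escape(key) + r"\s*\((\d+)x(\d+)\):")
    n_rows = n_cols = None
    rows = []
    for line in lines:
        header = header_re.match(line)
        if header:
            # Remember the shape declared in the header, e.g. (3x3) -> 3, 3
            n_rows, n_cols = int(header.group(1)), int(header.group(2))
        elif key in line and "=" in line:
            # Same idea as the answer: keep what follows '=', drop the braces
            values = line.split("=")[-1].strip().strip("{}").split(",")
            rows.append([float(v) for v in values])
    if n_rows is not None:
        # Compare the parsed data with the declared shape
        if len(rows) != n_rows or any(len(r) != n_cols for r in rows):
            raise ValueError(f"expected {n_rows}x{n_cols}, got {len(rows)} rows")
    return {key: rows}


lines = [
    " ref-p (2x3):",
    " ref-p[ 0]={ 1.00000e+00, 2.00000e+00, 3.00000e+00}",
    " ref-p[ 1]={ 4.00000e+00, 5.00000e+00, 6.00000e+00}",
]
print(parse_ref_block(lines))
# {'ref-p': [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]}
```

Raising an error on a mismatch makes a truncated or malformed block visible instead of silently returning fewer rows.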
Try this:
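```python
import ast

ref = "ref-p"
out = []
for line in tpr_linee:
    if ref in line:
        try:
            # "{ a, b, c}" -> "( a, b, c)", which literal_eval parses as a tuple
            line1 = line.split('=')[1].replace('{', '(').replace('}', ')')
            out.append(ast.literal_eval(line1))
        except (IndexError, ValueError, SyntaxError):
            # the header line " ref-p (3x3):" has no '=', so split('=')[1] raises IndexError
            continue
print(out)
```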
Output:
```
[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
```
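The answer above produces a list of tuples, while the question asks for a dictionary of lists. A small variant of the same `ast.literal_eval` idea, sketched here with the sample lines hard-coded, replaces the braces with square brackets so each row is parsed directly as a list, and checks for "=" up front so the header line is skipped without relying on an exception.

```python
import ast

lines = [
    " ref-p (3x3):",
    " ref-p[ 0]={ 1.00000e+00, 0.00000e+00, 0.00000e+00}",
    " ref-p[ 1]={ 0.00000e+00, 1.00000e+00, 0.00000e+00}",
    " ref-p[ 2]={ 0.00000e+00, 0.00000e+00, 1.00000e+00}",
]

out = []
for line in lines:
    if "ref-p" in line and "=" in line:  # the header line has no '=' and is skipped
        # "{ a, b, c}" -> "[ a, b, c]" so literal_eval returns a list
        literal = line.split("=", 1)[1].strip().replace("{", "[").replace("}", "]")
        out.append(ast.literal_eval(literal))

result = {"ref-p": out}
print(result)
# {'ref-p': [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
```

Square brackets also behave better for a one-column array: `ast.literal_eval("( 1.0)")` returns the float `1.0`, not a tuple, whereas `ast.literal_eval("[ 1.0]")` returns the list `[1.0]`.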