How to extract a specific part of a line in a text in Python

I have a huge file that I split into a series of lines with text.splitlines(). From these lines I need to extract the information that corresponds to one keyword: "ref-p". What I did is:

for index, line in enumerate(tpr_linee):
    ref = "ref-p"
    a = []
    if ref in line:

        a.append(line)

        print(a)

What I obtained is:

['   ref-p (3x3):']
['      ref-p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}']
['      ref-p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}']
['      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}']

Now I need to move the three series of numbers into a dictionary of the form:

{ref-p: [[number, number, number], [number, number, number], etc]}

Also, in the larger dataset the array is not always 3x3; it may have a different shape in different files.

So my main goal is to find a way to extract all the numbers corresponding to ref-p, taking only the numbers and ignoring the first appearance of the ref-p key.

I have edited the first part of your code so that the list a is created once, before the loop, and ends up holding all the strings to be analysed.

Then I split each string on the "=" (equals) sign and strip the curly braces "{" and "}" to keep only the string of numbers.

After conversion to float, the numbers in this sample are just 0.0 and 1.0. Try this:

a = []
for index, line in enumerate(tpr_linee):
    if 'ref-p' in line:
        a.append(line)
print(a)

# For this example, a holds the lines matched above:
a = ['   ref-p (3x3):',
     '      ref-p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}', 
     '      ref-p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}', 
     '      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}' ]

result = {'ref-p': []}
for strg in a:
    if '=' in strg:
        num_list = strg.split('=')[-1].strip('{').strip('}').split(',')
        print(num_list)
        result['ref-p'].append([float(e.strip()) for e in num_list])
print(result)

Output

[' 1.00000e+00', '  0.00000e+00', '  0.00000e+00']
[' 0.00000e+00', '  1.00000e+00', '  0.00000e+00']
[' 0.00000e+00', '  0.00000e+00', '  1.00000e+00']
{'ref-p': [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
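
Since the question mentions that the array is not always 3x3, the same idea can be sketched without relying on the number of rows or columns. This is only a minimal sketch: the function name extract_rows and the NUMBER pattern are my own (hypothetical) names, and the pattern assumes the numbers always look like 1.00000e+00, -2.5 or 3 (for example, a bare ".5" would not match).

```python
import re

# Assumed number format: optional sign, digits, optional fraction and exponent.
NUMBER = re.compile(r"[-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?")

def extract_rows(lines, key="ref-p"):
    """Collect the numbers inside {...} on every 'key[i]={...}' line."""
    rows = []
    for line in lines:
        stripped = line.strip()
        # The header line ('ref-p (3x3):') has no '[' or '=', so it is skipped.
        if stripped.startswith(key + "[") and "=" in stripped:
            body = stripped.split("=", 1)[1]
            rows.append([float(n) for n in NUMBER.findall(body)])
    return {key: rows}

sample = [
    "   ref-p (3x3):",
    "      ref-p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}",
    "      ref-p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}",
    "      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}",
]
result = extract_rows(sample)
print(result)
```

Each row simply gets as many numbers as appear between the braces, so a 2x4 or 5x5 block should work the same way as long as the lines keep the key[i]={...} layout.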

Try this:

import ast 

out = []
for index, line in enumerate(tpr_linee):
    ref = "ref-p"
    if ref in line:
        try:
            line1 = line.split('=')[1].replace('{', '(').replace('}', ')')
            line1 = ast.literal_eval(line1)
            out.append(line1)
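        except (IndexError, ValueError, SyntaxError):
            # Header line without '=' or a value that is not a literal: skip it
            continue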
print(out)

[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
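
To get from this list of tuples to the dictionary asked for in the question, and to catch files where the block has an unexpected shape, something like the following could be used. It is a sketch under assumptions: declared_shape and to_dict are hypothetical helper names, and it assumes the header always announces the shape as "(RxC)", as in the "ref-p (3x3):" line shown in the question.

```python
import re

def declared_shape(header):
    """Parse '(3x3)' from a header such as '   ref-p (3x3):'; None if absent."""
    match = re.search(r"\((\d+)x(\d+)\)", header)
    return (int(match.group(1)), int(match.group(2))) if match else None

def to_dict(key, rows, header=None):
    """Wrap parsed rows as {key: [[...], ...]} and check the declared shape."""
    matrix = [list(row) for row in rows]
    shape = declared_shape(header) if header else None
    if shape is not None:
        n_rows, n_cols = shape
        if len(matrix) != n_rows or any(len(r) != n_cols for r in matrix):
            raise ValueError(
                f"{key}: expected {n_rows}x{n_cols}, got {len(matrix)} rows"
            )
    return {key: matrix}

out = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
result = to_dict("ref-p", out, header="   ref-p (3x3):")
print(result)
```

If no header is passed, the rows are wrapped without any shape check.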
