How to extract a specific part of a line in a text in Python

Question

I have a huge file that I split into a series of lines with text.splitlines(). From these lines I need to extract the information associated with a keyword: "ref-p". What I did is:

for index, line in enumerate(tpr_linee):
    ref = "ref-p"
    a = []
    if ref in line:

        a.append(line)

        print(a)

What I obtained is:

['   ref-p (3x3):']
['      ref-p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}']
['      ref-p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}']
['      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}']

Now I need to move the three series of numbers into a dictionary of the form:

{ref-p: [[number, number, number], [number, number, number], etc]}

Also, in the larger dataset the array may not be 3x3: its shape can differ from file to file.

So my main goal is to find a way to extract all the numbers corresponding to ref-p, taking only the numbers and ignoring the first appearance of the ref-p key.

Answer 1

I have edited the first part of your code so that the list a is created once, before the loop, and ends up containing all the strings to be analysed.

Then I split each string on the "=" sign and strip the curly braces "{" and "}" to extract only the string of numbers.

When converted to float, the numbers are just 0.0 and 1.0. Try this:

a = []
for index, line in enumerate(tpr_linee):
    if 'ref-p' in line:
        a.append(line)
print(a)

# Sample content of a, copied from the output shown in the question
a = ['   ref-p (3x3):',
     '      ref-p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}', 
     '      ref-p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}', 
     '      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}' ]

result = {'ref-p': []}
for strg in a:
    if '=' in strg:
        num_list = strg.split('=')[-1].strip('{').strip('}').split(',')
        print(num_list)
        result['ref-p'].append([float(e.strip()) for e in num_list])
print(result)

Output

[' 1.00000e+00', '  0.00000e+00', '  0.00000e+00']
[' 0.00000e+00', '  1.00000e+00', '  0.00000e+00']
[' 0.00000e+00', '  0.00000e+00', '  1.00000e+00']
{'ref-p': [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
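Since the question mentions that the array is not always 3x3, the same idea can be wrapped in a function that does not depend on the number of rows or columns. The following is only a minimal sketch: the function name extract_rows and the regular expression are my own assumptions, based solely on the sample lines shown in the question, and other files may need a different pattern.

```python
import re


def extract_rows(lines, key):
    """Collect the numeric rows stored under `key` from an iterable of text lines."""
    # Matches lines such as "ref-p[    0]={ 1.0e+00,  0.0e+00}" and captures
    # the text between the braces. The pattern is an assumption about the format.
    row_pattern = re.compile(
        r"^\s*" + re.escape(key) + r"\[\s*\d+\]\s*=\s*\{(.*)\}\s*$"
    )
    rows = []
    for line in lines:
        match = row_pattern.match(line)
        if match is None:
            # Skips the header line (e.g. "ref-p (3x3):") and unrelated lines.
            continue
        # float() tolerates surrounding whitespace, so no explicit strip is needed.
        rows.append([float(tok) for tok in match.group(1).split(",") if tok.strip()])
    return {key: rows}


sample = [
    "   ref-p (3x3):",
    "      ref-p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}",
    "      ref-p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}",
    "      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}",
]
result = extract_rows(sample, "ref-p")
print(result)
# {'ref-p': [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
```

Because the rows are matched one line at a time, the function returns as many rows (of whatever length) as the file contains, and the header line is ignored because it has no index in square brackets.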

Answer 2

Try this:

import ast 

out = []
for index, line in enumerate(tpr_linee):
    ref = "ref-p"
    if ref in line:
        try:
            line1 = line.split('=')[1].replace('{', '(').replace('}', ')')
            line1 = ast.literal_eval(line1)
            out.append(line1)
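        except (IndexError, ValueError, SyntaxError):
            # Lines without '=' (such as the header) or without a valid literal are skipped
            continue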
print(out)

[(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
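The result above is a list of tuples, while the question asks for a dictionary of lists. A possible variation, shown here only as a sketch, replaces the braces with square brackets so that ast.literal_eval returns lists, and then stores the rows under the key. The function name rows_with_literal_eval is hypothetical and the line format is assumed to match the sample in the question.

```python
import ast


def rows_with_literal_eval(lines, key="ref-p"):
    """Build {key: [[...], ...]} by evaluating the brace-delimited part of each line."""
    rows = []
    for line in lines:
        if key not in line or "=" not in line:
            # The header line (e.g. "ref-p (3x3):") has no '=' and is skipped here.
            continue
        text = line.split("=", 1)[1].strip()
        # Turn "{ 1.0, 0.0 }" into "[ 1.0, 0.0 ]" so that it parses as a Python list.
        text = text.replace("{", "[").replace("}", "]")
        try:
            row = ast.literal_eval(text)
        except (ValueError, SyntaxError):
            continue
        if not isinstance(row, list):
            continue
        rows.append([float(x) for x in row])
    return {key: rows}


sample = [
    "   ref-p (3x3):",
    "      ref-p[    0]={ 1.00000e+00,  0.00000e+00,  0.00000e+00}",
    "      ref-p[    1]={ 0.00000e+00,  1.00000e+00,  0.00000e+00}",
    "      ref-p[    2]={ 0.00000e+00,  0.00000e+00,  1.00000e+00}",
]
parsed = rows_with_literal_eval(sample)
print(parsed)
# {'ref-p': [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]}
```

ast.literal_eval only accepts Python literals, so a malformed line raises an error and is skipped instead of being executed as code.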

Answer 1
2, accepted, 2022-08-18 12:19:12

Answer 2
0, 2022-08-19 17:00:12
