Reading multiline variables from a text file in Python

I am reading a text file into Python. The file contains a (large) number of variables, with the variable name given as a string on the left and the value on the right, separated by an equals sign (=). For example:

Proc_Method = 2
Obs_Method = 4

As long as the value of a variable is given on a single line, I am able to read it correctly with:

    namevalue = data.split('=') 
    name = namevalue[0].strip()
    value = namevalue[1].strip()

However, if the value is spread over multiple lines (i.e. an array), this code only assigns the FIRST row of the array to the variable before moving on to the next variable. So if I had a variable of the following form:

Corr_Mat = 10 0 0 
           20 10 0
           0 20 10  

the above code would set value to 10 0 0 and then move on to the next variable. Is there a way I can define value so that it takes ALL the lines starting at the equals sign and finishing at the line before the next equals sign in the text file?

With a file like this:

Proc_Method = 2
Obs_Method = 4
Corr_Mat = 10 0 0 
           20 10 0
           0 20 10
Proc_Method = 2

Option 1

You can follow a stack-like approach that puts each new variable as a key into the dictionary variables and appends all following lines to the value stored under that key.

variables = {}

with open('file.txt') as f:
    for line in f:
        if '=' in line:
            # add a new key-value pair to the dictionary,
            # remember the current variable name
            current_variable, value = line.split('=', 1)
            variables[current_variable] = value
        else:
            # it's a continued line -> add to the remembered key
            variables[current_variable] += line

Result:

{'Proc_Method ': ' 2\n',
 'Obs_Method ': ' 4\n',
 'Corr_Mat ': ' 10 0 0 \n           20 10 0\n           0 20 10\n'}
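The keys and values in the result above still carry surrounding whitespace and newlines. Here is a minimal sketch of the same approach with stripping added, storing each line of a value as its own row. The function name parse_variables and the inlined sample text are made up for illustration, and io.StringIO stands in for the real file. It still assumes that continuation lines never contain an equals sign.

```python
import io


def parse_variables(lines):
    """Collect 'name = value' entries; lines without '=' continue the previous value."""
    variables = {}
    current = None  # name of the variable currently being filled
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        if '=' in line:
            # split only on the first '=' so extra '=' on the same line do not raise
            name, value = line.split('=', 1)
            current = name.strip()
            variables[current] = [value.strip()]
        elif current is not None:
            # continuation line: one more row for the remembered variable
            variables[current].append(line.strip())
    return variables


# hypothetical sample standing in for open('file.txt')
sample = io.StringIO(
    "Proc_Method = 2\n"
    "Obs_Method = 4\n"
    "Corr_Mat = 10 0 0 \n"
    "           20 10 0\n"
    "           0 20 10\n"
)
result = parse_variables(sample)
print(result)
```

Each value ends up as a list of row strings, so a single-line variable is a one-element list and Corr_Mat has three elements.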

Option 2

Alternatively, you can read the file as a whole and use a regular expression to extract the variables.

The pattern searches for the start of a line (leading ^ in combination with re.MULTILINE), followed by one or more characters (.+), followed by =, followed by one or more characters that are not equals signs ([^=]+), followed by a newline. Because [^=] also matches newline characters, a single match can span the continuation lines of a multi-line value.

import re 

with open('file.txt') as f:
    file_content = f.read()

chunks = re.findall(r'^.+=[^=]+\n', file_content, flags=re.MULTILINE)

Result:

['Proc_Method = 2\n',
 'Obs_Method = 4\n',
 'Corr_Mat = 10 0 0 \n           20 10 0\n           0 20 10\n',
 'Proc_Method = 2\n']

Of course, that needs some cleaning, and it only works if the values don't contain any = signs, which might not be guaranteed.
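The cleaning step could look something like the following. It is only a sketch: chunks_to_dict is a hypothetical helper name, the sample text is inlined rather than read from file.txt, and the same caveat about equals signs inside values applies.

```python
import re


def chunks_to_dict(file_content):
    """Turn the regex chunks into a {name: value} dict with whitespace trimmed."""
    chunks = re.findall(r'^.+=[^=]+\n', file_content, flags=re.MULTILINE)
    variables = {}
    for chunk in chunks:
        # everything before the first '=' is the name, the rest is the value
        name, _, value = chunk.partition('=')
        # trim each row of a multi-line value and rejoin with single newlines
        rows = [row.strip() for row in value.strip().splitlines()]
        variables[name.strip()] = '\n'.join(rows)
    return variables


# hypothetical sample standing in for the contents of file.txt
file_content = (
    "Proc_Method = 2\n"
    "Obs_Method = 4\n"
    "Corr_Mat = 10 0 0 \n"
    "           20 10 0\n"
    "           0 20 10\n"
)
cleaned = chunks_to_dict(file_content)
print(cleaned)
```

Note that the pattern requires a trailing newline, so the last variable is only picked up if the file ends with one.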

Try this code.

params = {}

last_param = ''

for line in data.splitlines():
    
    line = line.strip()
    
    if '=' in line:
        sp = line.split('=')
        last_param = sp[0]
        params[last_param] = sp[1]
    else:
        params[last_param] += f'\n{line}'

print(params)

Result:

{'Proc_Method ': ' 2', 'Obs_Method ': ' 4', 'Corr_Mat ': ' 10 0 0\n20 10 0\n0 20 10'}

I got a cleaner result with this, but arguably messier code. I guess there are many easy ways to make this code cleaner...

File Input:

Proc_Method = 2
Obs_Method = 4
Corr_Mat = 10 0 0 
           20 10 0
           0 20 10
test = 4
test2 = 4
woof = a b c 
           s sdfasd sdas 
           sda as a 

Code:

output = {}
with open(r"E:\Coding\stackoverflow\input.txt", "r") as file:
    lines = file.readlines()
    previous_full_line = 0
    for line_number, line in enumerate(lines):
        line_content = line.strip()
        if "=" in line_content:
            namevalue = line_content.split('=') 
            name = namevalue[0].strip()
            values = namevalue[1].strip()
            values = values.split(" ")
            line_list = []
            for value in values:
                line_list.append(value)
            output[name] = line_list
            previous_full_line = line_number
        else:
            values = line_content.split(" ")
            new_list = []
            for value in values:
                new_list.append(value)
            output[name].extend(new_list)

print(output)

Result:

{
   "Proc_Method":[
      "2"
   ],
   "Obs_Method":[
      "4"
   ],
   "Corr_Mat":[
      "10",
      "0",
      "0",
      "20",
      "10",
      "0",
      "0",
      "20",
      "10"
   ],
   "test":[
      "4"
   ],
   "test2":[
      "4"
   ],
   "woof":[
      "a",
      "b",
      "c",
      "s",
      "sdfasd",
      "sdas",
      "sda",
      "as",
      "a"
   ]
}

From this output, I guess you can use the data however you want, keeping in mind that all the values under each key in the dictionary output are in a single flat list.
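If the row structure of an array such as Corr_Mat matters, one possible variation keeps each input line as its own row and converts numeric tokens to numbers. This is a sketch rather than part of the answer above: parse_rows and to_number are names invented here, and the sample lines are inlined instead of read from disk.

```python
def to_number(token):
    """Return an int or float if the token parses as one, else the original string."""
    for cast in (int, float):
        try:
            return cast(token)
        except ValueError:
            pass
    return token


def parse_rows(lines):
    """Map each variable name to a list of rows, one row per input line."""
    output = {}
    name = None
    for line in lines:
        content = line.strip()
        if not content:
            continue  # ignore blank lines
        if '=' in content:
            # start a new variable; the text after the first '=' is its first row
            name, _, content = content.partition('=')
            name = name.strip()
            output[name] = []
        if name is not None:
            output[name].append([to_number(tok) for tok in content.split()])
    return output


# hypothetical sample lines, as file.readlines() would return them
lines = [
    "Corr_Mat = 10 0 0 \n",
    "           20 10 0\n",
    "           0 20 10\n",
    "test = 4\n",
    "woof = a b c \n",
    "           s sdfasd sdas \n",
]
rows = parse_rows(lines)
print(rows["Corr_Mat"])
```

With this layout, rows["Corr_Mat"][1][0] addresses the first entry of the second row, which a flat list cannot express without knowing the row length.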
