
Read a text file in Python and extract a specific value from each line?

I have a text file in which each line looks like this:

 n:1 mse_avg:8.46 mse_y:12.69 mse_u:0.00 mse_v:0.00 psnr_avg:38.86 psnr_y:37.10 psnr_u:inf psnr_v:inf 
 n:2 mse_avg:12.20 mse_y:18.30 mse_u:0.00 mse_v:0.00 psnr_avg:37.27 psnr_y:35.51 psnr_u:inf psnr_v:inf 
    

I need to read each line and collect the psnr_y value from it into a matrix. Does Python have functions for reading a text file like this and extracting psnr_y from each line? I have MATLAB code that does this, but I need a Python version and I am not familiar with Python's functions. Could you please help? This is the MATLAB code:

opt = {'Delimiter',{':',' '}};
fid = fopen('data.txt','rt');
nmc = nnz(fgetl(fid)==':');
frewind(fid);
fmt = repmat('%s%f',1,nmc);
tmp = textscan(fid,fmt,opt{:});
fclose(fid);
fnm = [tmp{:,1:2:end}];
out = cell2struct(tmp(:,2:2:end),fnm(1,:),2)

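Use a regular expression such as

r'psnr_y:([\d.]+)'

on each line you read, and take match.group(1) from the result.

If you need a number, convert it to float: float(match.group(1))

Note that [\d.]+ matches only digits and dots, so it would not match a value of inf (which psnr_u and psnr_v have in the sample lines).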
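As a minimal sketch of the steps above: the function name extract_psnr_y and the shortened sample lines are illustrative, and the pattern is widened with an assumed |inf alternative so that infinite values would also parse.

```python
import re

# Matches "psnr_y:" followed by a decimal number or the literal "inf".
PSNR_Y = re.compile(r'psnr_y:([\d.]+|inf)')

def extract_psnr_y(lines):
    """Return the psnr_y value of every line that has one, as floats."""
    values = []
    for line in lines:
        match = PSNR_Y.search(line)
        if match:  # lines without psnr_y (e.g. blank lines) are skipped
            values.append(float(match.group(1)))
    return values

# Shortened, illustrative versions of the lines in the question.
sample = [
    " n:1 mse_avg:8.46 mse_y:12.69 psnr_avg:38.86 psnr_y:37.10 psnr_u:inf",
    " n:2 mse_avg:12.20 mse_y:18.30 psnr_avg:37.27 psnr_y:35.51 psnr_u:inf",
    "",
]
print(extract_psnr_y(sample))  # [37.1, 35.51]
```

An open file object can be passed in place of the list, since iterating over a file yields its lines.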

Since I hate regex, I would suggest:

s = 'n:1 mse_avg:8.46 mse_y:12.69 mse_u:0.00 mse_v:0.00 psnr_avg:38.86 psnr_y:37.10 psnr_u:inf psnr_v:inf \nn:2 mse_avg:12.20 mse_y:18.30 mse_u:0.00 mse_v:0.00 psnr_avg:37.27 psnr_y:35.51 psnr_u:inf psnr_v:inf' 
key = 'psnr_y:'
out = []
for line in s.split('\n'):
  # The value starts right after the key...
  start = line.index(key) + len(key)
  # ...and ends at the next space, or at the end of the line if there is none.
  end = line.find(' ', start)
  if end == -1:
    end = len(line)
  out.append(line[start:end])
print(out)

out is a list of the psnr_y values, one per line, as strings.
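If the values are needed as numbers rather than strings, the same no-regex idea can be sketched with str.partition, which avoids index arithmetic altogether. The helper name psnr_y_of is hypothetical, and the sketch assumes the value after the key always parses with float():

```python
def psnr_y_of(line, key="psnr_y:"):
    """Return the value after `key` as a float, or None if the key is absent."""
    _, found, rest = line.partition(key)
    if not found:
        return None
    # The value runs up to the next whitespace (or the end of the line).
    fields = rest.split(None, 1)
    return float(fields[0]) if fields else None

line = "n:1 mse_avg:8.46 psnr_avg:38.86 psnr_y:37.10 psnr_u:inf psnr_v:inf"
print(psnr_y_of(line))  # 37.1
```

Returning None for a missing key lets the caller decide whether to skip the line or treat it as an error.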

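You can use a regex like this:

import re

pattern = r'psnr_y:([\d.]+)'

with open('textfile.txt') as f:
    for line in f:  # a file object yields its lines one at a time
        match = re.search(pattern, line)
        if match:  # re.search returns None for lines without psnr_y
            print(match[1])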

This code prints only the value of psnr_y. Replace [1] with [0] to get the full match, such as "psnr_y:37.10". To collect the values in a list instead, the code would look like this:

import re

pattern = r'psnr_y:([\d.]+)'
a_list = []

with open('textfile.txt') as f:
    for line in f:
        match = re.search(pattern, line)
        if match:  # re.search returns None for lines without psnr_y
            a_list.append(match[1])

For a simple answer with no need to import additional modules, you could try:

rows = []
with open("my_file", "r") as f:
    for row in f:
        value_pairs = row.split()  # split on any run of whitespace
        if not value_pairs:  # skip blank lines
            continue
        print(value_pairs)
        # Map each key to its (string) value, e.g. {"n": "1", "mse_avg": "8.46", ...}
        values = dict(pair.split(":", 1) for pair in value_pairs)
        print(values["psnr_y"])
        rows.append(values)

print(rows)

This gives you a list of dictionaries (similar in shape to a JSON array of objects, but made of Python objects, with every value stored as a string). It probably won't be the fastest solution, but the structure is convenient and you don't have to use regex.
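The MATLAB code in the question appears to build one numeric column per field. A rough Python counterpart, sketched under the assumption that every token has the form key:value and every value parses with float() (which accepts inf), could collect the dictionaries into columns. The names parse_line and to_columns are hypothetical:

```python
def parse_line(line):
    """Turn 'n:1 mse_avg:8.46 ...' into {'n': 1.0, 'mse_avg': 8.46, ...}."""
    pairs = (token.split(":", 1) for token in line.split())
    return {key: float(value) for key, value in pairs}

def to_columns(lines):
    """Collect every field into a list holding one value per non-blank line."""
    columns = {}
    for line in lines:
        for key, value in parse_line(line).items():
            columns.setdefault(key, []).append(value)
    return columns

# Shortened, illustrative versions of the lines in the question.
sample = [
    " n:1 mse_avg:8.46 psnr_y:37.10 psnr_u:inf",
    " n:2 mse_avg:12.20 psnr_y:35.51 psnr_u:inf",
    "    ",
]
columns = to_columns(sample)
print(columns["psnr_y"])  # [37.1, 35.51]
```

Each entry of columns is a plain list, so it can be handed to a numeric library afterwards if a real matrix is required.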

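import fileinput
import re

for line in fileinput.input():
    # Every whitespace-separated key:value token becomes one dictionary entry.
    row = dict(s.split(':', 1) for s in re.findall(r'\S+:\S+', line))
    if 'psnr_y' in row:  # blank lines produce an empty dictionary
        print(row['psnr_y'])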

To run it, redirect the data file to standard input:

python script_name.py < /path/to/your/dataset.txt


 