
Reading a txt file and using a specific portion of it to get an array

I am trying to read a txt file that is a mix of strings and numbers, like this:

n_rows=55;    #This describes the mask array below, not the experiment!!
n_cols=32;
# Note that 'columns' run down and rows run across!

mask = [
/*RC1   0 1 2 3 4 5 6 7 8 9  0 1 2 3 4 5 6 7 8 9  0 1 2 3 4 5 6 7 8 9  0 1 2 3 4 5 6 7 8 9  0 1 2 3 4 5 6 7 8 9  0 1 2 3 4 */
/* 0 */ 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,1,
/* 1 */ 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,
/* 2 */ 1,0,0,1,1,1,1,0,0,0, 0,0,0,0,0,0,1,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,1,0,0,1,0,0, 1,0,0,0,0,0,0,0,0,0, 0,0,0,1,0,
/* 3 */ 0,0,0,1,1,0,1,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,1,0,1, 0,0,0,0,1,1,1,1,1,0, 1,0,1,1,1,
/* 4 */ 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,
/* 5 */ 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,1,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,1,
/* 6 */ 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,1,0, 0,0,0,0,0,0,0,1,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,
/* 7 */ 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 1,1,0,0,1,
/*RC2   0 1 2 3 4 5 6 7 8 9  0 1 2 3 4 5 6 7 8 9  0 1 2 3 4 5 6 7 8 9  0 1 2 3 4 5 6 7 8 9  0 1 2 3 4 5 6 7 8 9  0 1 2 3 4 */

All I want is the numbers after /* n */, so that I can end up with a matrix of 1s and 0s. There are 32 rows in total (the example file shows only 8), and there are useless lines in between them.

I tried a rather clumsy approach:

txtlines = tuple(open(filename, 'r'))   # read the file so that each whole line becomes one element
txtlines = list(txtlines)

import re
pattern = re.compile("/*[0-31]*/")     # set a pattern to remove unwanted lines


gen = [i for i in txtlines if pattern.match(i) is None]  # the useless elements
lines_cut = [x for x in txtlines if x not in gen]

I planned to strip '/* n */' from each element later, convert each element into a 1D array like [0,1,0,1,0,0,0,...], and stack them all into a 2D array.
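The plan described above (strip the row label, turn each line into a 1D array, stack the rows) could look roughly like the following sketch. It assumes the data lines have already been isolated into a list and that each contains a closing '*/'; the helper name lines_to_matrix and the two sample lines are hypothetical.

```python
import numpy as np

def lines_to_matrix(data_lines):
    """Hypothetical helper: turn lines like '/* 0 */ 0,1, 0,0,' into a 2D int array."""
    rows = []
    for line in data_lines:
        # Keep only the text after the closing '*/' of the row label.
        values = line.split("*/", 1)[1]
        # Split on commas; skip empty pieces left by the trailing comma and newline.
        row = [int(v) for v in values.split(",") if v.strip()]
        rows.append(np.array(row))  # 1D array for this row
    # Stack the 1D rows into one 2D array (all rows must have the same length).
    return np.vstack(rows)

sample = [
    "/* 0 */ 0,0,1, 0,1,\n",
    "/* 1 */ 1,0,0, 0,0,\n",
]
print(lines_to_matrix(sample))
```

np.vstack raises an error if the rows have different lengths, which is a cheap sanity check that no line was cut short.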

There are two problems so far:

  1. That pattern did not remove all of the useless elements: lines such as /*RC2   0 1 2 3 4 5 6 7 8 9  0 1 2 3 4 5 6 7 8 9  0 1 2 3 4 5 6 7 8 9  0 1 2 3 4 5 6 7 8 9  0 1 2 3 4 5 6 7 8 9  0 1 2 3 4 */ remain.

  2. After removing gen from the lines, the order of the remaining elements changes completely: initially the first element was the one with /* 0 */, but now it is /* 25 */. I really need the order to be preserved.
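Problem 1 follows from how re reads the pattern "/*[0-31]*/": the unescaped /* means "zero or more slashes", and [0-31]* is a character class (the characters 0, 1, 2 and 3) repeated zero or more times. So match() succeeds on any line whose first character is '/', including the RC header rows. A quick check on shortened stand-in lines (assumed here for illustration):

```python
import re

# The question's pattern: '*' acts as a quantifier here, not as a literal asterisk.
pattern = re.compile("/*[0-31]*/")

lines = [
    "/*RC2   0 1 2 3 4 */\n",  # header row: meant to be dropped
    "/* 0 */ 0,0,0,0,1,\n",    # data row: meant to be kept
    "mask = [\n",              # other line: meant to be dropped
]
for line in lines:
    # '/*' and '[0-31]*' may both match nothing, leaving the final '/' to match
    # the line's first character, so every line starting with '/' matches.
    m = pattern.match(line)
    print(line.rstrip(), "->", m.group() if m else None)
```

Both the RC2 header and the data row match (on just the single character "/"), while "mask = [" does not.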

I partly solved problem 2 by converting the lists to arrays and then removing the unwanted lines:

import numpy as np

array = np.asarray(txtlines)
gen_array = np.asarray(gen)
array_cut = [x for x in array if x not in gen_array]

It seems to work, but I am not sure whether I am doing the right thing.
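A single pass that keeps a line only when it looks like a data row preserves file order by construction, and avoids building gen and then testing x not in gen for every line. A minimal sketch; the escaped pattern below is an assumption about what counts as a data row, and the sample lines are shortened stand-ins:

```python
import re

# Assumed data-row shape: '/*', optional blanks, a row number, optional blanks, '*/'.
DATA_ROW = re.compile(r"^/\*\s*\d+\s*\*/")

def data_lines(lines):
    # Walk the lines once, in their original order, keeping only data rows.
    return [line for line in lines if DATA_ROW.match(line)]

txtlines = [
    "mask = [\n",
    "/*RC1   0 1 2 3 4 */\n",
    "/* 0 */ 0,0,0,0,1,\n",
    "/* 1 */ 0,0,0,0,0,\n",
    "/*RC2   0 1 2 3 4 */\n",
]
for line in data_lines(txtlines):
    print(line, end="")
```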

Your regular expression is not correct. You need to escape '*', and instead of [0-31] (a character class that matches a single 0, 1, 2 or 3, not the numbers 0 to 31) you need [0-9]+, i.e. one or more digits. For example:

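import re
import numpy as np

def get_line(filename):
    # Data rows look like '/* 12 */ 0,1,...'; capture everything after the closing '*/'.
    pattern = re.compile(r'^/\* *[0-9]+ *\*/(.*)')
    with open(filename, 'r') as file:
        for line in file:
            m = pattern.match(line)
            if m:
                # Strip outer blanks/commas, split on commas, and convert each field to int.
                yield [int(v) for v in m.group(1).strip(', ').split(',')]

m = np.array(list(get_line(filename)))
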
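To see what the escaped pattern captures, here is a quick check on a few shortened stand-in lines. Note that the captured fields are strings (some with a leading blank), so they need an int() conversion before they form a numeric array:

```python
import re
import numpy as np

# The answer's pattern, as a raw string so '\*' reaches re unchanged.
pattern = re.compile(r'^/\* *[0-9]+ *\*/(.*)')

sample = [
    "/*RC1   0 1 2 3 4 */",
    "/* 0 */ 0,0,0,0,1,",
    "/* 12 */ 1,0,0, 1,0,",
]
rows = []
for line in sample:
    m = pattern.match(line)
    if m:
        # group(1) is everything after '*/'; strip outer blanks/commas, then split.
        fields = m.group(1).strip(', ').split(',')
        rows.append([int(v) for v in fields])  # int() tolerates the leading blank in ' 1'
print(np.array(rows))
```

The RC1 header is rejected because [0-9]+ cannot match the 'R', and two-digit row numbers such as 12 are accepted.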
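a = []
with open("tonparr.txt", "r") as f:
    for line in f:
        # Data rows start with '/* ' plus the row number; header rows start with '/*RC'.
        if line.startswith("/* "):
            # Keep the text after '*/' (a fixed slice like line[8:] breaks for two-digit row numbers).
            a.append(line.split("*/", 1)[1])

b = []
for row in a:
    # Fields after the gaps between groups of ten carry a leading blank, so strip before testing.
    b.append([int(i) for i in row.split(",") if i.strip().isdigit()])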

This produces a 2D list, one list of ints per line; from there you just need to convert it to a NumPy array. Sorry if I misinterpreted the question.
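The last step mentioned above, turning the list of int lists b into a NumPy array, is a single call as long as every row has the same length. A minimal sketch with a small stand-in for b (the values are made up for illustration):

```python
import numpy as np

# Stand-in for the list of int lists built above (illustrative values only).
b = [
    [0, 0, 0, 0, 1],
    [1, 0, 0, 1, 0],
]
# Every inner list must have the same length to form a regular 2D integer array.
lengths = {len(row) for row in b}
if len(lengths) != 1:
    raise ValueError(f"rows have different lengths: {sorted(lengths)}")
mask = np.array(b, dtype=int)
print(mask.shape)
```

Checking the lengths first gives a clear error message if a row was parsed incorrectly, instead of a less obvious failure inside np.array.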

result = []
with open('data.txt') as data:
    for line in data:
        if not line.startswith('/* '):
            continue
        pieces = line.split('*/')
        result.append(pieces[1].strip().replace(' ', ''))
for row in result:
    print(row)

You are looking for lines that begin with '/* '. When you find one, split it on '*/' and keep the right-hand piece, minus the blanks, in the 'result' list.
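Each entry of result is still a string such as "0,0,0,0,1," (with a trailing comma). If a numeric matrix is needed, one way to finish the job, sketched here on shortened stand-in rows, is:

```python
import numpy as np

# Shortened stand-ins for entries of `result` (blanks removed, trailing comma kept).
result = ["0,0,0,0,1,", "1,0,0,1,0,"]

# Drop the trailing comma, split on the remaining commas, and convert each field to int.
matrix = np.array([[int(v) for v in row.rstrip(',').split(',')] for row in result])
print(matrix)
```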

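Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.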

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM