Reading a txt file and using a specific portion of it to build an array
I am trying to read a txt file that is a mix of strings and floats, like this:
n_rows=55; #This describes the mask array below, not the experiment!!
n_cols=32;
# Note that 'columns' run down and rows run across!
mask = [
/*RC1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 */
/* 0 */ 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,1,
/* 1 */ 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,
/* 2 */ 1,0,0,1,1,1,1,0,0,0, 0,0,0,0,0,0,1,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,1,0,0,1,0,0, 1,0,0,0,0,0,0,0,0,0, 0,0,0,1,0,
/* 3 */ 0,0,0,1,1,0,1,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,1,0,1, 0,0,0,0,1,1,1,1,1,0, 1,0,1,1,1,
/* 4 */ 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,
/* 5 */ 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,1,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,1,
/* 6 */ 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,1,0, 0,0,0,0,0,0,0,1,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,
/* 7 */ 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0, 1,1,0,0,1,
/*RC2 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 */
The only thing I want is the numbers after /* n */, so that I finally get a matrix consisting of 1s and 0s. There are 32 rows in total (the example file only shows 8 rows), and there are useless lines in between them.
I tried a pretty dumb way of doing this:
txtlines = tuple(open(filename, 'r')) # read the file so that each whole line of the txt file becomes one element
txtlines=list(txtlines)
import re
pattern = re.compile("/*[0-31]*/") #set a pattern to remove unwanted lines
gen = [i for i in txtlines if pattern.match(i)==None] # The useless element
lines_cut = [x for x in txtlines if x not in gen]
I planned to cut off '/* n */' within each element later, change each element into a 1d array like [0,1,0,1,0,0,0,...], and append all of them to form a 2d array.
There are two problems so far:
I didn't successfully cut all of the useless elements with that pattern; a line like /*RC2 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 */ remains.
After cutting gen from the lines, the order of the remaining elements is totally changed: initially the first element was the one with /* 0 */, but now it's /* 25 */. I really need the order to be preserved.
I kind of solved problem 2 by changing the lists to arrays and then removing the unwanted elements:
array=np.asarray(txtlines)
gen_array=np.asarray(gen)
array_cut=[x for x in array if x not in gen_array]
It seems to be working, but I am not quite sure if I'm doing the correct thing.
Your regular expression is not correct. You need to escape '*', and instead of [0-31] you need [0-9]+, i.e. one or more digits. For example,
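import re
import numpy as np

def get_line(filename):
    # Match only lines that start with '/* <digits> */' and capture the rest
    pattern = re.compile(r'^/\* *[0-9]+ *\*/(.*)')
    with open(filename, 'r') as file:
        for line in file:
            m = pattern.match(line)
            if m:
                # Drop the trailing comma, then convert each entry to int
                yield [int(v) for v in m.group(1).strip(', ').split(',')]

# filename is the path to the text file, as in the question
m = np.matrix([l for l in get_line(filename)])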
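To see why the pattern from the question lets the /*RC2 ... */ line through, it can help to run both patterns against a few sample lines. The snippet below is only a sketch: the sample strings are shortened stand-ins for the real file, and the names `broken`, `fixed` and `samples` are made up for illustration. In the unescaped pattern, `/*` means "zero or more slashes" and `[0-31]` is a character class containing only the digits 0 to 3, so any line that starts with a slash matches.

```python
import re

# Pattern from the question: '/*' = zero or more '/', '[0-31]*' = zero or
# more of the characters 0,1,2,3, then one literal '/'.
broken = re.compile("/*[0-31]*/")

# Escaped pattern: a literal '/*', optional spaces, one or more digits,
# optional spaces, then a literal '*/'.
fixed = re.compile(r"^/\* *[0-9]+ *\*/")

# Shortened, illustrative sample lines (not the real file contents).
samples = [
    "/* 0 */ 0,0,1,",
    "/* 25 */ 1,0,0,",
    "/*RC2 0 1 2 */",
    "n_cols=32;",
]

for s in samples:
    print(repr(s), bool(broken.match(s)), bool(fixed.match(s)))
```

With these samples the unescaped pattern accepts all three lines that begin with a slash, including the RC2 header, while the escaped pattern accepts only the two numbered data lines.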
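a = []
with open("tonparr.txt", "r") as f:
    for line in f:
        # Keep only the data lines, which start with '/* '
        if line[0:3] == "/* ":
            # Drop the leading '/* n */' label and the trailing newline
            a.append(line[8:-1])
b = []
for x in range(0, len(a)):
    b.append([])
    for i in a[x].split(","):
        i = i.strip()  # entries after a group gap carry a leading space
        if i.isdigit():
            b[x].append(int(i))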
This produces a 2D list with each line as a list of ints; from there you just need to convert it to a numpy array. Sorry if I interpreted the question incorrectly.
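One detail worth checking when filtering with `isdigit()`: the sample data separates each group of ten values with a comma followed by a space, so some tokens arrive as ' 0' or ' 1', and `str.isdigit()` returns False for a string containing a space. The sketch below uses a made-up, shortened line (the variable names are illustrative only) to show how values can be dropped silently if the tokens are not stripped first.

```python
# Illustrative line: four values, with a space after the second comma,
# plus the trailing comma seen in the sample data.
line = "0,1, 1,0,"
tokens = line.split(",")

# Without stripping, ' 1' fails isdigit() and is silently skipped.
unstripped = [int(t) for t in tokens if t.isdigit()]

# Stripping each token first keeps every value; the empty token produced
# by the trailing comma is still filtered out.
stripped = [int(t) for t in (tok.strip() for tok in tokens) if t.isdigit()]

print(unstripped)  # [0, 1, 0]
print(stripped)    # [0, 1, 1, 0]
```

Comparing each row's length against the expected number of values per line is a cheap way to catch this kind of loss.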
result = []
with open('data.txt') as data:
    for line in data:
        if not line.startswith('/* '):
            continue
        pieces = line.split('*/')
        result.append(pieces[1].strip().replace(' ', ''))
for row in result:
    print(row)
You are looking for lines that open with '/* '. When you find one, split it on '*/' and retain the right-hand piece in the 'result' array, minus the blanks.
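The same split-on-'*/' idea can be taken one step further to produce rows of integers rather than strings. This is a minimal sketch under a few assumptions: `parse_mask` is a hypothetical helper name, the input is an in-memory `io.StringIO` with a shortened stand-in for the real file, and every data line is assumed to start with '/* ' as in the sample.

```python
import io

def parse_mask(lines):
    """Return one list of ints per line that starts with '/* '."""
    rows = []
    for line in lines:
        if not line.startswith('/* '):
            continue  # skips headers such as '/*RC1 ... */' and 'n_cols=...'
        # Everything after the first '*/' is the comma-separated payload.
        payload = line.split('*/', 1)[1]
        # int() tolerates surrounding whitespace; empty tokens (from the
        # trailing comma and newline) are skipped.
        rows.append([int(tok) for tok in payload.split(',') if tok.strip()])
    return rows

# Shortened, illustrative stand-in for the real file.
sample = io.StringIO(
    "n_cols=3;\n"
    "mask = [\n"
    "/*RC1 0 1 2 */\n"
    "/* 0 */ 0,0,1,\n"
    "/* 1 */ 1, 0,0,\n"
)

rows = parse_mask(sample)
print(rows)  # [[0, 0, 1], [1, 0, 0]]
```

Because the rows are appended in the order the lines are read, the original row order is preserved, and the resulting list of lists can be passed to `np.array` if a numpy array is needed.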