Turn lists of strings into NumPy array (Python)
So I'm trying to extract some data from a text file. Currently I'm able to get the correct lines that contain the data, which in turn gives me output that looks like this:
[ 0.2 0.148 100. ]
[ 0.3 0.222 100. ]
[ 0.4 0.296 100. ]
[ 0.5 0.37 100. ]
[ 0.6 0.444 100. ]
So basically I have 5 lists with one string in each. However, as you can imagine, I would like to get all of this into a NumPy array with each string split into its 3 values, like this:
[[0.2, 0.148, 100],
[0.3, 0.222, 100],
[0.4, 0.296, 100],
[0.5, 0.37, 100],
[0.6, 0.444, 100]]
But since the separator in the output varies, i.e. I don't know whether it will be 3 spaces, 5 spaces or a tab, I'm kind of lost on how to do this.
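For reference, here is a minimal sketch showing that Python's str.split() with no argument already copes with a varying separator. It treats any run of spaces or tabs as a single separator. The sample lines are made up for illustration.

```python
# str.split() with no argument splits on any run of whitespace,
# so 3 spaces, 5 spaces and a tab all behave the same way.
samples = [
    "0.2   0.148   100.",      # 3 spaces
    "0.3     0.222     100.",  # 5 spaces
    "0.4\t0.296\t100.",        # tabs
]
rows = [[float(tok) for tok in s.split()] for s in samples]
print(rows)  # [[0.2, 0.148, 100.0], [0.3, 0.222, 100.0], [0.4, 0.296, 100.0]]
```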
UPDATE:
So the contents of data_file look a bit like this:
Equiv. Sphere Diam. [cm]: 6.9
Conformity Index: N/A
Gradient Measure [cm]: N/A
Relative dose [%] Dose [Gy] Ratio of Total Structure Volume [%]
0 0 100
0.1 0.074 100
0.2 0.148 100
0.3 0.222 100
0.4 0.296 100
0.5 0.37 100
0.6 0.444 100
0.7 0.518 100
0.8 0.592 100
Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)
Dose Cover.[%]: 100.0
Sampling Cover.[%]: 100.0
Relative dose [%] Dose [Gy] Ratio of Total Structure Volume [%]
0 0 100
0.1 0.074 100
0.2 0.148 100
0.3 0.222 100
0.4 0.296 100
0.5 0.37 100
0.6 0.444 100
And the code to get the lines is:
import numpy as np

with open(data_file) as input_data:
    # Skips text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Relative dose [%] Dose [Gy] Ratio of Total Structure Volume [%]':  # Or whatever test is needed
            break
    # Reads text until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == 'Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)':
            break
        text_line = np.fromstring(line, sep='\t')
        print(text_line)
The text before the data itself varies, so I can't just say "skip the first 5 lines", but the headers are always the same, and the block always ends at the same line as well (before the next data begins). So I just need a way to get the raw data out and put it into a NumPy array; I can work with it from there.
Hopefully it makes more sense now.
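Since the same header line appears more than once in the sample above, one way to grab every block is sketched below. It is a minimal sketch under two assumptions not stated in the question: each block ends at the first blank or non-numeric line after the header, rather than at the specific "Uncertainty plan" line, and the header words may be separated by spaces or tabs. read_blocks and parse_row are hypothetical names.

```python
import io
import numpy as np

HEADER = "Relative dose [%] Dose [Gy] Ratio of Total Structure Volume [%]"

def parse_row(line):
    """Return the line's values as floats, or None if it is blank or not purely numeric."""
    try:
        values = [float(tok) for tok in line.split()]
    except ValueError:
        return None
    return values or None  # a blank line gives [] -> None

def read_blocks(lines, header=HEADER):
    """Hypothetical helper: one 2-D array per block that follows `header`."""
    blocks, current = [], None
    for line in lines:
        # Collapse any whitespace (spaces or tabs) before comparing with the header.
        if " ".join(line.split()) == header:
            current = []              # start a new block
            blocks.append(current)
        elif current is not None:
            row = parse_row(line)
            if row is None:
                current = None        # first blank/non-numeric line ends the block
            else:
                current.append(row)
    return [np.array(b) for b in blocks]

sample = io.StringIO("\n".join([
    "Equiv. Sphere Diam. [cm]: 6.9",
    HEADER,
    "0 0 100",
    "0.1 0.074 100",
    "Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)",
    "Dose Cover.[%]: 100.0",
    "Relative dose [%]\tDose [Gy]\tRatio of Total Structure Volume [%]",
    "0\t0\t100",
    "0.1\t0.074\t100",
    "0.2\t0.148\t100",
]) + "\n")

blocks = read_blocks(sample)
print([b.shape for b in blocks])  # [(2, 3), (3, 3)]
```

Stopping at the first non-numeric line avoids hard-coding the end marker, which in the sample differs between blocks (the last block simply runs to the end of the file).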
Given a text file called tmp.txt like this:
0.2 0.148 100.
0.3 0.222 100.
0.4 0.296 100.
0.5 0.37 100.
0.6 0.444 100.
The snippet:
with open('tmp.txt', 'r') as in_file:
    # line.split() with no argument splits on any run of spaces or tabs
    print([[float(x) for x in line.split()] for line in in_file])
Will output:
[[0.2, 0.148, 100.0], [0.3, 0.222, 100.0], [0.4, 0.296, 100.0], [0.5, 0.37, 100.0], [0.6, 0.444, 100.0]]
Which is hopefully what you want.
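The output above is a list of lists rather than a NumPy array. Here is a minimal sketch of getting the array, using an in-memory string as a stand-in for tmp.txt (its mixed spaces and tabs are an assumption for illustration). It wraps the same comprehension in np.array. It also shows np.loadtxt, whose default delimiter=None splits on any whitespace, as a one-call alternative when the file contains only the numbers.

```python
import io
import numpy as np

# Stand-in for tmp.txt; separators deliberately mixed (spaces and tabs).
text = (
    "0.2 0.148 100.\n"
    "0.3   0.222   100.\n"
    "0.4\t0.296\t100.\n"
    "0.5 0.37 100.\n"
    "0.6 0.444 100.\n"
)

# The answer's comprehension, wrapped in np.array to get a 2-D float array.
rows = [[float(x) for x in line.split()] for line in io.StringIO(text)]
arr = np.array(rows)

# For a file holding only the numbers, np.loadtxt does the same in one call.
arr2 = np.loadtxt(io.StringIO(text))

print(arr.shape)  # (5, 3)
```

np.loadtxt only fits once the surrounding header text has been filtered out. For the full data_file, a line filter is still needed.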
1) Add before the with open block:
import re
d_input = []
2) Replace
text_line = np.fromstring(line, sep='\t')
print(text_line)
with
d_input.append([float(x) for x in re.sub(r'\s+', ',', line.strip()).split(',')])
3) Add at the end:
d_array = np.array(d_input)
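Putting the three steps together, here is a minimal sketch that runs on an in-memory list of lines instead of data_file (parse_block is a hypothetical name). It also skips blank lines. That guard is an addition here: re.sub on an empty string yields [''] after the split, and float('') raises ValueError.

```python
import re
import numpy as np

def parse_block(lines):
    """Hypothetical helper: steps 1-3 applied to an iterable of data lines."""
    d_input = []                      # step 1
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue                  # blank line: nothing to convert
        # step 2: collapse each whitespace run into one comma, then split on commas
        d_input.append([float(x) for x in re.sub(r'\s+', ',', stripped).split(',')])
    return np.array(d_input)          # step 3

d_array = parse_block(["0.2   0.148   100", "0.3\t0.222\t100", ""])
print(d_array.shape)  # (2, 3)
```

For a stripped, non-empty line, re.sub(r'\s+', ',', s).split(',') yields the same tokens as s.split(), so the plain split() is a simpler equivalent.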
With print(text_line), you are seeing arrays formatted as strings. They are formatted individually, so the columns don't line up.
[ 0.2 0.148 100. ]
[ 0.3 0.222 100. ]
[ 0.4 0.296 100. ]
[ 0.5 0.37 100. ]
[ 0.6 0.444 100. ]
Instead of printing, you could collect the values in a list and concatenate them at the end.
Without actually testing, I think this would work:
import numpy as np

data = []
with open(data_file) as input_data:
    # Skips text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Relative dose [%] Dose [Gy] Ratio of Total Structure Volume [%]':  # Or whatever test is needed
            break
    # Reads text until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == 'Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)':
            break
        arr_line = np.fromstring(line, sep='\t')
        # may need a test on len(arr_line) to weed out blank lines
        data.append(arr_line)
data = np.vstack(data)  # stack the 1-D rows into one 2-D array
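Here is a self-contained variant of the collect-then-stack approach that can be run without data_file. It is a sketch under these assumptions: the sample text is made up, and np.fromstring is swapped for str.split(), so any mix of spaces and tabs is accepted. It also fills in the suggested blank-line test. read_first_block is a hypothetical name.

```python
import io
import numpy as np

START = "Relative dose [%] Dose [Gy] Ratio of Total Structure Volume [%]"
STOP = "Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)"

def read_first_block(input_data):
    """Hypothetical helper: collect the numeric rows between START and STOP, then stack them."""
    rows = []
    for line in input_data:            # skip everything up to and including the header
        if line.strip() == START:
            break
    for line in input_data:            # same iterator, so reading resumes after the header
        if line.strip() == STOP:
            break
        values = line.split()          # any run of spaces/tabs separates values
        if values:                     # weed out blank lines
            rows.append(np.array([float(v) for v in values]))
    return np.vstack(rows)             # raises ValueError if no rows were collected

sample = io.StringIO("\n".join([
    "Gradient Measure [cm]: N/A",
    START,
    "0   0   100",
    "",
    "0.1\t0.074\t100",
    "0.2 0.148 100",
    STOP,
    "Dose Cover.[%]: 100.0",
]) + "\n")

data = read_first_block(sample)
print(data.shape)  # (3, 3)
```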
Another option is to collect the lines without parsing and pass them to np.genfromtxt. In other words, use your code as a filter that feeds the numpy function just the right lines. genfromtxt takes input from anything that feeds it lines: a file, a list, a generator.
import numpy as np

def filter_block(input_data):  # renamed from `filter` to avoid shadowing the builtin
    # Skips text before the beginning of the interesting block:
    for line in input_data:
        if line.strip() == 'Relative dose [%] Dose [Gy] Ratio of Total Structure Volume [%]':  # Or whatever test is needed
            break
    # Reads text until the end of the block:
    for line in input_data:  # This keeps reading the file
        if line.strip() == 'Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)':
            break
        yield line

with open(data_file) as f:
    data = np.genfromtxt(filter_block(f))  # default delimiter=None splits on any whitespace
print(data)
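Here is a runnable check of the generator-as-filter idea on a small in-memory sample, which is made up and mixes separators. block_lines is a hypothetical name for the filter. Because genfromtxt's default delimiter=None splits on whitespace, tabs and multiple spaces need no extra handling.

```python
import io
import numpy as np

START = "Relative dose [%] Dose [Gy] Ratio of Total Structure Volume [%]"
STOP = "Uncertainty plan: U1 X:+3.00cm (variation of plan: CT1)"

def block_lines(input_data):
    """Hypothetical filter: yield only the raw lines between START and STOP."""
    for line in input_data:
        if line.strip() == START:
            break
    for line in input_data:
        if line.strip() == STOP:
            break
        yield line

sample = io.StringIO("\n".join([
    "Conformity Index: N/A",
    START,
    "0 0 100",
    "0.1\t0.074\t100",
    "0.2     0.148   100",
    STOP,
    "Dose Cover.[%]: 100.0",
]) + "\n")

data = np.genfromtxt(block_lines(sample))  # the generator feeds genfromtxt line by line
print(data.shape)  # (3, 3)
```

If a block can contain a single row, genfromtxt returns a 1-D array in that case, so downstream code may want to check data.ndim.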