Read from a file one element at a time in Python
I have a file that is not structured line by line; instead, it consists of groups of different sizes that wrap onto the next line. I won't go into more detail since it doesn't really matter. Suffice it to say that lines mean nothing structurally.
My question is this: is there a way to read from a file element by element, rather than line by line? I'm pretty sure it's unpythonic not to go line by line, but I'd rather not have to read each line, concatenate it with the previous line, and then process that. If there's a simple way to read one element at a time, it would make things a lot easier.
Sorry if this has been asked before; I really couldn't find anything. Thanks!
EDIT: I'll add a simple example. The file looks like this:
1.00 3 4.3 5.6 2.3 4 12.4 0.5 10.2 1.10 8
5.9 11.2 7.3 1.20 8 0.2 1.2 4.2 11 23.1 4.0
7.3 13 4.4 1.7 0.5 (etc.)
The groups start with 1.00, 1.10, 1.20 (always increasing by 0.1).
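Reading "element by element" without caring about lines amounts to reading the file in arbitrary chunks and carrying over any token that a chunk boundary cuts in half. A minimal sketch, assuming elements are separated by whitespace (newlines included); `read_tokens` and its `chunk_size` parameter are hypothetical names, not a standard-library API:

```python
import io


def read_tokens(f, chunk_size=4096):
    """Yield whitespace-separated tokens from a text stream, reading it in
    fixed-size chunks instead of lines (line breaks count as whitespace)."""
    pending = ''  # partial token left over from the previous chunk
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            break
        buf = pending + chunk
        tokens = buf.split()
        # If the buffer doesn't end in whitespace, its last token may have
        # been cut off by the chunk boundary; hold it back for the next round.
        if tokens and not buf[-1].isspace():
            pending = tokens.pop()
        else:
            pending = ''
        yield from tokens
    if pending:
        yield pending


data = io.StringIO('1.00 3 4.3 5.6 2.3 4 12.4 0.5 10.2 1.10 8\n'
                   '5.9 11.2 7.3 1.20 8 0.2 1.2 4.2 11 23.1 4.0\n'
                   '7.3 13 4.4 1.7 0.5')
print(list(read_tokens(data, chunk_size=16))[:5])  # ['1.00', '3', '4.3', '5.6', '2.3']
```

Because only a partial token is ever buffered, memory use stays bounded by `chunk_size` regardless of how long the lines are; grouping by header can then be layered on top of the token stream.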
If the numbers don't span record breaks (that is, no number is split across two lines), then I think this can be done more simply. This is your data:
1.00 3 4.3 5.6 2.3 4 12.4 0.5 10.2 1.10 8
5.9 11.2 7.3 1.20 8 0.2 1.2 4.2 11 23.1 4.0
7.3 13 4.4 1.7 0.5
Here's the code:
from decimal import Decimal

def records(currentTime=Decimal('1.00')):
    first = True
    with open('sample.txt') as sample:
        for line in sample:
            for number in line.split():
                if Decimal(number) == currentTime:
                    # A header starts a new record; emit the previous one first.
                    if first:
                        first = False
                    else:
                        yield record
                    record = [number]
                    currentTime += Decimal('0.1')
                else:
                    record.append(number)
    yield record

for record in records():
    print(record)
Here's the output:
['1.00', '3', '4.3', '5.6', '2.3', '4', '12.4', '0.5', '10.2']
['1.10', '8', '5.9', '11.2', '7.3']
['1.20', '8', '0.2', '1.2', '4.2', '11', '23.1', '4.0', '7.3', '13', '4.4', '1.7', '0.5']
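The reason for `Decimal` here is that the expected header is built by repeatedly adding 0.1, and the comparison with the parsed token has to be exact. A small illustration (the variable names are only for the example):

```python
from decimal import Decimal

# Stepping the expected header with binary floats accumulates rounding error.
t = 1.0
t += 0.1
t += 0.1
print(t, t == 1.2)  # 1.2000000000000002 False

# Decimal arithmetic on decimal strings is exact, and equality is numeric,
# so Decimal('1.20') matches a header written as either "1.2" or "1.20".
d = Decimal('1.00')
d += Decimal('0.1')
d += Decimal('0.1')
print(d, d == Decimal('1.2'), d == Decimal('1.20'))  # 1.20 True True
```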
EDIT: This version works along the same lines but does not assume that numbers cannot span record breaks. It uses stream I/O. The main things you would change are the size of the gulps of data and, of course, the source.
from decimal import Decimal
from io import StringIO

sample = StringIO('''1.00 3 4.3 5.6 2.3 4 12.4 0.5 10.2 1.10 8 \n5.9 11.2 7.3 1.20 8\n.15 0.2 1.2 4.2 11 23.1 4.0 \n7.3 13 4.4 1.7 0.5''')

def records(currentTime=Decimal('1.00')):
    first = True
    previousChunk = ''
    exhaustedInput = False
    while True:
        chunk = sample.read(50)
        if not chunk:
            exhaustedInput = True
            chunk = previousChunk
        else:
            # Newlines are deleted, so a number broken across lines is rejoined.
            chunk = (previousChunk + chunk).replace('\n', '')
        items = chunk.split()
        # The last item may be incomplete, so hold it back until more data arrives.
        for number in items[:len(items) if exhaustedInput else -1]:
            if Decimal(number) == currentTime:
                if first:
                    first = False
                else:
                    yield record
                record = [number]
                currentTime += Decimal('0.1')
            else:
                record.append(number)
        if exhaustedInput:
            yield record
            break
        else:
            # Carry the held-back item forward, keeping a separating space if the
            # chunk ended on whitespace (otherwise "7.3" + "1.20" would fuse).
            previousChunk = (items[-1] if items else '') + (' ' if chunk[-1:].isspace() else '')

for record in records():
    print(record)
Here is the output:
['1.00', '3', '4.3', '5.6', '2.3', '4', '12.4', '0.5', '10.2']
['1.10', '8', '5.9', '11.2', '7.3']
['1.20', '8.15', '0.2', '1.2', '4.2', '11', '23.1', '4.0', '7.3', '13', '4.4', '1.7', '0.5']
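The `8.15` comes from the sample deliberately breaking that number across a line: the code deletes newlines instead of treating them as separators, which only works because every line that ends *between* two numbers carries a trailing space. A short illustration on an excerpt of the sample string (the name `raw` is just for the example):

```python
raw = '1.20 8\n.15 0.2 1.2 4.2 11 23.1 4.0 \n7.3 13'

# Treating "\n" as whitespace keeps the two halves of 8.15 apart.
print(raw.split())
# ['1.20', '8', '.15', '0.2', '1.2', '4.2', '11', '23.1', '4.0', '7.3', '13']

# Deleting "\n" rejoins 8.15; 4.0 and 7.3 stay separate only because
# the line "... 4.0 " ends with a trailing space.
print(raw.replace('\n', '').split())
# ['1.20', '8.15', '0.2', '1.2', '4.2', '11', '23.1', '4.0', '7.3', '13']
```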
A generator solution using a custom header method. Loosely based on https://stackoverflow.com/a/16260159/47078.
Input:
' 1.00 3 4.3 5.6\n 2.3\n 4 12.4 0.5 10.2 1.10 8 5.9 11.2\n 7.3 1.20 8 0.2 1.2\n 4.2 11 23.1 4.0\n 7.3\n 13 4.4 1.7 0.5'
Output:
['1.00', '3', '4.3', '5.6', '2.3', '4', '12.4', '0.5', '10.2']
['1.10', '8', '5.9', '11.2', '7.3']
['1.20', '8', '0.2', '1.2', '4.2', '11', '23.1', '4.0', '7.3', '13', '4.4', '1.7', '0.5']
Source:
#!/usr/bin/env python3
from contextlib import suppress
from functools import partial

# yields strings from a file based on custom headers
#
# f                     a file-like object supporting read(size)
# index_of_next_header  a function taking a string and returning
#                       the position of the next header or raising
#                       ValueError (default = group by newline)
# chunk_size            how many characters to read at a time
def group_file_by_custom_header(f,
                                index_of_next_header=lambda buf: buf.index('\n') + 1,
                                chunk_size=10):
    buf = ''
    for chunk in iter(partial(f.read, chunk_size), ''):
        buf += chunk
        with suppress(ValueError):
            while True:
                pos = index_of_next_header(buf)
                yield buf[:pos]
                buf = buf[pos:]
    if buf:
        yield buf

# Pass an empty list to data
def index_of_next_timestamp(buf, data):
    def next_timestamp(buf):
        next_ts = buf.strip().split(maxsplit=2)
        if len(next_ts) < 2:
            raise ValueError()
        return '{:4.2f}'.format(float(next_ts[0]) + 0.1)

    if not data:
        data.append(next_timestamp(buf))
    pos = buf.index(data[0])
    data[0] = next_timestamp(buf[pos:])
    return pos

def get_dummy_file():
    import io
    data = ' 1.00 3 4.3 5.6\n 2.3\n 4 12.4 0.5 10.2 1.10 8 5.9 11.2\n 7.3 1.20 8 0.2 1.2\n 4.2 11 23.1 4.0\n 7.3\n 13 4.4 1.7 0.5'
    return io.StringIO(data)

data_file = get_dummy_file()
header_fn = partial(index_of_next_timestamp, data=[])
for group in group_file_by_custom_header(data_file, header_fn):
    print(repr(group.split()))
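The `for chunk in iter(partial(f.read, chunk_size), '')` line in `group_file_by_custom_header` uses the two-argument form of the built-in `iter()`: it keeps calling `f.read(chunk_size)` until that returns the sentinel `''`. Chunks split the text at arbitrary points, which is why the function accumulates them in `buf` and lets the header function raise `ValueError` until a complete header is present. A standalone illustration of the idiom:

```python
import io
from functools import partial

stream = io.StringIO('1.00 3 4.3 5.6\n 2.3')

# iter(callable, sentinel) calls stream.read(4) repeatedly and stops as soon
# as it returns the sentinel '' (end of a text stream).
chunks = list(iter(partial(stream.read, 4), ''))
print(chunks)  # ['1.00', ' 3 4', '.3 5', '.6\n ', '2.3']
```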
I don't know why this didn't occur to me before. You can read more or less element by element using a lexical scanner. I've used the one that comes with Python, namely shlex. As I understand it, it has the virtue of operating on stream input, unlike some of the more popular ones. This seems even simpler.
from io import StringIO
from shlex import shlex
from decimal import Decimal

sample = StringIO('''1.00 3 4.3 5.6 2.3 4 12.4 0.5 10.2 1.10 8 \n5.9 11.2 7.3 1.20 8\n.15 0.2 1.2 4.2 11 23.1 4.0 \n7.3 13 4.4 1.7 0.5''')

lexer = shlex(instream=sample, posix=False)
lexer.wordchars = '0123456789.\n'  # newline is treated as part of a word...
lexer.whitespace = ' '             # ...and only spaces separate tokens
lexer.whitespace_split = True

def records(currentTime=Decimal('1.00')):
    first = True
    while True:
        token = lexer.get_token()
        if not token:  # non-POSIX shlex returns '' at end of input
            break
        # Remove newlines, rejoining a number split across lines ("8\n.15").
        token = token.replace('\n', '')
        if not token:  # the token consisted only of newlines
            continue
        if Decimal(token) == currentTime:
            if first:
                first = False
            else:
                yield record
            currentTime += Decimal('0.1')
            record = [float(token)]
        else:
            record.append(float(token))
    yield record

for record in records():
    print(record)
Output is:
[1.0, 3.0, 4.3, 5.6, 2.3, 4.0, 12.4, 0.5, 10.2]
[1.1, 8.0, 5.9, 11.2, 7.3]
[1.2, 8.15, 0.2, 1.2, 4.2, 11.0, 23.1, 4.0, 7.3, 13.0, 4.4, 1.7, 0.5]
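The `shlex` settings are what make this work: with `whitespace` set to a single space and `whitespace_split = True`, a newline is not a separator, so a number broken across lines comes back as one token with an embedded `'\n'`, which the code then strips out. A quick check of that behaviour on a short excerpt (this snippet is illustrative, not part of the answer above):

```python
from io import StringIO
from shlex import shlex

lexer = shlex(instream=StringIO('1.20 8\n.15 0.2'), posix=False)
lexer.wordchars = '0123456789.\n'  # a real newline character
lexer.whitespace = ' '             # only spaces separate tokens
lexer.whitespace_split = True

# In non-POSIX mode get_token() returns '' at end of input.
tokens = list(iter(lexer.get_token, ''))
print(tokens)  # ['1.20', '8\n.15', '0.2']
```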
If it were me, I'd write generator-function wrappers that provide precisely the level of detail required:
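def by_spaces(fp):
    # Yield whitespace-separated words, ignoring line structure.
    for line in fp:
        yield from line.split()

def by_numbers(fp):
    # Convert each word to a float.
    for word in by_spaces(fp):
        yield float(word)

def by_elements(fp):
    # Group numbers into records; a new record starts at the number that is
    # (approximately) 0.1 more than the current record's first number.
    numbers = by_numbers(fp)
    start = next(numbers, None)
    if start is None:  # empty input: nothing to group
        return
    result = [start]
    for number in numbers:
        if abs(start+.1-number) > 1e-6:
            result.append(number)
        else:
            yield result
            result = [number]
            start = number
    if result:
        yield result

with open('x.in') as fp:
    for element in by_elements(fp):
        print(element)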
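The tolerance test `abs(start+.1-number) > 1e-6` in `by_elements` is needed because `start + .1` rarely equals the parsed float exactly. The standard library's `math.isclose` (Python 3.5+) expresses the same comparison; a sketch, where the inline values are just an example:

```python
import math

start, number = 1.1, 1.2
print(start + 0.1)                                      # 1.2000000000000002
print(start + 0.1 == number)                            # False
print(math.isclose(start + 0.1, number, abs_tol=1e-6))  # True

# Equivalent to the answer's test for "not a new header":
# if not math.isclose(start + .1, number, abs_tol=1e-6): ...
```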