How to preprocess a text stream on the fly in Python?

What I need is a Python 3 function (or something similar) that takes a text stream (like sys.stdin, or the one returned by open(file_name, "rt")) and returns a text stream to be consumed by some other function, but removes all spaces, replaces all tabs with commas, and converts all letters to lowercase on the fly (the "lazy" way) as the data is read by the consumer code.

I assume there is a reasonably easy way to do this in Python 3, something similar to list comprehensions, but so far I don't know what exactly it might be.

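I am not sure this is what you mean, but the easiest way I can think of is to inherit from the file type (the type returned from open) and override the read method to do all the things you want after reading the data. In Python 3 there is no built-in file type; open() in text mode returns an io.TextIOWrapper, so that is the class to subclass. A simple implementation (which only affects read(), not readline() or iteration) would be:

import io

class MyFile(io.TextIOWrapper):
    # Construct around a binary stream, e.g. MyFile(open(file_name, "rb"))
    def read(self, *args, **kwargs):
        data = super().read(*args, **kwargs)
        # Remove spaces, replace tabs with commas, convert to lowercase
        return data.replace(' ', '').replace('\t', ',').lower()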
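If the consuming code only iterates over lines, a generator may be enough and is probably the closest thing to the "list comprehension" style the question asks about. A minimal sketch, assuming line-by-line consumption; the name preprocess_lines is made up for illustration:

```python
import io


def preprocess_lines(stream):
    """Lazily yield lines with spaces removed, tabs as commas, lowercased."""
    for line in stream:
        # Each line is transformed only when the consumer asks for it
        yield line.replace(" ", "").replace("\t", ",").lower()


# Any iterable of lines works here: sys.stdin, an open file, or a StringIO
for line in preprocess_lines(io.StringIO("Hello World\tFOO\n")):
    print(line, end="")
```

This returns an iterator rather than a full stream object, so it fits consumers that loop over lines but not ones that call read().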

I believe what you are looking for is the io module, more specifically an io.StringIO.

You can then use the open() function to read the initial data, modify it, and pass the result around:

import io
with open(file_name, 'rt') as f:
    # Read everything, then drop spaces, turn tabs into commas, and lowercase
    stream = io.StringIO(f.read().replace(' ', '').replace('\t', ',').lower())
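Note that f.read() loads the whole file into memory before anything is transformed, so this is not lazy. If the consumer needs a real stream object and the data should be processed as it is read, one option is a small read-only wrapper built on io.TextIOBase. This is only a sketch: the class name PreprocessedText and its helper are hypothetical, and it covers read(), readline(), and iteration only.

```python
import io


class PreprocessedText(io.TextIOBase):
    """Hypothetical read-only wrapper that transforms text as it is read."""

    def __init__(self, source):
        self._source = source  # any text stream, e.g. sys.stdin or open(...)

    def readable(self):
        return True

    def _pull(self, reader, size):
        # Keep reading until something survives the cleanup or the source
        # is exhausted, so an all-spaces chunk is not mistaken for EOF.
        while True:
            chunk = reader(size)
            cleaned = chunk.replace(" ", "").replace("\t", ",").lower()
            if cleaned or not chunk:
                return cleaned

    def read(self, size=-1):
        # May return fewer than `size` characters because spaces are dropped
        return self._pull(self._source.read, size)

    def readline(self, size=-1):
        # Iteration (for line in stream) goes through readline()
        return self._pull(self._source.readline, size)


stream = PreprocessedText(io.StringIO("Some Text\tHERE\n"))
print(stream.read(), end="")
```

Because spaces are removed, read(n) can return fewer than n characters; consumers that rely on exact chunk sizes would need extra buffering.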
