
Reading a file while dropping or replacing specified characters?

Original post 2017-10-13 18:39:35 0 1 python

I have a large file that contains some NULL characters. I'd like to read this file in Python as if these NULLs weren't there. I could read the entire file into an in-memory string and call str.replace, but this is inefficient, especially given the file's total size (which can be multiple GBs).

Is there an efficient way to read a file in Python while dynamically dropping certain characters, or replacing them with others?

1 solution

Open the file in binary mode and read it in chunks of a suitable size. Remove the undesired characters from each chunk and write the resulting bytes to another file opened for writing.
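A minimal sketch of the chunked approach described above. The function name strip_bytes, its parameters, and the 1 MiB default chunk size are assumptions made for illustration, not part of the original answer:

```python
def strip_bytes(src_path, dst_path, unwanted=b"\x00", chunk_size=1024 * 1024):
    """Copy src_path to dst_path, dropping every byte value listed in `unwanted`.

    Hypothetical helper: only one chunk is held in memory at a time.
    """
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            # bytes.translate(None, delete) removes the listed byte values
            # without remapping any of the others.
            dst.write(chunk.translate(None, unwanted))
```

To replace rather than drop, chunk.replace(b"\x00", b" ") could be used in place of translate, as long as the pattern is a single byte so that it cannot straddle a chunk boundary.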

This will work for \x00 bytes, but it can fail when the characters to remove are encoded as several bytes, as in a text file with UTF-8 encoding where a single letter can take several bytes: a chunk boundary may then fall in the middle of a character.

This can be solved using codecs.open. The read method of the returned file-like object takes an approximate number of bytes to read in the given encoding and returns decoded text, so a multi-byte character is never split between two reads.
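The codecs.open variant might look like the sketch below. The function name strip_chars, its parameters, and the default chunk size are hypothetical; it also assumes the output should be written in the same encoding as the input:

```python
import codecs


def strip_chars(src_path, dst_path, unwanted="\x00",
                encoding="utf-8", chunk_size=1024 * 1024):
    """Copy a text file, dropping every character listed in `unwanted`.

    Hypothetical helper: text is decoded chunk by chunk, so the unwanted
    characters may be multi-byte in the file's encoding.
    """
    # Mapping a code point to None makes str.translate delete it.
    table = {ord(ch): None for ch in unwanted}
    with codecs.open(src_path, "r", encoding=encoding) as src, \
            codecs.open(dst_path, "w", encoding=encoding) as dst:
        while True:
            text = src.read(chunk_size)  # decoded str, whole characters only
            if not text:  # empty string means end of file
                break
            dst.write(text.translate(table))
```

Mapping a code point to another string instead of None in the table replaces the character rather than dropping it. The built-in open with an encoding argument also decodes incrementally and should be usable in much the same way.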

