pandas.read_csv(): how to read every character as a new element
I have a huge text file (MyTextFile.txt) containing characters like this ("\n" denotes a line break):
ABCDE\n
FGHIJ\n
KLMNO\n
Using pandas.read_csv('MyTextFile.txt') returns a 3x1 array in which each element contains 5 characters. But I need a 15x1 array ([A,B,C,D,E,F,G,H,I,J,K,L,M,N,O], with the line breaks ignored). Is there a simple way to achieve this?
There are about 250 million characters in each file, and I have 25 files to read, so efficiency could be quite critical for me.
Thanks.
You could use:
# Open the file; the with block closes it automatically
with open('example.txt', 'r') as file:
    # Create your results
    res = []
    # Edited from https://www.geeksforgeeks.org/python-program-to-read-character-by-character-from-a-file/
    while True:
        # Read one character at a time
        char = file.read(1)
        # An empty string means you're out of characters
        if not char:
            break
        # Otherwise add the character to the list, but skip line breaks
        if char != '\n':
            res.append(char)
# Print out the results
print(res)
Yields: ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
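With roughly 250 million characters per file, calling read(1) once per character may be slow. A possible alternative is to read the whole file in one call, delete the line-break bytes, and split the remainder into characters. The sketch below assumes the files are plain ASCII and fit in memory; read_chars is a hypothetical helper name, not part of pandas or the standard library.

```python
import os
import tempfile


def read_chars(path, encoding="ascii"):
    """Return every character of the file as a list, skipping line breaks.

    Hypothetical helper: assumes the whole file fits in memory and that
    the text decodes with the given encoding (ASCII by default).
    """
    with open(path, "rb") as f:
        raw = f.read()
    # With table=None, bytes.translate only deletes the listed bytes,
    # here carriage returns and newlines.
    compact = raw.translate(None, b"\r\n")
    return list(compact.decode(encoding))


# Small demonstration on a temporary file with the sample content
with tempfile.NamedTemporaryFile("wb", suffix=".txt", delete=False) as tmp:
    tmp.write(b"ABCDE\nFGHIJ\nKLMNO\n")

chars = read_chars(tmp.name)
print(chars)
# ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
os.remove(tmp.name)
```

If a pandas object is needed afterwards, the resulting list can be passed to pandas.Series. Note that a Python list of 250 million items is itself memory-hungry, so it may be worth keeping the compact bytes object instead when the characters only need to be indexed or iterated.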