
Pandas.read_csv(), how to read every character as a new element

I have a huge text file (MyTextFile.txt) containing characters like this ("\n" denotes a line break):

ABCDE\n
FGHIJ\n
KLMNO\n

Using pandas.read_csv('MyTextFile.txt') returns a 3x1 array in which each element contains 5 characters. But I need a 15x1 array ([A,B,C,D,E,F,G,H,I,J,K,L,M,N,O]; the line breaks should be ignored). Is there a simple way to achieve this?
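For reference, a minimal sketch of what read_csv produces for this sample and how its rows could be flattened into single characters. It assumes the lines contain no commas or quote characters (read_csv would otherwise split or unquote them) and uses an in-memory string as a stand-in for MyTextFile.txt. Note that header=None is needed: by default read_csv treats the first line as column names.

```python
import io

import pandas as pd

# Stand-in for MyTextFile.txt, using the sample data from the question
sample = "ABCDE\nFGHIJ\nKLMNO\n"

# header=None: the first line is data, not column names
# dtype=str, keep_default_na=False: keep text such as "NA" as a plain string
df = pd.read_csv(io.StringIO(sample), header=None, dtype=str,
                 keep_default_na=False)
print(df.shape)  # (3, 1)

# Join the rows back into one string, then split it into single characters
chars = pd.Series(list(''.join(df[0])))
print(chars.shape)  # (15,)
print(chars.tolist())
```

This works, but read_csv is parsing CSV structure that the data does not have, so for files of this size it is probably simpler and faster to skip the CSV parser altogether.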

There are about 250 million characters in each file, and I have 25 files to read, so the efficiency of this could be quite critical to me.
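At roughly 250 million characters per file, a per-character Python loop is likely to be the bottleneck. A minimal sketch that avoids one, assuming the files are plain single-byte (ASCII) text: read each file as bytes, strip the line breaks, and let NumPy reinterpret the buffer as an array of 1-byte strings. The helper name read_chars is hypothetical, not a pandas or NumPy API.

```python
import os
import tempfile

import numpy as np


def read_chars(path):
    """Return every character of an ASCII text file, minus line breaks,
    as a 1-D NumPy array of 1-byte strings (dtype 'S1')."""
    with open(path, 'rb') as f:
        data = f.read()
    # Strip line breaks; removing b'\r' also covers Windows line endings
    data = data.replace(b'\r', b'').replace(b'\n', b'')
    # Reinterpret the bytes as an array, with no per-character Python loop
    return np.frombuffer(data, dtype='S1')


# Demo on a temporary file holding the question's sample data
with tempfile.NamedTemporaryFile('wb', suffix='.txt', delete=False) as tmp:
    tmp.write(b"ABCDE\nFGHIJ\nKLMNO\n")
    path = tmp.name

arr = read_chars(path)
os.remove(path)
print(arr.shape)  # (15,)
print(arr[:5])
```

Each element takes one byte, so a 250-million-character file needs about 250 MB for the array (plus temporary copies made by replace). The elements are bytes such as b'A'; arr.astype('U1') converts them to str, but at 4 bytes per character.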

Thanks.

You could use:

```python
# Adapted from https://www.geeksforgeeks.org/python-program-to-read-character-by-character-from-a-file/

# Collected characters
res = []

# "with" closes the file automatically, even if an exception is raised
with open('MyTextFile.txt', 'r') as file:
    while True:
        # Read one character at a time
        char = file.read(1)
        # An empty string means we are out of characters (end of file)
        if not char:
            break
        # Otherwise add the character to the list, skipping line breaks
        if char != '\n':
            res.append(char)

# Print out the results
print(res)
```

Yields: ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O']
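Reading with read(1) costs one method call per character, about 250 million calls per file here, which is likely to be slow. A minimal sketch of the same idea that reads larger chunks instead, using only the standard library; read_chars_chunked and its 1 MiB default chunk_size are illustrative choices, not part of the original answer.

```python
import os
import tempfile


def read_chars_chunked(path, chunk_size=1 << 20):
    """Collect every character of a text file except line breaks,
    reading chunk_size characters at a time instead of one."""
    res = []
    with open(path, 'r') as f:
        while True:
            chunk = f.read(chunk_size)
            # An empty string means end of file
            if not chunk:
                break
            # Text mode already converts \r\n to \n, so removing \n is enough;
            # extend() appends the remaining characters one by one
            res.extend(chunk.replace('\n', ''))
    return res


# Demo on the question's sample data; a tiny chunk_size makes chunks
# straddle line breaks, which the loop handles correctly
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("ABCDE\nFGHIJ\nKLMNO\n")
    path = tmp.name

chars = read_chars_chunked(path, chunk_size=4)
os.remove(path)
print(chars)
```

The result is still a Python list with one entry per character, which on a 64-bit CPython build costs roughly 8 bytes per element for the list's pointers alone, so about 2 GB per 250-million-character file; a NumPy byte array is far more compact if memory matters.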

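Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM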