Python: Reading a whole txt file as one line

I have a txt file that contains something like this:

AGCGTTGATAGTGCAGCCATTGCAAAACTTCACCCTA
AGCGTTGATAGTGCAGCCATTGCAAAACTTCACCCTA
AAGAAACGAGTATCAGTAGGATGCAGACGGTTGATTG   

There are "\n" characters between the lines.
I want to split the sequence into triplets. Is there a way to read the whole txt file as a single line, so that it doesn't give me:

'CAA', 'TGC', '\nAG', 'CGT', 'TGA', 'TAG', 'TGC', 'AGC',   

I have uploaded all the code I have at the moment, because none of the given answers seemed to help.
This is the code I'm using to split the whole string into triplets:

fob = open("Exercise.txt", "r")
def read_from_file(filename): 
    raw_txt = filename.read()
    triplets = [raw_txt[i:i+3] for i in range(0, len(raw_txt), 3)]
read_from_file(fob)

Strip the newline from each line and join the results (here f is the open file object):

raw_txt = ''.join(line.rstrip('\n') for line in f.readlines())

Or, as @PM 2Ring suggested:

raw_txt = ''.join(f.read().splitlines())
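
As a quick check of the two one-liners above, the following sketch uses io.StringIO as a stand-in for the opened file. The stand-in and the shortened sample sequence are assumptions for illustration, not the data from the question. Both approaches should give the same newline-free string, which can then be sliced into triplets.

```python
import io

# Stand-in for the opened file; the sequence data is made up for illustration.
sample = "AGCGTTA\nAGCGTTA\nAAGAAAG\n"

# First approach: strip the newline from each line, then join.
f = io.StringIO(sample)
joined_rstrip = ''.join(line.rstrip('\n') for line in f.readlines())

# Second approach: read everything, split on line boundaries, then join.
f = io.StringIO(sample)
joined_splitlines = ''.join(f.read().splitlines())

# Slice the joined string into chunks of three characters.
triplets = [joined_splitlines[i:i+3] for i in range(0, len(joined_splitlines), 3)]
print(triplets)
```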

You don't need to call readlines; just iterate over the file object, rstripping each line:

with open("test.txt") as f:
    line = "".join([line.rstrip() for line in f])

Or combine it with map:

with open("test.txt") as f:
    line = "".join(list(map(str.rstrip,f)))

Called with no arguments, rstrip removes all trailing whitespace, so it takes care of whatever line endings the file uses; there is no need to pass an argument.
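
To illustrate, here is a small sketch with made-up strings (not taken from the question's file). It compares rstrip() with no argument against rstrip('\n'), which removes only trailing newline characters.

```python
# Made-up lines with different endings, for illustration only.
unix_line = "AGC\n"
windows_line = "AGC\r\n"
padded_line = "AGC   \n"
lines = (unix_line, windows_line, padded_line)

# No argument: all trailing whitespace is removed, including '\r' and spaces.
stripped_all = [s.rstrip() for s in lines]

# Explicit '\n': only trailing newlines are removed; '\r' and spaces stay.
stripped_newline = [s.rstrip('\n') for s in lines]

print(stripped_all)
print(stripped_newline)
```

When a file is opened in text mode with the default newline handling, Python already translates '\r\n' to '\n' on read, so in practice the visible difference is usually trailing spaces or tabs.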

If you want the slices, just call iter on the joined string and zip the iterator with itself:

with open("test.txt") as f:
    line = iter("".join(list(map(str.rstrip, f))))
    for sli in zip(line, line, line):
        print("".join(sli))
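
This works because zip pulls from the same iterator three times for each tuple, so every tuple consumes the next three characters. A minimal sketch with a made-up string (not the file from the question) shows the grouping, and also that plain zip silently drops a trailing group shorter than three characters:

```python
# Made-up 8-character string: two full triplets plus two leftover characters.
text = "AGCGTTGA"

chars = iter(text)
# All three arguments are the SAME iterator, so zip advances it three times per tuple.
groups = ["".join(group) for group in zip(chars, chars, chars)]

print(groups)  # the leftover "GA" is not included
```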

If the length of your data is not a multiple of 3 and you do not want to lose the leftover characters, you can use itertools.zip_longest:

from itertools import zip_longest
with open("test.txt") as f:
    line = iter("".join(list(map(str.rstrip, f))))
    for sli in zip_longest(line,line,line, fillvalue=""):
        print("".join(sli))

On your sample input both will output:

AGC
GTT
GAT
AGT
GCA
GCC
ATT
GCA
AAA
CTT
CAC
CCT
AAG
CGT
TGA
TAG
TGC
AGC
CAT
TGC
AAA
ACT
TCA
CCC
TAA
AGA
AAC
GAG
TAT
CAG
TAG
GAT
GCA
GAC
GGT
TGA
TTG

Just read the whole file and remove the newlines:

with open('file') as f:
    text = f.read().replace('\n', '')
    triplets = [text[i:i+3] for i in range(0, len(text), 3)]

You could also avoid reading the whole file into memory and instead read from it iteratively while selecting triplets. You could even make this fully lazy by using generator functions and function composition, which gives it a functional style:

def getCharacters (fileName):
    with open(fileName) as f:
        for line in f:
            yield from line.rstrip()

def getTriplets (source):
    it = [iter(source)] * 3
    for triplet in zip(*it):
        yield ''.join(triplet)

# and get a list of triplets
triplets = list(getTriplets(getCharacters('file')))
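
To see the laziness in action, here is a sketch that takes only the first two triplets with itertools.islice. The names chars_from_lines and get_triplets are hypothetical variants of the generators above; chars_from_lines accepts any iterable of lines (an assumption made so that the example runs without a file on disk), and the input lines are made up.

```python
from itertools import islice

def chars_from_lines(lines):
    # Hypothetical variant of getCharacters: takes any iterable of lines
    # (an open file object works too) instead of a file name.
    for line in lines:
        yield from line.rstrip()

def get_triplets(source):
    # Same grouping idea as getTriplets: three references to one iterator.
    it = [iter(source)] * 3
    for triplet in zip(*it):
        yield ''.join(triplet)

# Made-up input lines for illustration.
lines = ["AGCGTTA\n", "AGCGTTA\n", "AAGAAAG\n"]

# islice stops after two triplets, so the rest of the input is never processed.
first_two = list(islice(get_triplets(chars_from_lines(lines)), 2))
print(first_two)
```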

I don't know whether this solves your question, but please test my code.

I have only modified your code.

As you mentioned in the comments, you want to remove the newlines in the middle of the file.

So instead of stripping them, I replaced '\n' with '', using

rtxt = raw_txt.replace('\n', '')

Here is the code:

fob = open("Exercise.txt", "r")
def read_from_file(filename): 
    raw_txt = filename.read()
    rtxt = raw_txt.replace('\n', '')
    triplets = [rtxt[i:i+3] for i in range(0, len(rtxt), 3)]
    print(triplets)
read_from_file(fob)

The output in the triplets list:

['AGC', 'GTT', 'GAT', 'AGT', 'GCA', 'GCC', 'ATT', 'GCA', 'AAA', 'CTT', 'CAC', 'CCT', 'AAG', 'CGT', 'TGA', 'TAG', 'TGC', 'AGC', 'CAT', 'TGC', 'AAA', 'ACT', 'TCA', 'CCC', 'TAA', 'AGA', 'AAC', 'GAG', 'TAT', 'CAG', 'TAG', 'GAT', 'GCA', 'GAC', 'GGT', 'TGA', 'TTG']
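
If the list is needed later in the program rather than just printed, the same replace-based approach can be wrapped in a function that returns it. The sketch below is one possible shape, not the original poster's code: read_triplets is a hypothetical helper that takes a path, and the temporary file with made-up data is only there so the example can run on its own.

```python
import os
import tempfile

def read_triplets(path):
    # Hypothetical helper: takes a file path, closes the file via "with",
    # and returns the list of triplets instead of printing it.
    with open(path) as f:
        text = f.read().replace('\n', '')
    return [text[i:i+3] for i in range(0, len(text), 3)]

# Write made-up sample data to a temporary file so the sketch is runnable.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("AGCGTTA\nAGCGTTA\nAAGAAAG\n")

triplets = read_triplets(tmp.name)
os.remove(tmp.name)  # clean up the temporary file
print(triplets)
```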
