How can I use a coordinate system to get to specific characters in a file, not counting newlines or the first line?

So I have a coordinate system that points to a position in a large file.

The first line of the file is variable in length (but always starts with a ">" character). Every line after that is 50 characters long, followed by a newline. This can go on for several million lines.

I want to be able to find the characters between, for example, positions 1,000,000 and 1,000,050 (which would be input as 1000000-1000050) and write them to a string. How can I seek to that position in the file? I tried using f.seek(1000000), but I run into the problem of the length of the first line. Even if I add the length of the first line to the 1000000 in the f.seek call, I still pick up an extra character (the newline) for every 50 characters.

The numbers will rarely be as clean as 1000000-1000050.
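
To make the drift concrete, here is a small sketch that rebuilds the layout described above in memory and compares a naive seek with one that also skips the newlines. The header text, the 120-character test sequence, and the variable names are made up for illustration, and the sketch assumes single-byte "\n" line endings.

```python
import io

LINE_WIDTH = 50                      # sequence characters per line
header = b">chrDemo hypothetical header\n"

# 120 distinguishable characters, wrapped at LINE_WIDTH like the real file.
sequence = "".join(chr(ord("A") + i % 26) for i in range(120))
lines = [sequence[i:i + LINE_WIDTH] for i in range(0, len(sequence), LINE_WIDTH)]
data = io.BytesIO(header + "\n".join(lines).encode("ascii") + b"\n")

n = 100                              # zero-based index we want

# Naive: skip the header, then move n bytes. Newlines get counted as characters.
data.seek(len(header) + n)
naive = data.read(1).decode("ascii")

# Adjusted: add one byte for each complete line that precedes index n.
data.seek(len(header) + n + n // LINE_WIDTH)
adjusted = data.read(1).decode("ascii")

print("wanted:", sequence[n], "naive:", naive, "adjusted:", adjusted)
```

The naive seek lands short of the target because two newlines sit between the header and index 100; the adjusted seek accounts for them.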

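line_length = 50     # sequence characters per line, not counting the newline
char_n = 10000000    # zero-based index of the first character wanted
count = 50           # number of characters to return

# Binary mode keeps seek offsets as plain byte counts; this assumes "\n" line endings.
with open('f.txt', 'rb') as f:
    f.readline()         # skip the variable-length ">" header line
    start = f.tell()     # offset of the first sequence character
    # Each full line before char_n occupies line_length + 1 bytes.
    f.seek(start + (char_n // line_length) * (line_length + 1) + char_n % line_length)
    # Read enough extra bytes to cover newlines inside the span, then drop them.
    data = f.read(count + count // line_length + 1).replace(b'\n', b'')
    print(data[:count].decode('ascii'))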
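
The offset expression above can also be written as the index plus one newline per completed line, because n = (n // width) * width + n % width. The short check below is only a sketch confirming that the two forms agree; the function names are hypothetical.

```python
def offset_by_lines(n, width=50):
    # Whole lines of width + 1 bytes each, plus the remainder within the last line.
    return (n // width) * (width + 1) + n % width

def offset_by_newlines(n, width=50):
    # The index itself, plus one newline byte per completed line before it.
    return n + n // width

# Compare the two forms over a range of indices; an empty list means they agree.
mismatches = [n for n in range(100000) if offset_by_lines(n) != offset_by_newlines(n)]
print(mismatches)
```

Both forms give the offset relative to the first sequence character, so the length of the header line still has to be added before seeking.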

This is what I ended up using. It seems to work in the small trial I ran.

#reads input from user for exon coordinates, e.g. chr1:1000000-1000050
coords = input("Please enter the coordinates of the Exon you would like to use\n")

#Reads the first part of coords for the chromosome (and, therefore, filename)
chr_index = coords[:coords.index(":")] + ".fa"

#get starting coordinate as an integer
coordStart = int(coords[coords.index(":")+1:coords.index("-")])

#get ending coordinate as an integer
coordEnd = int(coords[coords.index("-")+1:])

#open the file in binary mode so that seek offsets are plain byte counts
with open(chr_index, "rb") as f:
    #length of the header line, including its newline
    lenFirstLine = len(f.readline())

    #byte offsets of the start and end: one extra byte per 50-character line passed
    startOffset = lenFirstLine + coordStart + coordStart // 50
    endOffset = lenFirstLine + coordEnd + coordEnd // 50

    #move to the start of the exon, read through its end, and drop the newlines
    f.seek(startOffset)
    exon = f.read(endOffset - startOffset).replace(b"\n", b"").decode("ascii")
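
The coordinate parsing above relies on str.index, which raises a fairly opaque error on malformed input. As a sketch only, the parsing could be pulled into a helper that also accepts thousands separators; parse_coords is a hypothetical name, and the ".fa" suffix and the chrom:start-end format are taken from the code above.

```python
def parse_coords(text):
    """Split 'chr1:1,000,000-1,000,050' into (filename, start, end)."""
    chrom, _, span = text.strip().partition(":")
    start_text, _, end_text = span.partition("-")
    if not chrom or not start_text or not end_text:
        raise ValueError("expected input like chr1:1000000-1000050")
    # Allow thousands separators such as 1,000,000.
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    if end < start:
        raise ValueError("end coordinate must not be smaller than start")
    return chrom + ".fa", start, end

print(parse_coords("chr1:1,000,000-1,000,050"))
```

The returned start and end can then be passed to the seek logic unchanged.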
