
Python: Reading Specific Sections of Huge Text File (Possibly with Itertools)

In short, I'm trying to "extract" certain lines (strings) from a text file. But there's more.

I have a rather large text file (100,000 lines, 60 MB). Some chunks of data in it are important and others are not, and there are several hundred of these chunks. The chunks follow no regular pattern, and the next chunk does not necessarily begin where the previous one ends.

I have already parsed the file to determine which lines interest me. Right now I have a dictionary whose keys are the "start" line numbers and whose values are the number of consecutive lines I want from each start:

paired_points = {
    51: 7,
    69: 67,
    # ...
    870623: 1730,
    872364: 1801,
}

len(paired_points)  # 783

I could convert this to explicit "start" and "stop" integers instead (e.g., 51 -> 58, 69 -> 136), but that still doesn't solve the problem.
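For reference, converting each entry to a half-open [start, stop) pair is a one-liner. A minimal sketch, using a hypothetical two-entry sample in place of the real 783-entry dictionary; stop = start + count is exclusive, which matches both list slicing and islice() and reproduces the 51 -> 58 and 69 -> 136 examples:

```python
# Hypothetical two-entry sample; the real paired_points has 783 entries
paired_points = {51: 7, 69: 67}

# stop is exclusive, so lines start .. stop - 1 are kept (slice/islice convention)
ranges = [(start, start + count) for start, count in sorted(paired_points.items())]
print(ranges)  # [(51, 58), (69, 136)]
```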

I'm trying to use islice from itertools, but what I get back is a list of islice objects, not lines.

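from itertools import islice

file = r'575852.roi'

a = list()

for key in paired_points:
    with open(file) as f:
        # islice() returns a lazy iterator over the chunk, not the lines themselves.
        # The file is closed when the with block exits, so these iterators
        # can no longer be read afterwards.
        a.append(islice(f, key, key + int(paired_points[key])))  # Start and stop lines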

This works in concept, but I need to turn the islice objects into strings: what I want is a list of lines (strings) from the text file.
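The underlying issue is that islice() is lazy: it returns an iterator, and the lines are only read when something consumes it, which has to happen while the file is still open. A minimal sketch of one way to get actual strings, assuming 0-based start lines (as in the slicing solution) and non-overlapping chunks; extract_chunks is a hypothetical helper name. It also opens the file once instead of 783 times, by sorting the starts and slicing relative to the current position of a single shared iterator:

```python
import io
from itertools import islice


def extract_chunks(lines, paired_points):
    """Return one list of line strings per (start, count) entry of paired_points.

    Assumes 0-based start indices (as in a[key : key + count]) and
    non-overlapping chunks. `lines` can be an open file, read in a single pass.
    """
    it = iter(lines)
    pos = 0  # 0-based index of the next line `it` will yield
    chunks = []
    for start, count in sorted(paired_points.items()):
        skip = start - pos  # islice counts from the iterator's current position
        # list() forces the lazy islice to actually read the lines
        chunks.append(list(islice(it, skip, skip + count)))
        pos = start + count
    return chunks


# Demo on an in-memory "file"; with a real file: with open(file) as f: ...
demo = io.StringIO("".join(f"line {i}\n" for i in range(20)))
chunks = extract_chunks(demo, {2: 3, 10: 2})
print(chunks)
# [['line 2\n', 'line 3\n', 'line 4\n'], ['line 10\n', 'line 11\n']]
```

With the real data this would be called as extract_chunks(f, paired_points) inside a with open(file) as f: block. Because only one pass is made, the whole file never has to be held in memory. Overlapping chunks would make skip negative, and islice() raises ValueError in that case.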

Any help would be greatly appreciated. Thank you in advance!

SOLUTION

I solved this myself by converting the lines of interest to strings and then to an array of floats. I also needed to "sanitize" each line by splitting it into three float values (the X, Y, Z coordinates). The built-in map() function does this in the last line, after the list of strings has been built.

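import itertools

import numpy as np

with open(file, "r") as f:
    a = f.readlines()  # the whole file, held in memory as a list of lines

ext_pts = list()
for key in paired_points:
    # paired_points[key] consecutive lines starting at 0-based index key
    a1 = a[key : key + paired_points[key]]
    ext_pts.append(a1)

# Flatten the list of chunks into one list of lines
ext_pts2 = list(itertools.chain.from_iterable(ext_pts))
# sanitize() (not shown) turns one line into its (X, Y, Z) floats
ext_pts2 = np.asarray(list(map(sanitize, ext_pts2)))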
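The sanitize() function used in the last line isn't shown. A minimal hypothetical stand-in, assuming each line of interest holds at least three whitespace-separated numbers with X, Y, Z first (the actual .roi layout may differ, e.g. it may use commas):

```python
def sanitize(line):
    """Hypothetical stand-in for the author's sanitize().

    Assumes the first three whitespace-separated fields are X, Y and Z.
    """
    x, y, z = line.split()[:3]  # raises ValueError if fewer than 3 fields
    return float(x), float(y), float(z)


lines = ["1.0 2.0 3.0\n", "4.5 -5.5 6.25\n"]
points = list(map(sanitize, lines))
print(points)  # [(1.0, 2.0, 3.0), (4.5, -5.5, 6.25)]
```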

ext_pts2 is now an Nx3 NumPy array of (X, Y, Z) points.
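One caveat worth checking: slicing past the end of a list does not raise, so a chunk that runs past the last line is silently shortened and N ends up smaller than sum(paired_points.values()). A small sketch of a check for that, with check_chunks as a hypothetical helper name and made-up sample data:

```python
def check_chunks(lines, paired_points):
    """Return the start lines of chunks that extend past the end of `lines`."""
    n = len(lines)
    return [start for start, count in sorted(paired_points.items()) if start + count > n]


lines = [f"{i}\n" for i in range(100)]  # made-up 100-line "file"
pp = {51: 7, 95: 10}                    # the second chunk asks for lines 95..104

# Same slicing as the solution: lines[95:105] quietly returns only 5 lines
extracted = [line for start, count in pp.items() for line in lines[start:start + count]]
print(len(extracted), sum(pp.values()), check_chunks(lines, pp))  # 12 17 [95]
```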
