
How do I load specific rows from a .txt file in Python?

Say I have a .txt file with many rows and columns of data, and a list containing integer values. How would I load only the rows of the text file whose row numbers match the integers in the list?

To illustrate, say I have a list of integers:

a = [1,3,5]

How would I read only rows 1, 3 and 5 from a text file into an array?

The loadtxt routine in numpy lets you both skip rows and use particular columns, but I can't find a way to do something along the lines of (ignoring incorrect syntax):

new_array = np.loadtxt('data.txt', userows=a, unpack='true')
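For a contiguous block of rows, as opposed to scattered ones, the existing loadtxt options are already enough: skiprows drops leading lines and max_rows (NumPy 1.16 and later) stops after a fixed number of rows, while usecols picks columns. A minimal sketch, using an in-memory stand-in for data.txt (hypothetical comma-separated sample data):

```python
import io
import numpy as np

# Hypothetical stand-in for data.txt: seven rows of three comma-separated values.
data = io.StringIO("1,2,3\n4,5,6\n7,8,9\n10,11,12\n13,14,15\n16,17,18\n19,20,21\n")

# skiprows=1 drops row 0; max_rows=3 then reads rows 1, 2 and 3 only.
# usecols=(0, 2) keeps the first and third columns.
block = np.loadtxt(data, delimiter=",", skiprows=1, max_rows=3, usecols=(0, 2))
print(block)
```

This does not cover arbitrary lists such as [1, 3, 5], which is why the answers below filter the lines themselves.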

Thank you.

Given this file:

1,2,3
4,5,6
7,8,9
10,11,12
13,14,15
16,17,18
19,20,21

You can use the csv module to get the desired NumPy array. Note that enumerate counts from 0, so row 1 is the second line of the file:

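import csv
import numpy as np

desired = [1, 3, 5]  # 0-based indices of the rows to keep
with open('/tmp/test.csv', 'r', newline='') as fin:
    reader = csv.reader(fin)
    # Keep row i only if its index is in desired, converting each field to int.
    result = [[int(s) for s in row] for i, row in enumerate(reader) if i in desired]

print(np.array(result))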

Prints:

[[ 4  5  6]
 [10 11 12]
 [16 17 18]]
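For long lists or large files, `i in desired` scans the whole list for every line, and the loop keeps reading after the last wanted row. A sketch of a variant (the helper name read_selected_rows and the temporary demo file are hypothetical) that uses a set for membership tests and stops once the highest requested index has been read:

```python
import csv
import os
import tempfile

def read_selected_rows(path, rows):
    """Return the rows whose 0-based index is in `rows`, parsed as ints, in file order."""
    wanted = set(rows)            # constant-time membership test per line
    if not wanted:
        return []
    last = max(wanted)            # nothing beyond this index is needed
    result = []
    with open(path, newline="") as fin:
        for i, row in enumerate(csv.reader(fin)):
            if i in wanted:
                result.append([int(s) for s in row])
            if i >= last:
                break             # stop reading the rest of the file
    return result

# Hypothetical demo file holding the same seven rows as above.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False) as tmp:
    tmp.write("1,2,3\n4,5,6\n7,8,9\n10,11,12\n13,14,15\n16,17,18\n19,20,21\n")
    path = tmp.name

rows = read_selected_rows(path, [5, 1, 3])
os.remove(path)
print(rows)
```

Because the file is read sequentially, the rows come back in file order, not in the order of the index list.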

To expand on my comment: plain Python is enough. Enumerate the file and print only the lines whose index is in the list.

$ cat file.txt
line 0
line 1
line 2
line 3
line 4
line 5
line 6
line 7
line 8
line 9
line 10

Python:

#!/usr/bin/env python3

a = [1, 4, 8]  # 0-based line numbers to print

with open('file.txt') as fd:
    for n, line in enumerate(fd):
        if n in a:
            print(line.strip())

output:

$ ./l.py 
line 1
line 4
line 8
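A different standard-library option, swapped in here rather than taken from the answer above, is linecache.getline, which looks up a single line by number. It counts lines from 1 and reads the whole file into a cache, so it is convenient for repeated lookups but does not save memory. A sketch, with a hypothetical temporary file recreating file.txt:

```python
import linecache
import os
import tempfile

# Hypothetical demo file with the same contents as file.txt ("line 0" ... "line 10").
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("".join(f"line {n}\n" for n in range(11)))
    path = tmp.name

a = [1, 4, 8]
# getline is 1-based, so shift each 0-based index by one.
selected = [linecache.getline(path, n + 1).strip() for n in a]
linecache.clearcache()  # drop the cached copy of the file
os.remove(path)
print("\n".join(selected))
```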

You can stick with NumPy's loadtxt function; you just need to pass it a generator object instead of the file path.

First, define a generator that accepts a file path and a list of row indices, and yields only the lines at those indices:

def generate_specific_rows(filePath, userows=()):
    wanted = set(userows)  # set lookup avoids scanning the list for every line
    with open(filePath) as f:
        for i, line in enumerate(f):
            if i in wanted:
                yield line

Now create a generator object and pass it to loadtxt:

import numpy as np

a = [1, 3, 5]
gen = generate_specific_rows('data.txt', userows=a)
new_array = np.loadtxt(gen, unpack=True)  # unpack=True returns the transpose (one row per column)
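If the whole file fits in memory, another option is to load everything and select rows with NumPy's integer-array indexing, which also preserves the order of the index list. A sketch, assuming comma-separated data (the in-memory sample stands in for data.txt and is hypothetical):

```python
import io
import numpy as np

a = [1, 3, 5]
# Hypothetical stand-in for data.txt.
data = io.StringIO("1,2,3\n4,5,6\n7,8,9\n10,11,12\n13,14,15\n16,17,18\n19,20,21\n")

full = np.loadtxt(data, delimiter=",")  # shape (7, 3)
subset = full[a]                        # picks rows 1, 3 and 5
columns = subset.T                      # same layout that unpack=True would give
print(subset)
```

The trade-off is memory: every row is parsed and stored before the unwanted ones are discarded, whereas the generator approach never keeps them.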

Use the csv module and file.xreadlines() (Python 2 only).

  • csv module: implements classes to read and write tabular data in CSV format.

  • file.xreadlines(): returns the same thing as iter(f). Deprecated since version 2.3 (use for line in file instead) and removed in Python 3.

I would suggest using line.split() instead of line.strip(). line.split() returns a list of fields, which can easily be converted to a NumPy array with np.asarray (pass dtype=float to get numbers rather than strings).
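A minimal sketch of that suggestion, assuming whitespace-separated columns (split() with no argument splits on runs of whitespace) and using a hypothetical in-memory stand-in for the file:

```python
import io
import numpy as np

a = [1, 3, 5]
# Hypothetical whitespace-separated data standing in for the real file.
fd = io.StringIO("1 2 3\n4 5 6\n7 8 9\n10 11 12\n13 14 15\n16 17 18\n")

# Keep the split fields of each wanted line.
rows = [line.split() for n, line in enumerate(fd) if n in a]
# split() yields strings; dtype=float converts them to numbers.
arr = np.asarray(rows, dtype=float)
print(arr)
```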

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.

© 2020-2024 STACKOOM.COM