python: pull lines from text file when first column matches string from list

Question

I have a list ['dog', 'cat', 'snake', 'lizard']. I want to use this list to extract lines from a text file. My text file is tab-delimited, with a newline character at the end of each line. Each line has 4 columns, the first being one of the names from my list. The first five lines look like this:

dog     data1     data2    data3
dog     data1     data2    data3
cat     data1     data2    data3
snake   data1     data2    data3
lizard  data1     data2    data3

and so on, for many lines.

I want to make a text file for each of the items in my list. In each new file I want every line from the original file where the first column matches the name in the list/new file. This is the code I have written:

filename = "data.txt"
f = open(filename, 'r')

#my list is named Species
for names in Species:
    with open(str(names) + ".txt", 'w') as g:       
        for line in f:
            row = line.split()
            if names == row[0]:
                g.write(row)

I am able to create the text files I want to write to, but nothing is being written to them, and I am getting no error messages. In the end, I would like to copy only some of the columns from each matching line into the new text file.
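For the last point, here is a minimal sketch of keeping only some columns. It assumes the file really is tab-delimited and uses the standard csv module. The helper name extract_columns and the column indices [0, 2] are hypothetical and only for illustration. With real files, the csv documentation recommends opening them with newline="".

```python
import csv
import io


def extract_columns(infile, outfile, name, columns):
    """Copy the chosen columns of every row whose first field equals name.

    Hypothetical helper: infile/outfile are open text-file objects and
    columns is a list of 0-based column indices to keep.
    """
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Skip empty rows; keep only rows whose first column is the name
        if row and row[0] == name:
            writer.writerow([row[i] for i in columns])


# Usage with in-memory files standing in for data.txt and dog.txt
data = io.StringIO("dog\td1\td2\td3\ncat\tc1\tc2\tc3\ndog\te1\te2\te3\n")
out = io.StringIO()
extract_columns(data, out, "dog", [0, 2])
print(out.getvalue())
```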

Answer 1

You should be getting an error from trying to write a list directly to a file (write() only accepts strings in text mode):

Python 2.7:

Python 2.7.10 (default, Sep 13 2015, 20:30:50) 
[GCC 5.2.1 20150911] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> with open("test", "w") as f:
...   f.write([1,2,3,4])
... 
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: expected a character buffer object
>>>
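Python 3 rejects the same call with a TypeError as well, though the exact message differs. Here is a minimal check that uses an in-memory io.StringIO instead of a real file. The sample row is made up:

```python
import io

out = io.StringIO()  # in-memory stand-in for an output text file
raised = False
try:
    out.write(["dog", "data1", "data2", "data3"])  # a list, not a str
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

print(raised)  # True: text-mode write() only accepts str
```

So if the question's code ran g.write(row) even once, it would have stopped with an error. The lack of an error means that line was never reached.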

The write isn't being called, probably because there isn't a line that matches Species[0]. When the top-level for loop moves on to Species[1], f is already at end-of-file and won't yield any more lines. Seek back to the beginning of the file at the start of each iteration:

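with open("data.txt") as f:
    for name in Species:
        f.seek(0)  # rewind; the previous pass left f at end-of-file
        with open(name + ".txt", "w") as g:
            for line in f:
                fields = line.split()
                # match the whole first column, not just a prefix of the line
                if fields and fields[0] == name:
                    g.write(line)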
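You can see the end-of-file behavior described above directly. This sketch uses io.StringIO as an in-memory stand-in for data.txt, and the two sample lines are made up. A second pass over the same file object yields nothing until seek(0) rewinds it:

```python
import io

f = io.StringIO("dog\ta\ncat\tb\n")  # stands in for data.txt

first_pass = [line for line in f]   # reads every line, leaves f at end-of-file
second_pass = [line for line in f]  # nothing left to read
f.seek(0)                           # rewind to the start
third_pass = [line for line in f]   # reads every line again

print(len(first_pass), len(second_pass), len(third_pass))  # 2 0 2
```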

Alternatively (this is what I'd do), you can scan through f once and assign each line to the proper animal as you process it:

from collections import defaultdict

records = defaultdict(list)  # animal name -> lines whose first column is that name
with open("data.txt") as f:
    for line in f:
        fields = line.split()
        if fields:  # skip blank lines
            records[fields[0]].append(line)

for animal, lines in records.items():
    with open("{}.txt".format(animal), "w") as out:
        out.writelines(lines)
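If the input is too large to hold in memory, a variant of the same single-pass idea avoids buffering the lines. It opens one output file per name up front with contextlib.ExitStack and writes each line as soon as it is read. This is a sketch, not the answerer's code, and it also ignores animals that are not in Species. The function name split_by_species and the temporary-directory usage are assumptions for illustration:

```python
import contextlib
import os
import tempfile

Species = ['dog', 'cat', 'snake', 'lizard']


def split_by_species(data_path, species, out_dir):
    """Copy each line of data_path into <out_dir>/<name>.txt in one pass."""
    with contextlib.ExitStack() as stack:
        # Open every output file up front; ExitStack closes them all on exit
        outputs = {
            name: stack.enter_context(
                open(os.path.join(out_dir, name + ".txt"), "w"))
            for name in species
        }
        with open(data_path) as f:
            for line in f:
                fields = line.split()
                # Lines whose first column is not in the list are skipped
                if fields and fields[0] in outputs:
                    outputs[fields[0]].write(line)


# Usage in a temporary directory so the demo leaves no files behind
with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "data.txt")
    with open(data_path, "w") as f:
        f.write("dog\td1\ndog\td2\ncat\tc1\nsnake\ts1\nlizard\tl1\n")
    split_by_species(data_path, Species, tmp)
    with open(os.path.join(tmp, "dog.txt")) as f:
        dog_lines = f.read()
    with open(os.path.join(tmp, "cat.txt")) as f:
        cat_lines = f.read()

print(repr(dog_lines))
```

The trade-off is keeping len(Species) files open at once instead of holding every line in a dict.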

Answer 2

Here's the updated code!

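Species = ['dog', 'cat', 'snake', 'lizard']
filename = "data.txt"
for names in Species:
    with open(str(names) + ".txt", 'w') as g:
        # reopen the input on every pass so reading starts at the top again
        with open(filename, 'r') as f:
            for line in f:
                row = line.split()
                if row and names == row[0]:
                    # write() needs a str: rebuild a tab-separated line from the fields
                    g.write("\t".join(row) + "\n")

You need to turn row back into a string before calling g.write(), because you can't write a list to a text file. "\t".join(row) + "\n" rebuilds the original tab-delimited line; str(row) would also avoid the error, but it writes the list's Python representation (['dog', 'data1', ...]) with no newline between records.
Reopening "data.txt" on every pass fixes the problem of nothing being written to the files. (Edit: this works because after the first pass through the loop, f is at end-of-file, so every later pass reads nothing; opening the file again starts from the beginning.)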
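To make the string-conversion point concrete, here is a small check of what str(row) and a tab join each produce for one parsed line. The sample line is made up and follows the question's layout:

```python
row = "dog\tdata1\tdata2\tdata3\n".split()

as_repr = str(row)               # the list's Python representation
as_line = "\t".join(row) + "\n"  # a tab-delimited line again

print(as_repr)        # ['dog', 'data1', 'data2', 'data3']
print(repr(as_line))
```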


2 answers

solution1
1 ACCEPTED 2016-02-24 21:37:13

solution2
1 2016-02-24 21:43:29
