Searching CSV Files (Python)

I've made this CSV file up to play with. From what I've been told before, I'm pretty sure it is valid and can be used in this example.

Basically I have this CSV file 'book_list.csv':

  name,author,year
  Lord of the Rings: The Fellowship of the Ring,J. R. R. Tolkien,1954
  Nineteen Eighty-Four,George Orwell,1984
  Lord of the Rings: The Return of the King,J. R. R. Tolkien,1954
  Animal Farm,George Orwell,1945
  Lord of the Rings: The Two Towers,J. R. R. Tolkien,1954

I also have this text file 'search_query.txt', in which I put the keywords or search terms I want to look for in the CSV file:

  Lord
  Rings
  Animal

I've come up with some code (with the help of stuff I've read) that counts the number of matching entries. The program first writes a separate CSV file 'results.csv', which contains either 'Matching' or an empty string for each row of the book list.

It then reads 'results.csv', counts how many 'Matching' entries there are, and prints the count.

import csv
import collections

# Open the book list, the search terms, and the intermediate results file.
f1 = open('book_list.csv', 'r', newline='')
f2 = open('search_query.txt', 'r', newline='')
f3 = open('results.csv', 'w', newline='')

c1 = csv.reader(f1)
c2 = csv.reader(f2)
c3 = csv.writer(f3)

# Each line of the query file becomes a one-element row.
queries = [row for row in c2]

# Write 'Matching' or '' to results.csv for every row of the book list.
for booklist_row in c1:
    results_row = []
    found = False
    for query_row in queries:
        if query_row[0] in booklist_row[0]:
            results_row.append('Matching')
            found = True
            break
    if not found:
        results_row.append('')
    c3.writerow(results_row)

f1.close()
f2.close()
f3.close()

# Read results.csv back and count the non-empty entries.
d = collections.defaultdict(int)
with open("results.csv", "r", newline='') as info:
    reader = csv.reader(info)
    for row in reader:
        for matches in row:
            matches = matches.strip()
            if matches:
                d[matches] += 1
    results = [(matches, count) for matches, count in d.items() if count >= 1]
    results.sort(key=lambda x: x[1], reverse=True)
    for matches, count in results:
        print('There are', count, 'matching results' + '.')

In this case, my output returns:

There are 4 matching results.

I'm sure there is a better way of doing this that avoids writing a completely separate CSV file, but this was easier for me to get my head around.

My question is: this code only returns how many matching results there are. How do I modify it to return the ACTUAL results as well?

i.e., I want my output to be:

There are 4 matching results.

Lord of the Rings: The Fellowship of the Ring
Lord of the Rings: The Return of the King
Animal Farm
Lord of the Rings: The Two Towers

As I said, I'm sure there's a much easier way to do what I already have, so some insight would be helpful. :)

Cheers!

EDIT: I just realized that if my keywords are in lower case, it won't work. Is there a way to avoid case-sensitivity?

  1. Throw away the query file and get your search terms from sys.argv[1:] instead.

  2. Throw away your output file and use sys.stdout instead.

  3. Append matched booklist titles to a result_list. The results_row that you currently have has a rather misleading name. The count that you want is len(result_list). Print that. Then print the contents of result_list.

  4. Convert your query words to lowercase once (before you start reading the input file). As you read each book_list row, convert its title to lowercase. Do your matching with the lowercase query words and the lowercase title.
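
The four points above could be put together roughly as follows. This is only a sketch, assuming Python 3, a header row in the CSV, and the title in the first column. The function names find_titles and report are made up for illustration and do not come from the original code.

```python
import csv
import sys


def find_titles(csv_path, terms):
    """Return the titles in csv_path that contain any of the search terms."""
    # Lowercase the search terms once, before reading the file.
    lowered_terms = [term.lower() for term in terms]
    result_list = []
    with open(csv_path, newline='') as book_file:
        reader = csv.reader(book_file)
        next(reader, None)  # skip the header row (assumed: name,author,year)
        for row in reader:
            if not row:
                continue  # ignore blank lines
            title = row[0]  # assumption: the title is the first column
            lowered_title = title.lower()
            if any(term in lowered_title for term in lowered_terms):
                # Keep the original capitalization for printing.
                result_list.append(title)
    return result_list


def report(csv_path, terms, out=sys.stdout):
    """Write the match count, then the matching titles, to out."""
    result_list = find_titles(csv_path, terms)
    out.write('There are %d matching results.\n\n' % len(result_list))
    for title in result_list:
        out.write(title + '\n')
    return result_list
```

Saved under some name such as search_books.py (hypothetical) with a final line report('book_list.csv', sys.argv[1:]), it could be run as python search_books.py lord animal. No intermediate results file is needed, because the matches are kept in a list.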

Overall plan:

  1. Read the entire book list CSV into a dictionary of {title: info}.
  2. Read the search query file. For each keyword, filter the dictionary, for example:

     [key for key, value in books.items() if "Lord" in key]

     Do what you will with the results.

  3. If you want, put the results in another CSV.

If you want to deal with casing issues, try turning all the titles to lowercase ("FOO".lower()) when you store them in the dictionary, and lowercase each keyword before comparing.
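
The plan above might look something like this in practice. It is a minimal sketch, assuming Python 3 and a CSV whose header row contains a name column as in the question. The helpers load_books, search, and search_file are hypothetical names chosen for illustration.

```python
import csv


def load_books(csv_path):
    """Read the book list into a dict of {lowercase title: row info}."""
    books = {}
    with open(csv_path, newline='') as book_file:
        # DictReader takes its keys from the header row (assumed to include 'name').
        for row in csv.DictReader(book_file):
            books[row['name'].lower()] = row
    return books


def search(books, keyword):
    """Return the original titles whose lowercase form contains keyword."""
    keyword = keyword.lower()
    return [info['name'] for key, info in books.items() if keyword in key]


def search_file(csv_path, query_path):
    """Return matching titles, without duplicates, in first-match order."""
    books = load_books(csv_path)
    matches = []
    with open(query_path) as query_file:
        for line in query_file:
            keyword = line.strip()
            if not keyword:
                continue  # skip blank lines in the query file
            for title in search(books, keyword):
                if title not in matches:
                    matches.append(title)
    return matches
```

Calling search_file('book_list.csv', 'search_query.txt') and printing len() of the result followed by each title would give the count and the titles. Two caveats with this design: titles come back grouped by keyword rather than in file order, and because the dictionary is keyed on the title, two books with the same title would collapse into one entry.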
