
How can I query a specific column of a CSV file that I have input and print all returned rows using Python?

To walk you through it, this is what I want to do:

1) I want to place the script in the folder with the CSV I want to analyze

2) Run the script

3) Enter the name of the .csv I want to analyze

4) Enter the words and phrases I want to search for separated by a comma

5) Search and print the rows that contain any of the words/phrases I have specified

OK, so here is my code:

import csv


opening_text = "Make sure this script is in the same folder as file you want to analyze \n"
print opening_text

file_name = raw_input('Enter file name ending with .csv to analyze (e.g. file.csv): ')


print "\n The file that will be analyzed is " + file_name + "\n"

my_terms = raw_input('Please enter the words and phrases you would like to find in ' + file_name + ', separated by a comma:')


single_terms= my_terms.split(',')
with open(file_name, 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        for term in single_terms:
            if term in row:
                print ' '.join(row)

The script I currently have has these issues:

1) It's not searching for phrases. It can search 'hey' and 'there' separately but not 'hey there'

2) It does not sanitize my input. For example, I separate my terms with a comma followed by a space, but if the next phrase I want to search for is at the beginning of a sentence, it is not found correctly.

3) If the search term has a different case from the file content, it gives incorrect results.

Also, is there any way I can search only one column in my CSV file, e.g. just the "Comments" column?

Here is my sample data, contained in "sample.csv", which I have in the same folder as the script.

Sample Data

Date;Customer Name;Comments
2/12/2015;Eric;The apples were absolutely delicious
3/10/2015;Tasha;I enjoyed the mangoes thoroughly
4/11/2014;Walter;The mangoes were awesome
3/10/2009;Ben;Who killed the cat really
9/10/2088;Lisa;Eric recommended guavas for me

For the described case, you probably do not need regular expressions; simple string search would do. However, let's have a look at both versions.

First of all, you used a space ' ' as the delimiter, which is incorrect for the CSV data you provided. To parse it correctly, use ';' as the delimiter. In your example, the quotechar has no effect, so you can omit it or set it to something common such as '"'.
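To see why the delimiter matters, here is a small sketch that parses one line of the sample data both ways. It assumes Python 3 and uses io.StringIO as a stand-in for the real file; the variable names are only illustrative.

```python
import csv
import io

# Header line and first data line, copied from the sample data.
sample = ("Date;Customer Name;Comments\n"
          "2/12/2015;Eric;The apples were absolutely delicious\n")

# Splitting on spaces (as in the question) cuts each line at every blank.
space_rows = list(csv.reader(io.StringIO(sample), delimiter=' ', quotechar='|'))

# Splitting on semicolons yields one list element per column.
semicolon_rows = list(csv.reader(io.StringIO(sample), delimiter=';'))

print(space_rows[1])
print(semicolon_rows[1])
```

With the space delimiter every word becomes its own list element, and `term in row` only compares against whole elements. That is most likely why single words could be found but a phrase such as 'hey there' could not.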

For both versions below, I use the following:

file = 'sampledata/test.csv' # Target CSV file path
terms = 'enjoy, apples, the mangoes' # You want to replace this with your input

Version 1: String search

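import csv

lookup = [i.strip().lower() for i in terms.split(',')]
with open(file, 'r') as csvin:
    rdr = csv.reader(csvin, delimiter=';', quotechar='"')
    header = next(rdr)  # first line holds the column names
    col = header.index('Comments')  # position of the column to search
    for row in rdr:
        for l in lookup:
            # case-insensitive substring test on the Comments column only
            if l in row[col].lower():
                print(row)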

To help you through it, here are the basic steps:

  1. Transform the input terms to something usable. I split it at commas, as you wrote in your code. In addition, strip() the spaces, as they would prevent you from finding something at the beginning of a comment.

  2. Open the file, set up the CSV reader, and read the header from the first line.

  3. For each row and each element in our lookup list, we test whether the lookup is present somewhere in the string. I use lower() to ignore case, especially at the beginning of comments.
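Step 1 can also be tried out on its own. The helper below is only a sketch; the name parse_terms is made up for this illustration, and dropping empty entries is an extra assumption that the code above does not make.

```python
def parse_terms(raw):
    """Split a comma-separated string into clean, lower-case search terms."""
    terms = [t.strip().lower() for t in raw.split(',')]
    # Drop empty entries (e.g. from a trailing comma); an empty string
    # would otherwise be "found" in every comment.
    return [t for t in terms if t]


print(parse_terms('Enjoy,  apples ,the mangoes,'))
```

Phrases such as 'the mangoes' survive as one term because the input is split only at commas, never at spaces.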

The result for my example input terms is:

['2/12/2015', 'Eric', 'The apples were absolutely delicious']
['3/10/2015', 'Tasha', 'I enjoyed the mangoes thoroughly']
['3/10/2015', 'Tasha', 'I enjoyed the mangoes thoroughly']
['4/11/2014', 'Walter', 'The mangoes were awesome']

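Note: One comment is returned twice, because two of our lookup elements were found in its text. The loop as written prints a row once per matching term; to print each row only once, stop checking further terms after the first match (for example with break or any()), or remove the duplicates afterwards.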
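If each row should appear only once, one option is to ask whether any term matches instead of looping over the terms. This is a sketch rather than part of the original answer: it assumes Python 3 and reads the sample data from a string through io.StringIO instead of a file.

```python
import csv
import io

data = """Date;Customer Name;Comments
2/12/2015;Eric;The apples were absolutely delicious
3/10/2015;Tasha;I enjoyed the mangoes thoroughly
4/11/2014;Walter;The mangoes were awesome
3/10/2009;Ben;Who killed the cat really
9/10/2088;Lisa;Eric recommended guavas for me
"""
lookup = ['enjoy', 'apples', 'the mangoes']

rdr = csv.reader(io.StringIO(data), delimiter=';')
header = next(rdr)
col = header.index('Comments')

# any() stops at the first matching term, so a row is kept at most once.
matches = [row for row in rdr
           if any(term in row[col].lower() for term in lookup)]
for row in matches:
    print(row)
```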

Version 2: Regular expressions

The majority of the example above remains the same. Here is the code:

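import csv
import re

# re.IGNORECASE ignores case without lower-casing the pattern itself,
# which would silently turn e.g. \S into \s
lookup = [re.compile(i.strip(), re.IGNORECASE) for i in terms.split(',')]
with open(file, 'r') as csvin:
    rdr = csv.reader(csvin, delimiter=';', quotechar='"')
    header = next(rdr)  # first line holds the column names
    col = header.index('Comments')  # position of the column to search
    for row in rdr:
        for l in lookup:
            m = l.search(row[col])
            if m is not None:
                print(row)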

The difference is in steps 1 and 3:

  1. For each input term, we compile a regular expression and store it in our lookup list. Note: With my example terms, the regular expressions behave like a plain string search, because no special regex operators are used. You can, however, enter something like mango(es)?.

  2. (same as above)

  3. For each row and each regex lookup, test the comment column of your CSV using the compiled pattern's search() method, which returns a match object, or None if nothing matches. If the result is not None, you have found a match. Note: You can get the position of the found substring with the start() method of the match object. For more functions, see the docs on regex match objects.
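To make steps 1 and 3 more concrete, here is a short sketch on a single comment from the sample data. The use of re.escape() for terms that should be taken literally is my own addition, not something the question asked for.

```python
import re

comment = 'I enjoyed the mangoes thoroughly'

# Optional group: the pattern matches both "mango" and "mangoes".
pattern = re.compile(r'mango(es)?', re.IGNORECASE)
m = pattern.search(comment)
if m is not None:
    # start() is the index of the match, group() the matched text
    print(m.start(), m.group())

# Without escaping, '?' is a regex operator and 'really?' would match 'reall'.
# re.escape() makes the whole term literal, so it is not found here.
literal = re.compile(re.escape('really?'), re.IGNORECASE)
found = literal.search('Who killed the cat really')
print(found is None)
```

If the terms come from users who do not know regex syntax, escaping them (or simply staying with version 1) avoids surprises with characters such as ?, ( or +.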

The result of the regex version is the same as above:

['2/12/2015', 'Eric', 'The apples were absolutely delicious']
['3/10/2015', 'Tasha', 'I enjoyed the mangoes thoroughly']
['3/10/2015', 'Tasha', 'I enjoyed the mangoes thoroughly']
['4/11/2014', 'Walter', 'The mangoes were awesome']

Additionally...

You asked whether you can search only one column. Each row the CSV reader returns is a list of strings, split at the given delimiter. To get a specific column by its name, call index() on the header row read at the start, then use the returned index to access that element in each row's list.
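As an alternative to header.index(), the csv module also provides csv.DictReader, which uses the first line as field names so that a column can be accessed by name directly. The sketch below is not what the code above does; it assumes Python 3 and reads two sample rows from a string.

```python
import csv
import io

data = """Date;Customer Name;Comments
2/12/2015;Eric;The apples were absolutely delicious
9/10/2088;Lisa;Eric recommended guavas for me
"""

# DictReader takes the field names from the first line of the input.
rdr = csv.DictReader(io.StringIO(data), delimiter=';')

# Only the Comments column is searched, so the customer named Eric
# is not a hit, while the comment mentioning Eric is.
hits = [row for row in rdr if 'eric' in row['Comments'].lower()]
for row in hits:
    print(row['Customer Name'], '|', row['Comments'])
```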
