
How can I query a specific column of a CSV file that I have input and print all returned rows using Python?

To walk you through it, this is what I want to do:

1) Place the script in the folder with the CSV I want to analyze

2) Run the script

3) Enter the name of the .csv I want to analyze

4) Enter the words and phrases I want to search for, separated by commas

5) Search and print the rows that contain any of the words/phrases I have specified

OK, so here is my code:

import csv


opening_text = "Make sure this script is in the same folder as file you want to analyze \n"
print opening_text

file_name = raw_input('Enter file name ending with .csv to analyze (e.g. file.csv): ')


print "\n The file that will be analyzed is " + file_name + "\n"

my_terms = raw_input('Please enter the words and phrases you would like to find in ' + file_name + ', separated by a comma:')


single_terms= my_terms.split(',')
with open(file_name, 'rb') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in spamreader:
        for term in single_terms:
            if term in row:
                print ' '.join(row)

The current script has these issues:

1) It's not searching for phrases. It can search for 'hey' and 'there' separately, but not for 'hey there'.

2) It does not sanitize my input. For example, I separate my terms with a comma followed by a space, but if the next phrase I want to search for is at the beginning of a sentence, it is not found.

3) If the search term has a different case from the file content, it gives incorrect results.

Also, is there any way I can search only one column in my CSV file, e.g. just the "Comments" column?

Here is my sample data, contained in "sample.csv", which is in the same folder as the script.

Sample Data

Date;Customer Name;Comments

2/12/2015;Eric;The apples were absolutely delicious

3/10/2015;Tasha;I enjoyed the mangoes thoroughly

4/11/2014;Walter;The mangoes were awesome

3/10/2009;Ben;Who killed the cat really

9/10/2088;Lisa;Eric recommended guavas for me

For the described case, you probably do not need regular expressions; a simple string search will do. However, let's have a look at both versions.

First of all, you used a space ' ' as the delimiter, which is incorrect for the CSV data you provided. For correct parsing, use ';' as the delimiter. In your example, the quotechar does not have any effect, so you can omit it or set it to something common.
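To illustrate the delimiter point, here is a minimal sketch that parses two lines of the sample data once with a space and once with a semicolon. It assumes Python 3 and uses io.StringIO in place of a real file; the helper name parse is made up for this illustration.

```python
import csv
import io

# Two lines copied from the question's sample data.
SAMPLE = (
    "Date;Customer Name;Comments\n"
    "2/12/2015;Eric;The apples were absolutely delicious\n"
)

def parse(text, delimiter):
    """Hypothetical helper: parse CSV text and return the rows as lists."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

rows_space = parse(SAMPLE, " ")      # the delimiter used in the question
rows_semicolon = parse(SAMPLE, ";")  # the delimiter the data actually uses

print(rows_space[1])
print(rows_semicolon[1])
```

With the space delimiter every word becomes its own field, so a test like `term in row` can only ever match single words. With the semicolon, the whole comment arrives as one string that can be searched for phrases.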

For both versions below, I use the following:

file = 'sampledata/test.csv' # Target CSV file path
terms = 'enjoy, apples, the mangoes' # You want to replace this with your input

Version 1: String search

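import csv

lookup = [i.strip().lower() for i in terms.split(',')]
with open(file, 'r') as csvin:
    rdr = csv.reader(csvin, delimiter=';', quotechar='"')
    header = next(rdr)  # first line holds the column names; next() works in Python 2.6+ and 3
    col = header.index('Comments')  # position of the column to search
    for row in rdr:
        for l in lookup:
            # substring test, so multi-word phrases match as well
            if l in row[col].lower():
                print(row)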

To help you through it, here are the basic steps:

  1. Transform the input terms into something usable. I split the string at commas, as you did in your code. In addition, strip() removes the surrounding spaces, which would otherwise prevent a term from matching at the beginning of a comment.

  2. Open the file, set up the CSV reader and read the header from the first line.

  3. For each row and each element in our lookup list, test whether the lookup term occurs anywhere in the comment string. I use lower() to ignore case, which matters especially at the beginning of comments.
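Step 1 can be checked in isolation. The sketch below (Python 3 assumed; the variable names raw, found_raw and found_clean are made up for this illustration) shows why the leading space left over from splitting at commas breaks a match at the start of a comment, and how strip() plus lower() fixes it.

```python
terms = "enjoy, apples, the mangoes"
comment = "The mangoes were awesome"

# Splitting alone keeps the space that followed each comma.
raw = terms.split(",")

# Stripping and lowercasing gives terms that can match anywhere.
lookup = [t.strip().lower() for t in raw]

# " the mangoes" (with a leading space) is not in a comment that starts with "The mangoes".
found_raw = raw[2] in comment.lower()

# "the mangoes" is found once the space is gone and case is ignored.
found_clean = lookup[2] in comment.lower()

print(raw)
print(lookup)
print(found_raw, found_clean)
```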

The result for my example input terms is:

['2/12/2015', 'Eric', 'The apples were absolutely delicious']
['3/10/2015', 'Tasha', 'I enjoyed the mangoes thoroughly']
['3/10/2015', 'Tasha', 'I enjoyed the mangoes thoroughly']
['4/11/2014', 'Walter', 'The mangoes were awesome']

Note: One comment is returned twice, because two of our lookup elements were found in its text. The loop prints a row once for every matching term; if you want each row only once, stop checking after the first hit (for example with a break after the print) or remove the duplicates afterwards.
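One possible way to report each row only once is to test all terms with any(), which stops at the first term that matches. This is a sketch rather than part of the original answer: it assumes Python 3, reads the sample data from a string instead of a file, and the function name find_rows is made up for this illustration.

```python
import csv
import io

# Sample data from the question, embedded so the sketch is self-contained.
SAMPLE = """Date;Customer Name;Comments
2/12/2015;Eric;The apples were absolutely delicious
3/10/2015;Tasha;I enjoyed the mangoes thoroughly
4/11/2014;Walter;The mangoes were awesome
3/10/2009;Ben;Who killed the cat really
9/10/2088;Lisa;Eric recommended guavas for me
"""

def find_rows(csv_text, terms, column="Comments"):
    """Hypothetical helper: return each row whose column contains any term, once."""
    lookup = [t.strip().lower() for t in terms.split(",") if t.strip()]
    rdr = csv.reader(io.StringIO(csv_text), delimiter=";", quotechar='"')
    header = next(rdr)
    col = header.index(column)
    # any() short-circuits, so a row is kept at most once however many terms match.
    return [row for row in rdr if any(t in row[col].lower() for t in lookup)]

matches = find_rows(SAMPLE, "enjoy, apples, the mangoes")
for row in matches:
    print(row)
```

Note that Lisa's row is not returned even though it mentions Eric, because only the Comments column is searched and none of the terms occur in her comment.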

Version 2: Regular expressions

Most of the example above remains the same. Here is the code:

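import csv
import re

# re.IGNORECASE instead of lower(), because lowercasing a pattern would turn e.g. \S into \s
lookup = [re.compile(i.strip(), re.IGNORECASE) for i in terms.split(',')]
with open(file, 'r') as csvin:
    rdr = csv.reader(csvin, delimiter=';', quotechar='"')
    header = next(rdr)  # works in Python 2.6+ and 3
    col = header.index('Comments')
    for row in rdr:
        for l in lookup:
            m = l.search(row[col])
            if m is not None:
                print(row)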

The difference is in steps 1 and 3:

  1. For each input term, we compile a regular expression and store it in our lookup list. Note: With my example terms, the regular expressions behave like plain string searches, because no special regex operators are used. You can, however, enter something like mango(es)? .

  2. (same as above)

  3. For each row and each compiled regex, test the Comments column of your CSV using search(), which returns a regex match object (re.MatchObject in the Python 2 documentation). If the result is not None, you have found a match. Note: You can get the position of the found substring with the start() method of the match object. For more functions, see the docs on Regex Match Objects.
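As a small illustration of the mango(es)? pattern and the start() method, the sketch below searches a few comments and records where each match begins. It assumes Python 3; the first two comments come from the sample data, while the third is invented here only to show the optional "es" group.

```python
import re

comments = [
    "I enjoyed the mangoes thoroughly",  # from the sample data
    "The mangoes were awesome",          # from the sample data
    "A ripe mango would be nice",        # made up for this illustration
]

# "(es)?" makes the plural ending optional; IGNORECASE ignores letter case.
pattern = re.compile(r"mango(es)?", re.IGNORECASE)

hits = []
for comment in comments:
    m = pattern.search(comment)
    if m is not None:
        # start() is the index of the match, group(0) the matched text.
        hits.append((m.start(), m.group(0)))

print(hits)
```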

The result of the regex version is the same as above:

['2/12/2015', 'Eric', 'The apples were absolutely delicious']
['3/10/2015', 'Tasha', 'I enjoyed the mangoes thoroughly']
['3/10/2015', 'Tasha', 'I enjoyed the mangoes thoroughly']
['4/11/2014', 'Walter', 'The mangoes were awesome']

Additionally...

You asked whether you can search only one column. Each row you obtain from the CSV reader is a list of strings, split at the provided delimiter. To get a specific column by its name, call index() on the header row read at the start, then use the returned index to access the element in the row's list.
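The column lookup can be sketched on its own as follows, assuming Python 3 and reading two sample rows from a string instead of a file. The second half uses csv.DictReader, which is not part of the answer above; it is shown only as an alternative from the same standard-library module that maps each row to its header names.

```python
import csv
import io

# Header plus two rows copied from the question's sample data.
SAMPLE = """Date;Customer Name;Comments
2/12/2015;Eric;The apples were absolutely delicious
3/10/2015;Tasha;I enjoyed the mangoes thoroughly
"""

# Approach from the answer: find the column position in the header row.
rdr = csv.reader(io.StringIO(SAMPLE), delimiter=";")
header = next(rdr)
col = header.index("Comments")
comments_by_index = [row[col] for row in rdr]

# Alternative (not in the answer): DictReader keys each row by header name.
dict_rdr = csv.DictReader(io.StringIO(SAMPLE), delimiter=";")
comments_by_name = [row["Comments"] for row in dict_rdr]

print(col)
print(comments_by_index)
print(comments_by_name)
```

Note that header.index() raises a ValueError if the column name is not present, so a misspelled name fails immediately rather than searching the wrong column.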


