Read lines from a file and print each line that contains a string

Question

I have working code that opens a file, looks for a string, and prints the line if it contains that string. I'm doing this so that I can decide manually whether each line should be removed from my dataset.

But it would be much better if I could tell the program to print only the part of the line that contains the string, i.e. the text between the two commas that surround it.

The code I have now (see below)

with open("dvd.txt") as f:
    # enumerate(f, 1) yields (line_number, line) pairs, counting from 1
    for num, line in enumerate(f, 1):
        if " arnold " in line:
            num = str(num)
            print(line + '' + num)

Prints each line like this:

77.224998664,2014-10-19,386.5889,the best arnold ***** ,81,dvd-action,Cheese 5gr,online-dvd-king93,0.19976,18,/media/removable/backup/2014-10-19/all_items/cheese-5gr?feedback_page=1.html,    ships from: Germany    ships to: Worldwide  ,2014-07-30,online-dvd-king,93 1

I'd like it to print this instead:

,the best arnold ***** , 1

or

the best arnold *****  1

I read this question, but I'd like to avoid using CSV.

If for whatever reason it is tricky to find the text between commas (or between any other specific characters), it would also be useful to print the three words before and after the string I'm looking for.
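A minimal sketch of that fallback, under two assumptions: a "word" is any run of characters other than whitespace and commas (so field boundaries also separate words), and only the first exact occurrence of the target word matters. `words_around` and its `width` parameter are hypothetical names chosen for illustration.

```python
import re


def words_around(line, target, width=3):
    """Return up to `width` words on each side of the first word equal to `target`.

    Assumption: words are runs of characters that are neither whitespace
    nor commas, so CSV-style field boundaries also separate words.
    """
    words = re.findall(r"[^\s,]+", line)
    for i, word in enumerate(words):
        if word == target:
            start = max(0, i - width)  # avoid a negative slice start
            return " ".join(words[start:i + width + 1])
    return None  # target word not found in this line


# Shortened copy of the sample line from the question
sample = ("77.224998664,2014-10-19,386.5889,the best arnold ***** ,81,"
          "dvd-action,Cheese 5gr")
print(words_around(sample, "arnold"))
# → 386.5889 the best arnold ***** 81 dvd-action
```

Inside the loop from the question, something like `print(words_around(line, "arnold"), num)` could replace the existing `print` call.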

Answer 1

This is very simple to do with str.split(). Modifying your code as follows will produce the output you want:

with open("dvd.txt") as f:
    for num, line in enumerate(f, 1):
        if " arnold " in line:
            num = str(num)
            # Split the line on commas and take the fourth field (index 3)
            print(line.split(',')[3] + '' + num)

str.split() splits a string into a list at each occurrence of the specified separator. To access the entry you want, supply the appropriate index; in your case that is 3, because list indices start at 0 and the text you want is the fourth field.
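A small sketch of what the split produces on a shortened copy of the question's sample line. The `find_field` helper is a hypothetical addition, not part of the answer: it searches the fields for the target instead of hard-coding index 3, which may help if the field's position varies between lines.

```python
def find_field(line, target):
    """Return the first comma-separated field containing `target`, or None.

    Hypothetical helper: unlike a fixed index such as [3], it works
    wherever the field happens to sit in the line.
    """
    for field in line.split(','):
        if target in field:
            return field
    return None


# Shortened copy of the sample line from the question
sample = "77.224998664,2014-10-19,386.5889,the best arnold ***** ,81,dvd-action"

fields = sample.split(',')  # list of 6 strings; indexing starts at 0
print(repr(fields[3]))      # → 'the best arnold ***** '
print(repr(find_field(sample, " arnold ")))  # → 'the best arnold ***** '
print(repr(fields[3].strip()))  # → 'the best arnold *****'
```

Neither version understands quoted fields that themselves contain commas; handling those is what Python's csv module is for.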

As an aside, you can produce your output with the str.format() method to make it a little nicer:

print("{} {}".format(line.split(',')[3], num))

This also lets you remove num = str(num), since the format method can handle many data types, whereas string concatenation with + only works when both operands are strings.
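A short sketch (Python 3, with values taken from the question's sample) contrasting the two approaches:

```python
title = "the best arnold *****"
num = 1  # an int, as yielded by enumerate()

# str.format() formats each argument itself, so no str(num) is needed
formatted = "{} {}".format(title, num)
print(formatted)  # → the best arnold ***** 1

# Concatenating str and int with + raises TypeError, which is why the
# original code converted num with str() first
try:
    title + " " + num
except TypeError as exc:
    print("TypeError:", exc)
```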

Answer 2

As an alternative, you could use a regular expression as follows:

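import re

with open("dvd.txt") as f:
    for num, line in enumerate(f, 1):
        # Capture a whole comma-delimited field that contains "arnold",
        # without the whitespace around it
        re_arnold = re.search(r',\s*([^,]*?arnold[^,]*?)\s*,', line)

        if re_arnold:
            print('{} {}'.format(re_arnold.group(1), num))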

This extracts the whole entry (between the commas), regardless of which field it is in.
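A quick check of the answer's pattern on three made-up lines (hypothetical data, shortened from the question's sample) with the title in different positions. Here the pattern is compiled once with re.compile, which matches the same way as passing it to re.search on every line. The third line shows one limitation: because the pattern requires a comma on each side of the field, a match in the first (or last) field is not found.

```python
import re

# Same pattern as in the answer: a comma, optional whitespace, a field that
# contains "arnold" (no commas inside), optional whitespace, then a comma
pattern = re.compile(r',\s*([^,]*?arnold[^,]*?)\s*,')

lines = [
    "77.2,2014-10-19,386.5,the best arnold ***** ,81",  # title is field 3
    "81,the best arnold ***** ,77.2,2014-10-19",         # title is field 1
    "the best arnold ***** ,81,77.2",                    # title is field 0
]

for num, line in enumerate(lines, 1):
    match = pattern.search(line)
    if match:
        print('{} {}'.format(match.group(1), num))
    else:
        print('no match on line {}'.format(num))

# → the best arnold ***** 1
# → the best arnold ***** 2
# → no match on line 3
```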

Answer 1: score 10, accepted, 2015-11-11 19:02:03

Answer 2: score 3, 2015-11-11 19:16:22
