Python: check that a string contains none of the items in a list

Question

I have a list of things I want to filter out of a CSV, and I'm trying to figure out a Pythonic way to do it. For example, this is what I'm doing:

import csv

with open('output.csv', 'wb') as outf:
    with open('input.csv', 'rbU') as inf:
        read = csv.reader(inf)
        outwriter = csv.writer(outf)
        notstrings = ['and', 'or', '&', 'is', 'a', 'the']
        for row in read:
            (if none of notstrings in row[3])
                outwriter.writerow(row)

I don't know what to put in the parentheses (or whether there's a better overall way to go about this).

Answer 1

You can use the any() function to test each of the words in your list against a column:

if not any(w in row[3] for w in notstrings):
    # none of the strings are found, write the row
    outwriter.writerow(row)

This is true if none of those strings appears in row[3]. It matches substrings, however, so 'a' would match inside 'false-positive', for example.
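Here is a quick check of that substring behaviour. The helper name contains_none is hypothetical (it is not part of the answer); it just wraps the same any() test so it can be tried in isolation:

```python
def contains_none(text, words):
    """Return True if none of `words` occurs anywhere in `text` (plain substring test)."""
    return not any(w in text for w in words)


notstrings = ['and', 'or', '&', 'is', 'a', 'the']

print(contains_none('sky blue', notstrings))        # True
print(contains_none('false-positive', notstrings))  # False: 'a' occurs inside 'false'
```

So a row is dropped even when no stop word appears as a separate word, which is why the word-boundary version further down may be preferable.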

Put into context:

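import csv

# Python 3: open CSV files in text mode with newline='' (the 'wb'/'rbU' modes are Python 2 only)
with open('output.csv', 'w', newline='') as outf:
    with open('input.csv', newline='') as inf:
        read = csv.reader(inf)
        outwriter = csv.writer(outf)
        notstrings = ['and', 'or', '&', 'is', 'a', 'the']
        for row in read:
            if not any(w in row[3] for w in notstrings):
                outwriter.writerow(row)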
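To try the same loop without files on disk, this sketch runs it over an in-memory CSV via io.StringIO. The function name filter_rows and the sample rows are made up for illustration, and it assumes Python 3:

```python
import csv
import io

NOTSTRINGS = ['and', 'or', '&', 'is', 'a', 'the']


def filter_rows(infile, outfile, notstrings=NOTSTRINGS, column=3):
    """Copy rows whose `column` contains none of `notstrings` (substring match)."""
    reader = csv.reader(infile)
    writer = csv.writer(outfile)
    for row in reader:
        if not any(w in row[column] for w in notstrings):
            writer.writerow(row)


# Hypothetical sample data: the fourth column (index 3) is the one tested.
sample = io.StringIO(
    "1,x,y,sky blue\n"
    "2,x,y,salt & pepper\n"
    "3,x,y,green\n"
)
out = io.StringIO()
filter_rows(sample, out)
print(out.getvalue())
```

Row 2 is dropped ('&' and the 'a' in 'salt' both match); rows 1 and 3 are written out. csv.writer terminates each row with '\r\n' by default.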

If you need to honour word boundaries, a regular expression is a better idea here:

import re

notstrings = re.compile(r'(?:\b(?:and|or|is|a|the)\b)|(?:\B&\B)')
if not notstrings.search(row[3]):
    # none of the words are found, write the row
    outwriter.writerow(row)

I created a Regex101 demo for the expression to demonstrate how it works. It has two branches:

\b(?:and|or|is|a|the)\b - matches any of the words in the list, provided each one is delimited by the start or end of the text or by non-word characters (punctuation, whitespace, etc.).
\B&\B - matches the & character when it is at the start or end of the text or between non-word characters. You can't use \b here, because & is itself not a word character.
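A small check of both branches, using the same pattern as above. The helper name keep and the sample strings are hypothetical, chosen to show where the word-boundary rules kick in:

```python
import re

# Same pattern as in the answer: a whole stop word, or a free-standing '&'.
notstrings = re.compile(r'(?:\b(?:and|or|is|a|the)\b)|(?:\B&\B)')


def keep(text):
    """Return True if the row text contains none of the stop words/symbols."""
    return not notstrings.search(text)


for text in ['false-positive', 'salt & pepper', 'AT&T', 'this or that']:
    print(repr(text), 'keep' if keep(text) else 'drop')
```

'false-positive' is now kept, because the 'a' sits inside a word. 'AT&T' is also kept, since its '&' touches word characters on both sides, so \B does not match there. Note that the pattern is case-sensitive, so 'The' would not be caught unless you add re.IGNORECASE.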

Answer 2

You can use sets. In this code, I turn your list into a set, split row[3] into words, and check the intersection between the two. If the intersection is empty, none of the words in notstrings appear in row[3].

Using sets ensures that you match only whole words, not parts of words. Note that split() separates only on whitespace, so a word with punctuation attached (for example 'and,') will not match.

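import csv

# Python 3: open CSV files in text mode with newline=''
with open('output.csv', 'w', newline='') as outf:
    with open('input.csv', newline='') as inf:
        read = csv.reader(inf)
        outwriter = csv.writer(outf)
        notstrings = {'and', 'or', '&', 'is', 'a', 'the'}
        for row in read:
            # split() with no argument splits on any run of whitespace
            if not notstrings.intersection(row[3].split()):
                outwriter.writerow(row)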
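A sketch of the set check in isolation, plus one possible way around the punctuation caveat. Both helper names are hypothetical; keep_trimmed is a variant not in the answer that strips punctuation (except '&', which is itself a stop token) from the edges of each token before comparing:

```python
import string

notstrings = {'and', 'or', '&', 'is', 'a', 'the'}

# Punctuation to trim from token edges; '&' is kept because it is itself in the set.
TRIM = string.punctuation.replace('&', '')


def keep_split(text):
    """The answer's check: whitespace-separated tokens must not overlap the set."""
    return notstrings.isdisjoint(text.split())


def keep_trimmed(text):
    """Variant: trim leading/trailing punctuation from each token first."""
    return notstrings.isdisjoint(t.strip(TRIM) for t in text.split())


print(keep_split('false-positive'), keep_trimmed('false-positive'))  # True True
print(keep_split('yes, and.'), keep_trimmed('yes, and.'))            # True False
print(keep_split('salt & pepper'), keep_trimmed('salt & pepper'))    # False False
```

The middle line shows the caveat: split() leaves 'and.' as a token, so the plain set check misses it, while the trimmed variant catches it.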

Answer 1: 2 votes, accepted, posted 2015-04-06 17:19:42
Answer 2: 1 vote, posted 2015-04-06 17:22:54
