

No item in list in string Python

I have a list of things I want to filter out of a CSV file, and I'm trying to figure out a Pythonic way to do it. For example, this is what I'm doing:

import csv

with open('output.csv', 'w', newline='') as outf:
    with open('input.csv', newline='') as inf:
        read = csv.reader(inf)
        outwriter = csv.writer(outf)
        notstrings = ['and', 'or', '&', 'is', 'a', 'the']
        for row in read:
            (if none of notstrings in row[3])
                outwriter.writerow(row)

I don't know what to put in the parentheses (or if there's a better overall way to go about this).

You can use the any() function to test each of the words in your list against a column:

if not any(w in row[3] for w in notstrings):
    # none of the strings are found, write the row
    outwriter.writerow(row)

This is true only when none of those strings appear in row[3]. It matches substrings, however, so a value such as false-positive counts as a match for 'a', because 'a' in 'false-positive' is True.
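Here is a quick check of that substring behaviour. The value 'false-positive' comes from the answer, and the rest is illustration:

```python
notstrings = ['and', 'or', '&', 'is', 'a', 'the']
cell = 'false-positive'

# `w in cell` is a substring test, so 'a' is found inside 'false'.
hits = [w for w in notstrings if w in cell]
print(hits)  # ['a']

# Because of that hit, any() is True and the row would be skipped.
print(any(w in cell for w in notstrings))  # True
```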

Put into context:

import csv

with open('output.csv', 'w', newline='') as outf:
    with open('input.csv', newline='') as inf:
        read = csv.reader(inf)
        outwriter = csv.writer(outf)
        notstrings = ['and', 'or', '&', 'is', 'a', 'the']
        for row in read:
            # Keep the row only if no entry of notstrings occurs in column 3
            if not any(w in row[3] for w in notstrings):
                outwriter.writerow(row)
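You can try the any() filter without touching the file system by running the same loop over in-memory CSV text with io.StringIO. The sample rows and column names below are made up for illustration. As in the question, only column index 3 is tested, and the header row goes through the same check:

```python
import csv
import io

# Hypothetical sample data; only column index 3 ("note") is inspected.
SAMPLE = """id,name,qty,note
1,x,5,fish and chips
2,y,3,lemon
3,z,7,false-positive
"""

notstrings = ['and', 'or', '&', 'is', 'a', 'the']

inf = io.StringIO(SAMPLE)
outf = io.StringIO()
read = csv.reader(inf)
outwriter = csv.writer(outf)
for row in read:
    # Same substring test as above: drop the row if any listed string occurs
    if not any(w in row[3] for w in notstrings):
        outwriter.writerow(row)

# Only the header and the 'lemon' row survive; 'false-positive' is dropped
# because 'a' occurs inside 'false'.
print(outf.getvalue())
```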

If you need to honour word boundaries, a regular expression is a better choice here:

import re

notstrings = re.compile(r'(?:\b(?:and|or|is|a|the)\b)|(?:\B&\B)')
if not notstrings.search(row[3]):
    # none of the words are found, write the row
    outwriter.writerow(row)

I created a Regex101 demo of the expression to show how it works. The pattern has two branches:

  • \b(?:and|or|is|a|the)\b: matches any of the words in the list, provided they are at the start or end of the string or between non-word characters (punctuation, whitespace, etc.)
  • \B&\B: matches the & character if it is at the start or end of the string or between non-word characters. You can't use \b here, because & is itself not a word character.
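To see both branches at work, run the pattern over a few sample strings. The pattern is the one from this answer. The sample strings and the has_stopword() helper are hypothetical, added only for illustration:

```python
import re

# Pattern from the answer: listed words on word boundaries, or a standalone '&'.
NOTSTRINGS = re.compile(r'(?:\b(?:and|or|is|a|the)\b)|(?:\B&\B)')


def has_stopword(text):
    """Hypothetical helper: True if text contains a listed word or a standalone '&'."""
    return NOTSTRINGS.search(text) is not None


for sample in ['salt & pepper', 'AT&T', 'false-positive', 'Sandy', 'this is it']:
    print(f'{sample!r}: {has_stopword(sample)}')
# 'salt & pepper': True
# 'AT&T': False
# 'false-positive': False
# 'Sandy': False
# 'this is it': True
```

'AT&T' is not matched because its & sits between word characters. 'Sandy' and 'false-positive' are not matched because 'and' and 'a' only appear inside longer words. The pattern is also case-sensitive, so 'The' would not match.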

You can use sets. In this code, I convert your list into a set, split row[3] into a set of words, and check the intersection between the two sets. If there is no intersection, none of the words in notstrings are in row[3].

Using sets ensures that you match only whole words, not parts of words.

import csv

with open('output.csv', 'w', newline='') as outf:
    with open('input.csv', newline='') as inf:
        read = csv.reader(inf)
        outwriter = csv.writer(outf)
        notstrings = {'and', 'or', '&', 'is', 'a', 'the'}
        for row in read:
            # An empty intersection means no listed word appears in column 3
            if not notstrings.intersection(set(row[3].split())):
                outwriter.writerow(row)
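Here is a small sketch of how the set test behaves on a few sample strings. The strings are made up, and the keep() helper is hypothetical. It splits with str.split(), that is, on any whitespace:

```python
notstrings = {'and', 'or', '&', 'is', 'a', 'the'}


def keep(text):
    """Hypothetical helper: True when no whitespace-separated token is in notstrings."""
    return not notstrings.intersection(text.split())


print(keep('false-positive'))     # True: 'a' is only part of a word
print(keep('salt & pepper'))      # False: '&' is a separate token
print(keep('bread and, butter'))  # True: the token is 'and,', not 'and'
```

The last case shows a limit of this approach. Punctuation stays attached to the token, so 'and,' slips through. The comparison is also case-sensitive.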

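Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.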

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM