Python: check that no item from a list is in a string
I have a list of things I want to filter out of a CSV, and I'm trying to figure out a Pythonic way to do it. For example, this is what I'm doing:
import csv

with open('output.csv', 'w', newline='') as outf:
    with open('input.csv', newline='') as inf:
        read = csv.reader(inf)
        outwriter = csv.writer(outf)
        notstrings = ['and', 'or', '&', 'is', 'a', 'the']
        for row in read:
            (if none of notstrings in row[3])
                outwriter.writerow(row)
I don't know what to put in the parentheses (or whether there's a better overall way to go about this).
You can use the any() function to test each of the words in your list against a column:
if not any(w in row[3] for w in notstrings):
    # none of the strings are found, write the row
    outwriter.writerow(row)
This is true only if none of those strings appears in row[3]. Note that in matches substrings, however, so 'a' in 'false-positive' counts as a match, for example.
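As a quick sanity check of the any() test, the sketch below wraps it in a small helper and runs it on a few made-up strings. The helper name contains_none and the sample strings are hypothetical, not from the question. It shows the intended behaviour as well as the substring false positives, such as 'or' inside 'world'.

```python
# Words to filter out, as in the question.
notstrings = ['and', 'or', '&', 'is', 'a', 'the']


def contains_none(text, words):
    """Hypothetical helper: True if no item of `words` occurs in `text` as a substring."""
    return not any(w in text for w in words)


# Hypothetical column values used only for illustration.
print(contains_none('blue sky', notstrings))         # True: none of the strings occur
print(contains_none('salt and pepper', notstrings))  # False: 'and' occurs
print(contains_none('hello world', notstrings))      # False: 'or' occurs inside 'world'
print(contains_none('false-positive', notstrings))   # False: 'a' occurs inside 'false'
```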
Put into context:
import csv

with open('output.csv', 'w', newline='') as outf:
    with open('input.csv', newline='') as inf:
        read = csv.reader(inf)
        outwriter = csv.writer(outf)
        notstrings = ['and', 'or', '&', 'is', 'a', 'the']
        for row in read:
            # keep the row only if no item of notstrings occurs in row[3]
            if not any(w in row[3] for w in notstrings):
                outwriter.writerow(row)
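To try the loop without creating real files, the same logic can run against in-memory text with io.StringIO. This is a sketch under assumptions: the helper filter_csv and the sample rows are hypothetical, and, as in the question, only the fourth field (row[3]) is checked.

```python
import csv
import io

notstrings = ['and', 'or', '&', 'is', 'a', 'the']


def filter_csv(inf, outf, column=3):
    """Hypothetical helper: copy rows from inf to outf, skipping rows whose
    `column` field contains any item of notstrings as a substring."""
    reader = csv.reader(inf)
    writer = csv.writer(outf)
    for row in reader:
        if not any(w in row[column] for w in notstrings):
            writer.writerow(row)


# Hypothetical sample data; only the fourth field (index 3) is tested.
sample = (
    "1,x,y,blue sky\n"
    "2,x,y,salt and pepper\n"
    "3,x,y,hello world\n"
)
out = io.StringIO()
filter_csv(io.StringIO(sample), out)
print(out.getvalue())  # only the 'blue sky' row survives
```

Because filter_csv only needs file-like objects, the same function could be called with real files; the csv module documentation recommends opening them with newline=''.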
If you need to honour word boundaries, a regular expression is a better idea here:
import re

notstrings = re.compile(r'(?:\b(?:and|or|is|a|the)\b)|(?:\B&\B)')
if not notstrings.search(row[3]):
    # none of the words are found, write the row
    outwriter.writerow(row)
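The following sketch runs the pattern against a few made-up strings to show that whole words are caught while substrings such as 'or' in 'world' are not, and why the & branch uses \B rather than \b. The helper keep and the sample strings are hypothetical.

```python
import re

# The pattern from the answer: listed words via \b, a standalone '&' via \B.
notstrings = re.compile(r'(?:\b(?:and|or|is|a|the)\b)|(?:\B&\B)')


def keep(text):
    """Hypothetical helper: True when no listed word (or standalone '&') appears."""
    return notstrings.search(text) is None


for text in ['hello world', 'false-positive', 'salt and pepper', 'rock & roll', 'AT&T']:
    print(repr(text), 'keep' if keep(text) else 'drop')

# '&' is not a word character, so there is no word boundary between it and a
# neighbouring space: \b&\b fails here, while \B&\B matches.
print(re.search(r'\b&\b', 'rock & roll'))  # None
print(re.search(r'\B&\B', 'rock & roll'))  # a match object for '&'
```

The pattern is case-sensitive, so 'The' or 'AND' will not match; compiling it with re.IGNORECASE is one option if that matters for your data.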
I created a Regex101 demo for the expression to demonstrate how it works. It has two branches:
- \b(?:and|or|is|a|the)\b matches any of the words in the list, provided each one is bounded on both sides by the start or end of the string or by a non-word character (punctuation, whitespace, etc.).
- \B&\B matches the & character if it is at the start or end of the string or between non-word characters. You can't use \b here, as & is itself not a word character.
You can use sets. In this code, I transform your list into a set, transform row[3] into a set of words, and check the intersection between the two sets. If there is no intersection, none of the words in notstrings are in row[3].
Using sets, you match only whole words, not parts of words. The words come from splitting the text, though, so punctuation attached to a word (for example 'the,') stops it from matching.
import csv

with open('output.csv', 'w', newline='') as outf:
    with open('input.csv', newline='') as inf:
        read = csv.reader(inf)
        outwriter = csv.writer(outf)
        notstrings = {'and', 'or', '&', 'is', 'a', 'the'}
        for row in read:
            # split row[3] on whitespace; an empty intersection means no listed word is present
            if not notstrings.intersection(set(row[3].split())):
                outwriter.writerow(row)
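A short sketch of the set-based check on made-up strings makes both of its properties visible: substrings are ignored, but so are words with punctuation attached. The helper name has_no_listed_word and the samples are hypothetical.

```python
notstrings = {'and', 'or', '&', 'is', 'a', 'the'}


def has_no_listed_word(text):
    """Hypothetical helper: True if no whitespace-separated token of text is in notstrings."""
    return not notstrings.intersection(text.split())


print(has_no_listed_word('hello world'))        # True: 'or' inside 'world' is not a token
print(has_no_listed_word('salt and pepper'))    # False: 'and' is a token
print(has_no_listed_word('rock & roll'))        # False: '&' is a token
print(has_no_listed_word('bread and, butter'))  # True: the token is 'and,', not 'and'
```

notstrings.isdisjoint(text.split()) expresses the same test and reads slightly more directly, since isdisjoint also accepts any iterable.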