Simple Filter Python script for Text
I am trying to create what should be a simple filter function that runs a regex against a text file and returns all words matching that regex.
For example, if I wanted to find all words that contain "abc" and I had the list abcde, bce, xyz and zyxabc, the script would return abcde and zyxabc.
I have a script below, but I am not sure whether it is just the regex I am getting wrong. It returns abc twice rather than the full words. Thanks.
import re

text = open("test.txt", "r")
regex = re.compile(r'(abc)')
for line in text:
    target = regex.findall(line)
    for word in target:
        print(word)
I think you don't need a regex for this task. You can simply split your lines to create a list of words, then loop over the words and use the in operator:
with open("test.txt") as f:
    for line in f:
        for w in line.split():
            if 'abc' in w:
                print(w)
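The same idea can be wrapped in a small reusable function. This is only a minimal sketch for Python 3: the function name `words_containing` and the sample lines are hypothetical and stand in for reading test.txt.

```python
def words_containing(lines, needle):
    """Return every whitespace-separated word in `lines` that contains `needle`."""
    matches = []
    for line in lines:
        # str.split() with no arguments splits on any run of whitespace,
        # which also discards the trailing newline.
        for word in line.split():
            if needle in word:
                matches.append(word)
    return matches


# Hypothetical sample input standing in for the contents of test.txt.
sample_lines = ["abcde bce\n", "xyz zyxabc\n"]
print(words_containing(sample_lines, "abc"))  # ['abcde', 'zyxabc']
```

Because an open file object is itself an iterable of lines, it could be passed as `lines` directly.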
Your methodology is correct; however, you can change your regex to r'.*abc.*', as in:
regex = re.compile(r'.*abc.*')
This will match all the lines with abc in them. The wildcards .* match all the other characters in the line, so findall returns the whole line; that is the full word only when each word sits on its own line.
A small demo with that particular line changed would print:
abcde
zyxabc
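When several words share one line, a word-level pattern is one possible alternative to `.*abc.*`. The sketch below assumes that words are made of `\w` characters (letters, digits, underscore); the helper name `find_words` and the sample line are hypothetical. It also shows why the original `(abc)` pattern printed only abc: `findall` returns the text of the capturing group, not the surrounding word.

```python
import re

# \w* on both sides extends the match to the rest of the word around "abc".
WORD_WITH_ABC = re.compile(r"\w*abc\w*")


def find_words(text):
    """Return all words in `text` that contain "abc" (words assumed to be word characters)."""
    return WORD_WITH_ABC.findall(text)


line = "abcde bce xyz zyxabc"
print(re.findall(r"(abc)", line))    # only the captured group: ['abc', 'abc']
print(re.findall(r".*abc.*", line))  # the whole line as a single match
print(find_words(line))              # ['abcde', 'zyxabc']
```

For a fixed substring like this, the in operator remains the simpler choice; a pattern of this kind mainly pays off when the thing being searched for is itself a regex.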
Note: as Kasra mentions, it is better to use the in operator in such cases.