Simple Filter Python script for Text

I am trying to create what must be a simple filter function that runs a regex against a text file and returns all words containing a match for that regex.

For example, if I wanted to find all words that contain "abc" and I had the list abcde, bce, xyz and zyxabc, the script would return abcde and zyxabc.

I have a script below; however, I am not sure whether it is just the regex I am failing at. It just returns abc twice rather than the full words. Thanks.

import re

text = open("test.txt", "r")
regex = re.compile(r'(abc)')

for line in text:
    target = regex.findall(line)
    for word in target:
        print(word)

I think you don't need a regex for such a task. You can simply split each line into a list of words, then loop over the words and use the in operator:

with open("test.txt") as f:
    for line in f:
        for w in line.split():
            if 'abc' in w:
                print(w)
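If you would rather collect the matches than print them, the same split-and-test idea can be wrapped in a small function. This is only a sketch: the name words_containing and the sample lines are made up for illustration, and it takes any iterable of lines, so an open file object would work in place of the list.

```python
def words_containing(lines, needle):
    """Return every whitespace-separated word in `lines` that contains `needle`."""
    matches = []
    for line in lines:
        # split() with no arguments also discards the trailing newline
        for word in line.split():
            if needle in word:
                matches.append(word)
    return matches


# Hypothetical sample data standing in for the contents of test.txt
sample = ["abcde bce\n", "xyz zyxabc\n"]
print(words_containing(sample, "abc"))  # ['abcde', 'zyxabc']
```

Returning a list keeps the filtering separate from the output, so the result can be printed, counted or written to another file.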

Your methodology is correct; however, you can change your regex to r'.*abc.*', like this:

 regex = re.compile(r'.*abc.*')

This will match every line that has abc in it. The .* wildcards on either side match all the other characters on the line. Because the pattern matches the whole line, it returns whole words only when the file has one word per line.

A small demo with that particular line changed would print:

abcde
zyxabc
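If a line can hold several words, one possible variation is to swap the .* wildcards for \S* so that the match stops at whitespace. This also shows why the original pattern printed only abc: when a pattern has a capturing group, as in r'(abc)', findall returns the group's text instead of the whole match. The sketch below is not taken from the answers above; the names word_regex and find_words and the sample lines are made up for illustration.

```python
import re

# \S* extends the match over the non-whitespace characters on both sides of
# "abc". There is no capturing group, so findall returns the full match.
word_regex = re.compile(r'\S*abc\S*')


def find_words(lines, regex=word_regex):
    """Collect every whole word matched by `regex` across `lines`."""
    found = []
    for line in lines:
        found.extend(regex.findall(line))
    return found


# Hypothetical sample data with more than one word on a line
sample = ["abcde bce xyz\n", "zyxabc\n"]
print(find_words(sample))  # ['abcde', 'zyxabc']
```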

Note: as Kasra mentions, it is better to use the in operator in such cases.
