Search txt file for string / else print not present

I'm trying to write a program that combs a config file for certain search terms: if a term matches, print "it's there"; if not, print "it's not here". Here is what I have so far:

import os
import re
import sys

check = ["test1", "test2", "test3"]

for f in filter(os.path.isfile, sys.argv[1:]): ##open doc arg
    for line in open(f).readlines(): ##loop for reading line by line
        if re.match(check[0], line): ##match at beginning for check
            print(check[0], "is in place") ##print if match == true
        elif re.search(check[0], line): ##if not check search (full file)
            print(check[0], "is not in place") ##print if true
    for line in open(f).readlines():
        if re.match(check[1], line):
            print(check[1], "is in place")
        elif ((re.search(check[1], line)) == None):
            print(check[1], "is not in place")

So the issue is that if I add an else branch that prints, it prints for every line (all 1500), since the loop runs line by line. Is there a way to search the whole doc instead of line by line?

Yes, this is possible using read(). But beware: if your file is huge, loading the entire file into memory at once may not be a good idea.
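For files too large to read comfortably, one alternative (not part of the answer above) is the standard mmap module, which lets the operating system page the file in as needed. This is a minimal sketch; the function name terms_in_file and the utf-8 default are assumptions for illustration.

```python
import mmap


def terms_in_file(path, terms, encoding="utf-8"):
    """Map each term to True/False depending on whether it occurs in the file."""
    with open(path, "rb") as fh:
        try:
            mm = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            # mmap cannot map an empty file; nothing can be found in it.
            return {term: False for term in terms}
        with mm:
            # mmap searches bytes, so each term is encoded first.
            return {term: mm.find(term.encode(encoding)) != -1 for term in terms}
```

The result is a dict keyed by term, so printing "is in place" or "is not in place" for each term is a single loop over its items.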

Also, you are looping through the same file multiple times. Avoid this by iterating over the file only once and checking every value in the check array against each line. Furthermore, avoid regexes when a plain substring test will do, since they can be slower. Something like this can work too:

for line in open(f).readlines():
    for check_value in check:
        if check_value in line:
            print("{} is in place.".format(check_value))
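The loop above reports a term every time it appears and never reports a missing one. A possible variation, sketched here with hypothetical helper names find_terms and report, is to remember which terms were seen during the single pass and print one message per term at the end.

```python
def find_terms(lines, terms):
    """Return the set of terms that occur as substrings in any line."""
    remaining = set(terms)
    found = set()
    for line in lines:
        # Only test terms not seen yet; iterate over a copy because the set shrinks.
        for term in list(remaining):
            if term in line:
                found.add(term)
                remaining.discard(term)
        if not remaining:  # every term has been seen, stop reading early
            break
    return found


def report(lines, terms):
    """Build one 'is in place' / 'is not in place' message per term."""
    found = find_terms(lines, terms)
    return ["{} is {}in place".format(t, "" if t in found else "not ") for t in terms]
```

With an open file this could be used as print("\n".join(report(infile, check))), which prints exactly one line per search term regardless of how long the file is.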

Use the else clause of the for loop together with the break statement. Also note that just iterating over the file itself will do; there is no need to explicitly read all the lines. (I also added with to make sure the file gets closed.)

with open(f) as infile:
    for line in infile:
        if re.match(check[0], line):
            print(check[0], "is in place")
            break     # stop after finding one match
    else:             # we got to the end of the file without a match
        print(check[0], "is not in place")

You can even write it as one of those ever-popular generator expressions:

with open(f) as infile:
    if any(re.match(check[0], line) for line in infile):
        print(check[0], "is in place")
    else:
        print(check[0], "is not in place")

Since the messages being printed are so similar, you can code-golf it even further:

with open(f) as infile:
    print(check[0], "is" if any(re.match(check[0], line) for line in infile) else "is not", "in place")

To read the entire file, you can use read() instead of readlines().

with open(f) as fil:
    lines = fil.read()

If what you're looking for in the file is just a string, you don't need re:

if check[0] in lines:
    print(check[0], "is in place")

I guess you can read the file into a string and use a simple if x in ..., i.e.:

with open("text_contains.txt") as f:
    text =  f.read().lower() # remove .lower() for caseSensiTive matching
for x in ["test1", "test2", "test3"]:
    if x in text:
        print("{} is in place".format(x))
    else:
        print("{} is not in place".format(x))

If you really need to read the file line by line (I assume you need the line of each occurrence), then:

import os
import re
import sys

searchTerms = ["test1", "test2", "test3"]
occurrences = {}

# Initialise occurrences list for each term:

for term in searchTerms:
    occurrences[term] = []

# Read line by line and check if any of the terms is present in that specific
# line. If it is, save the occurrence.

for f in filter(os.path.isfile, sys.argv[1:]):
    for line in open(f).readlines():
        for term in searchTerms:
            if re.search(term, line):  # search anywhere in the line, not only at its start
                occurrences[term].append(line.rstrip("\n"))

# For each term, print all the lines with occurrences, if any, or 'not found'
# otherwise:

for term in searchTerms:
    if len(occurrences[term]) > 0:
        print("'%s' found in lines: %s" % (term, ", ".join(occurrences[term])))
    else:
        print("'%s' not found" % term)
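If line numbers are more useful than the line text, the same idea can be sketched with enumerate. The function name find_line_numbers is hypothetical, and re.escape is used on the assumption that the search terms are literal text rather than regex patterns.

```python
import re


def find_line_numbers(lines, terms):
    """Map each term to the 1-based numbers of the lines that contain it."""
    occurrences = {term: [] for term in terms}
    for number, line in enumerate(lines, start=1):
        for term in terms:
            # re.escape makes the term match as literal text, not as a pattern.
            if re.search(re.escape(term), line):
                occurrences[term].append(number)
    return occurrences
```

A term whose list stays empty was not found anywhere, so the "not found" check remains a simple emptiness test.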

However, if you just need to check whether the term is there, regardless of the line, just use read to read the whole file at once:

for f in filter(os.path.isfile, sys.argv[1:]):
    with open(f) as file:
        text = file.read()

        for term in searchTerms:
            if re.search(term, text):  # re.match would only test the very start of the file
                print("'%s' found" % term)
            else:
                print("'%s' not found" % term)
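The code in the question treated a term at the start of a line as "in place" and a term found elsewhere as "not in place". A minimal sketch of making that distinction on whole-file text follows. The function name classify and the third label "missing" are illustrative assumptions, and the terms are assumed to be literal strings.

```python
import re


def classify(term, text):
    """Return 'in place', 'not in place', or 'missing' for one term."""
    literal = re.escape(term)
    # With re.MULTILINE, ^ matches at the start of every line, not just the text.
    if re.search("^" + literal, text, re.MULTILINE):
        return "in place"
    # The term exists somewhere, but never at the start of a line.
    if re.search(literal, text):
        return "not in place"
    return "missing"
```

Calling classify(term, text) for each search term after a single read() produces one verdict per term instead of one per line.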
