Running a nested string search with Python

Question

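I was wondering whether there is an easy way to run a nested search for strings in a huge text file?

I have a text file that may contain a line of text highlighting a specific problem area. I am looking at a possible nested search that avoids running a brand-new search over the whole text file for the associated second string, and instead continues from the point where the first string matched.

For example, if I run a string search over the text file and find the "problem string", I then want to run a secondary search (ideally a continuation of the first one, starting from the problem line) to find the first match of a second search string. In my case the second search string would be the closest "GPS INFO" string, from which I would then collect the GPS information from the text file (i.e. the next consecutive GPS string after the first "problem string").

I hope this makes sense?!? Basically, I want to avoid a brand-new search of the text file and instead carry on searching from where the first string was found.

I have some code below, but it only finds the first string. If I want to look for the second string I would normally start a new search, but that does not guarantee that I find the next consecutive string.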

f = open(file, "r")    
searchlines = f.readlines()
searchstringsProblem = ['BIG Problem Line']
searchstringsGPSLoc = ['GPS INFO']

a = 0
tot = 0
row_num=0 # let it be current row number

while a<len(searchstringsProblem):
    for i, line in enumerate(searchlines):
        for word in searchstringsProblem:
            if word in line:
                prob = line.split()
                worksheet.write(2,0,"Problem ID:", bold) 
                worksheet.write(2,1,prob[5]) 
                break
    a = a+1

Here is an example of the GPS INFO line and the stats following it that I want to collect:

Key line2 GPS Info

GPS = Active

Longitude = -0.00000

Latitude = +51.47700

Thanks for looking.

Mick

Answer 1

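You can keep track of your position by iterating over the file line by line instead of using .readlines() to load everything into a list. Something like the following may do what you need (it finds all problem/GPS pairs; note that it will not report a problem that has no GPS line after it):

File: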

random
random
random
GPS INFO: 238939
random
BIG Problem Line
random
blah GPS INFO: 238490
random GPS INFO: 325236342
BIG Problem Line2
GPS INFO: 12343

Code:

searchstringsProblem = 'BIG Problem Line'
searchstringsGPSLoc = 'GPS INFO'
matches = []

with open("test.txt") as f:
    problem = False
    problem_line = ""

    for line in f:
        if not problem and searchstringsProblem in line:
            problem_line = line.strip()
            problem = True
        elif problem and searchstringsGPSLoc in line:
            matches.append((problem_line, line.strip()))
            problem = False

print(matches)

This gives us:

[('BIG Problem Line', 'blah GPS INFO: 238490'), ('BIG Problem Line2', 'GPS INFO: 12343')]

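If you want to keep track of line numbers, you can iterate over the lines with enumerate and store the number alongside each match. I was not sure how you wanted to store the matches, so I just assumed a list of (problem, gps) tuples.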
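As a rough illustration of the enumerate idea, here is a minimal sketch. The function name find_pairs, its parameters, and the io.StringIO stand-in for test.txt are my own assumptions for the example, not part of the original answer:

```python
import io

def find_pairs(lines, problem_marker="BIG Problem Line", gps_marker="GPS INFO"):
    """Pair each problem line with the next GPS line, keeping 1-based line numbers."""
    matches = []
    pending = None  # (line_number, text) of a problem still waiting for its GPS line
    for line_number, line in enumerate(lines, start=1):
        if pending is None and problem_marker in line:
            pending = (line_number, line.strip())
        elif pending is not None and gps_marker in line:
            matches.append((pending, (line_number, line.strip())))
            pending = None
    return matches

# Stand-in for open("test.txt"); any iterable of lines works here.
sample = io.StringIO(
    "random\nrandom\nrandom\nGPS INFO: 238939\nrandom\n"
    "BIG Problem Line\nrandom\nblah GPS INFO: 238490\n"
    "random GPS INFO: 325236342\nBIG Problem Line2\nGPS INFO: 12343\n"
)
pairs = find_pairs(sample)
for problem, gps in pairs:
    print(problem, gps)
```

Because the function only needs an iterable of lines, passing an open file object instead of the StringIO should behave the same way.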

Edit: updated to support longitude/latitude, as requested in the comments:

File:

random
random
random
GPS INFO: 238939
LONGITUDE: 123
LATITUDE: 321
random
BIG Problem Line
random
blah GPS INFO: 238490
LONGITUDE: 456
LATITUDE: 654
random GPS INFO: 325236342
LONGITUDE: 789
LATITUDE: 987
BIG Problem Line2
GPS INFO: 12343
LONGITUDE: 432
LATITUDE: 678

Code:

searchstringsProblem = 'BIG Problem Line'
searchstringsGPSLoc = 'GPS INFO'
matches = []

with open("test.txt") as f:
    problem = False
    problem_line = ""

    for line in f:
        if not problem and searchstringsProblem in line:
            problem_line = line.strip()
            problem = True
        elif problem and searchstringsGPSLoc in line:
            # The two lines right after the GPS line hold the longitude and latitude
            matches.append((problem_line, line.strip(), next(f).strip(), next(f).strip()))
            problem = False

for item in matches:
    print(item)

Output:

('BIG Problem Line', 'blah GPS INFO: 238490', 'LONGITUDE: 456', 'LATITUDE: 654')
('BIG Problem Line2', 'GPS INFO: 12343', 'LONGITUDE: 432', 'LATITUDE: 678')

EDIT2: updated to ignore blank lines when looking for the longitude/latitude:

Code:

searchstringsProblem = 'BIG Problem Line'
searchstringsGPSLoc = 'GPS Info'
matches = []

with open("test.txt") as f:
    problem = False
    problem_line = ""

    for line in f:
        if not problem and searchstringsProblem in line:
            problem_line = line.strip()
            problem = True
        elif problem and searchstringsGPSLoc in line:
            latitude = ""
            longitude = ""
            for new_line in f:
                if "Longitude" in new_line:
                    longitude = new_line.split("=")[1].strip()
                elif "Latitude" in new_line:
                    latitude = new_line.split("=")[1].strip()
                if longitude and latitude:
                    break

            if latitude and longitude:
                matches.append((problem_line, line.strip(), latitude, longitude))
                problem = False

for item in matches:
    print(item)

Output:

 ('BIG Problem Line', 'Key line2 GPS Info', '+51.47700', '-0.00000')
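One thing to watch with the inner loop above: if a GPS block is missing its Longitude or Latitude line, the loop keeps reading until the end of the file. A possible variation, sketched here under my own assumptions, caps how far it looks ahead. The helper names read_gps_fields and find_problem_gps, the max_lookahead limit of 10, and the sample text (the question's GPS block with a problem line placed in front of it) are all made up for illustration:

```python
import io

def read_gps_fields(lines, wanted=("Longitude", "Latitude"), max_lookahead=10):
    """Read at most max_lookahead lines from the shared iterator, collecting 'Key = value' fields."""
    found = {}
    for _ in range(max_lookahead):
        line = next(lines, None)
        if line is None:  # reached the end of the file
            break
        key, sep, value = line.partition("=")
        if sep and key.strip() in wanted:
            found[key.strip()] = value.strip()
            if len(found) == len(wanted):
                break
    return found

def find_problem_gps(lines, problem_marker="BIG Problem Line", gps_marker="GPS Info"):
    lines = iter(lines)  # one iterator shared by the outer loop and the helper
    results = []
    problem_line = None
    for line in lines:
        if problem_line is None and problem_marker in line:
            problem_line = line.strip()
        elif problem_line is not None and gps_marker in line:
            # The helper continues from the current position, so nothing is re-read.
            results.append((problem_line, line.strip(), read_gps_fields(lines)))
            problem_line = None
    return results

# Hypothetical sample: a problem line followed by the GPS block from the question.
sample = io.StringIO(
    "random\n"
    "BIG Problem Line\n"
    "random\n"
    "Key line2 GPS Info\n\n"
    "GPS = Active\n\n"
    "Longitude = -0.00000\n\n"
    "Latitude = +51.47700\n"
)
results = find_problem_gps(sample)
print(results)
```

The trade-off is that any problem line falling inside the lookahead window is consumed by the helper and skipped, so the limit should be kept close to the real size of a GPS block.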

Answer 2

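Edit: added support for getting the longitude and latitude. Edit 2: split correctly.

If you want to keep track of which problem number you are on (i.e. the first time a problem occurs, the second time, and so on), you can add an enumeration or a counter, or add an else to the current if to write out some text when you find the "BIG Problem Line".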

def get_text_file(path):
    with open(path, "r") as f:
        # Index 0 is the problem marker, index 1 is the GPS marker
        searchstrings = ['BIG Problem Line', 'GPS INFO']
        current_string = 0
        for line in f:
            if searchstrings[current_string] in line:
                # That is, if the current index is 1 (you're looking for GPS info)
                if current_string:
                    # next(f) continues from the current position in the file
                    long_line = next(f)
                    lat_line = next(f)
                    long_value = long_line.split('=')[1].strip()
                    lat_value = lat_line.split('=')[1].strip()
                    # some_write_function is a placeholder for your own output code
                    some_write_function(long_value, lat_value)
                current_string ^= 1  # Flips the bit (0^1=1, 1^1=0)
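The counter mentioned above could look something like the following sketch. It assumes, as the code above does, that the longitude and latitude lines come directly after the GPS line in "Key = value" form. The function name collect_gps, the returned tuples, and the sample text are hypothetical and only there to make the example runnable:

```python
import io

def collect_gps(lines, problem_marker="BIG Problem Line", gps_marker="GPS INFO"):
    """Return (problem_number, longitude, latitude) for each problem/GPS pair."""
    lines = iter(lines)
    looking_for_gps = False
    problem_count = 0  # 1 for the first problem found, 2 for the second, ...
    records = []
    for line in lines:
        if not looking_for_gps:
            if problem_marker in line:
                problem_count += 1
                looking_for_gps = True
        elif gps_marker in line:
            # Assumes the next two lines are the longitude and latitude lines.
            long_line = next(lines, "")
            lat_line = next(lines, "")
            records.append((
                problem_count,
                long_line.partition("=")[2].strip(),
                lat_line.partition("=")[2].strip(),
            ))
            looking_for_gps = False
    return records

# Made-up sample data, not taken from the question.
sample = io.StringIO(
    "BIG Problem Line\nnoise\nGPS INFO\nLongitude = 1.5\nLatitude = 2.5\n"
    "noise\nBIG Problem Line\nGPS INFO\nLongitude = 3.5\nLatitude = 4.5\n"
)
records = collect_gps(sample)
print(records)
```

A boolean flag is used here instead of the XOR toggle purely for readability; the two-state logic is the same.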

Answer 1
2 Accepted 2014-07-11 13:35:09

Answer 2
0 2014-07-11 13:40:42
