
Finding strings in a large text file in Python

The following is my code:

with open("WinUpdates.txt") as f:
    data=[]
    for elem in f:
        data.append(elem)

with open("checked.txt", "w") as f:
    check=True
    for item in data:
        if "KB2982791" in item:
            f.write("KB2982791\n")
            check=False
        if "KB2970228" in item:
            f.write("KB2970228\n")
            check=False
        if "KB2918614" in item:
            f.write("KB2918614\n")
            check=False
        if "KB2993651" in item:
            f.write("KB2993651\n")
            check=False
        if "KB2975719" in item:
            f.write("KB2975719\n")
            check=False
        if "KB2975331" in item:
            f.write("KB2975331\n")
            check=False
        if "KB2506212" in item:
            f.write("KB2506212\n")
            check=False
        if "KB3004394" in item:
            f.write("KB3004394\n")
            check=False
        if "KB3114409" in item:
            f.write("KB3114409\n")
            check=False
        if "KB3114570" in item:
            f.write("KB3114570\n")
            check=False

    if check:
        f.write("No faulty Windows Updates found!")

The "WinUpdates.txt" file contains a lot of lines like these:

http://support.microsoft.com/?kbid=2980245 RECHTS Update
KB2980245 NT-AUTORITÄT\\SYSTEM 8/18/2014
http://support.microsoft.com/?kbid=2981580 RECHTS Update
KB2981580 NT-AUTORITÄT\\SYSTEM 8/18/2014
http://support.microsoft.com/?kbid=2982378 RECHTS Security Update KB2982378 NT-AUTORITÄT\\SYSTEM 9/12/2014
http://support.microsoft.com/?kbid=2984972 RECHTS Security Update KB2984972 NT-AUTORITÄT\\SYSTEM 10/17/2014
http://support.microsoft.com/?kbid=2984976 RECHTS Security Update KB2984976 NT-AUTORITÄT\\SYSTEM 10/17/2014
http://support.microsoft.com/?kbid=2984981 RECHTS Security Update KB2984981 NT-AUTORITÄT\\SYSTEM 10/16/2014
http://support.microsoft.com/?kbid=2985461 RECHTS Update
KB2985461 NT-AUTORITÄT\\SYSTEM 9/12/2014
http://support.microsoft.com/?kbid=2987107 RECHTS Security Update KB2987107 NT-AUTORITÄT\\SYSTEM 10/17/2014
http://support.microsoft.com/?kbid=2990214 RECHTS Update
KB2990214 NT-AUTORITÄT\\SYSTEM 4/16/2015
http://support.microsoft.com/?kbid=2991963 RECHTS Security Update KB2991963 NT-AUTORITÄT\\SYSTEM 11/14/2014
http://support.microsoft.com/?kbid=2992611 RECHTS Security Update KB2992611 NT-AUTORITÄT\\SYSTEM 11/14/2014
http://support.microsoft.com/?kbid=2993651 RECHTS Update
KB2993651 NT-AUTORITÄT\\SYSTEM 8/29/2014
http://support.microsoft.com/?kbid=2993958 RECHTS Security Update KB2993958 NT-AUTORITÄT\\SYSTEM 11/14/2014

But when I execute my code, it says that it has not found any of those updates, even though I know it should find 4. I wrote the "data" list into a new text file, and there everything seems to be all right.

Why do you think my code does not work?

FWIW, your code can be written in a more compact way that doesn't require a zillion if statements. Also, since the (new) data file is only 63342 bytes, you can read the whole thing into a single string rather than into a list of strings.

kb_ids = (
    "KB2982791",
    "KB2970228",
    "KB2918614",
    "KB2993651",
    "KB2975719",
    "KB2975331",
    "KB2506212",
    "KB3004394",
    "KB3114409",
    "KB3114570",
)

# Read the whole file into one string (fine for a file of this size).
with open("WinUpdates.txt") as f:
    data = f.read()

check = True  # stays True if none of the KB IDs is found
with open("checked.txt", "w") as f:
    for kb in kb_ids:
        if kb in data:
            f.write(kb + "\n")
            check = False

    if check:
        f.write("No faulty Windows Updates found!\n")

Contents of checked.txt, using the linked data:

KB2970228
KB2918614
KB2993651
KB2506212
KB3004394

Note that this code writes the found KB IDs in the order they're defined in kb_ids, rather than the order they occur in "WinUpdates.txt".
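If you would rather have the IDs in the order they appear in "WinUpdates.txt", one possible approach (a sketch, not part of the original answer) is a single regular-expression pass over the string: build one alternation pattern from kb_ids, collect the matches with re.finditer, and drop repeats with dict.fromkeys, which keeps first-seen order. The helper name find_in_file_order is hypothetical.

```python
import re

kb_ids = (
    "KB2982791", "KB2970228", "KB2918614", "KB2993651", "KB2975719",
    "KB2975331", "KB2506212", "KB3004394", "KB3114409", "KB3114570",
)

# One pattern that matches any of the IDs. re.escape is only defensive:
# these particular IDs contain no regex metacharacters.
pattern = re.compile("|".join(re.escape(kb) for kb in kb_ids))


def find_in_file_order(text):
    """Return the IDs from kb_ids in the order they first occur in text."""
    # dict.fromkeys removes repeats but keeps insertion order (Python 3.7+).
    return list(dict.fromkeys(m.group(0) for m in pattern.finditer(text)))


sample = "KB3004394 ... KB2970228 ... KB3004394 again"
print(find_in_file_order(sample))  # ['KB3004394', 'KB2970228']
```

To use it on the real file, read the file into a string with f.read() as in the answer above and write each returned ID to checked.txt.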

Searching the whole file string once for each KB ID is probably not a good idea if the file is large (e.g., more than a megabyte or so); you might want to run some timing tests (using timeit) to see which strategy works best on your data.
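A timing comparison along those lines might look like the sketch below. The input is synthetic (about 20,000 generated lines imitating the sample above, plus one matching ID at the end), so treat it as an assumption and substitute your own file; the numbers it prints depend entirely on your machine and data. The function names are hypothetical.

```python
import timeit

kb_ids = (
    "KB2982791", "KB2970228", "KB2918614", "KB2993651", "KB2975719",
    "KB2975331", "KB2506212", "KB3004394", "KB3114409", "KB3114570",
)

# Synthetic test data (an assumption): 20,000 lines in the style of
# WinUpdates.txt, none containing an ID from kb_ids, plus one line that does.
lines = [
    f"http://support.microsoft.com/?kbid={n} RECHTS Update KB{n} NT-AUTORITÄT\\SYSTEM 8/18/2014\n"
    for n in range(2000000, 2020000)
]
lines.append("http://support.microsoft.com/?kbid=3004394 RECHTS Security Update KB3004394 NT-AUTORITÄT\\SYSTEM 10/17/2014\n")
text = "".join(lines)


def search_whole_string(text):
    """One substring scan of the whole text per KB ID; results in kb_ids order."""
    return [kb for kb in kb_ids if kb in text]


def search_line_by_line(lines):
    """Test every KB ID against every line; results in file order, no repeats."""
    found = []
    for line in lines:
        for kb in kb_ids:
            if kb in line and kb not in found:
                found.append(kb)
    return found


if __name__ == "__main__":
    for name, func, arg in (
        ("whole string", search_whole_string, text),
        ("line by line", search_line_by_line, lines),
    ):
        # timeit accepts a zero-argument callable as the statement to time.
        seconds = timeit.timeit(lambda: func(arg), number=10)
        print(f"{name}: {seconds:.3f} s for 10 runs")
```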

If you want to read a file into a list, there's no need to use a for loop; you can just do this:

with open("WinUpdates.txt") as f:
    data = f.readlines()
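For comparison, here is a small sketch of the usual ways to get a list of lines. It uses io.StringIO as a stand-in for the open file (an assumption, so the snippet runs without "WinUpdates.txt"). readlines() and list(f) keep each line's trailing "\n", while read().splitlines() strips it. The newline does not affect a substring test such as "KB2993651" in line, but it does matter if you compare whole lines with ==.

```python
import io

# io.StringIO behaves like an open text file for these calls.
sample = "KB2980245 line one\nKB2981580 line two\n"

with io.StringIO(sample) as f:
    with_newlines = f.readlines()  # each element keeps its trailing "\n"

with io.StringIO(sample) as f:
    same_as_readlines = list(f)  # iterating a file yields the same lines

with io.StringIO(sample) as f:
    without_newlines = f.read().splitlines()  # line endings removed

print(with_newlines)     # ['KB2980245 line one\n', 'KB2981580 line two\n']
print(without_newlines)  # ['KB2980245 line one', 'KB2981580 line two']
```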

Alternatively, you can process the file line by line without reading it into a list:

kb_ids = (
    "KB2982791",
    "KB2970228",
    "KB2918614",
    "KB2993651",
    "KB2975719",
    "KB2975331",
    "KB2506212",
    "KB3004394",
    "KB3114409",
    "KB3114570",
)

check = True
with open("WinUpdates.txt") as fin:
    with open("checked.txt", "w") as fout:
        for data in fin:
            for kb in kb_ids:
                if kb in data:
                    fout.write(kb + "\n")
                    check = False

        if check:
            fout.write("No faulty Windows Updates found!\n")
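One thing to be aware of in this line-by-line version: a KB ID that appears on several lines is written to checked.txt once per line. If you want each ID reported only once, a sketch like the following (find_kb_ids and write_report are hypothetical helper names) remembers which IDs are still outstanding, reports each one at its first occurrence, and stops reading as soon as all of them have been found. io.StringIO stands in for both files so the example runs on its own.

```python
import io

kb_ids = (
    "KB2982791", "KB2970228", "KB2918614", "KB2993651", "KB2975719",
    "KB2975331", "KB2506212", "KB3004394", "KB3114409", "KB3114570",
)


def find_kb_ids(lines, kb_ids):
    """Return each ID from kb_ids that occurs in lines, once, in file order."""
    remaining = set(kb_ids)  # IDs not seen yet
    found = []
    for line in lines:
        for kb in kb_ids:  # iterate the tuple so ties keep kb_ids order
            if kb in remaining and kb in line:
                found.append(kb)
                remaining.discard(kb)
        if not remaining:  # everything found: no need to read further
            break
    return found


def write_report(lines, out, kb_ids):
    """Write the found IDs (or the 'nothing found' message) to out."""
    found = find_kb_ids(lines, kb_ids)
    for kb in found:
        out.write(kb + "\n")
    if not found:
        out.write("No faulty Windows Updates found!\n")
    return found


# io.StringIO objects stand in for WinUpdates.txt and checked.txt here.
fin = io.StringIO(
    "KB3004394 NT-AUTORITÄT\\SYSTEM 10/17/2014\n"
    "KB2970228 NT-AUTORITÄT\\SYSTEM 8/18/2014\n"
    "KB3004394 listed a second time\n"
)
fout = io.StringIO()
write_report(fin, fout, kb_ids)
print(fout.getvalue(), end="")  # KB3004394, then KB2970228
```

With the real files, pass the open file objects instead, e.g. write_report(fin, fout, kb_ids) inside the with block that opens "WinUpdates.txt" and "checked.txt".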

Since Python 2.7 and 3.1, the two nested with statements can be combined into a single with statement that opens both files.
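For example, the line-by-line loop with both files opened in one with statement might look like this. To keep the sketch runnable it first writes a two-line sample into a temporary directory (the paths and the shortened kb_ids are assumptions for the demo), and it passes encoding="utf-8" explicitly only so the non-ASCII "Ä" in the sample round-trips the same way on every platform.

```python
import os
import tempfile

# Hypothetical setup so the sketch runs anywhere: a two-line sample input in a
# temporary directory instead of the real WinUpdates.txt.
tmpdir = tempfile.mkdtemp()
in_path = os.path.join(tmpdir, "WinUpdates.txt")
out_path = os.path.join(tmpdir, "checked.txt")
with open(in_path, "w", encoding="utf-8") as f:
    f.write("http://support.microsoft.com/?kbid=2993651 RECHTS Update\n")
    f.write("KB2993651 NT-AUTORITÄT\\SYSTEM 8/29/2014\n")

kb_ids = ("KB2993651", "KB3004394")  # shortened list, just for the demo

check = True
# Both files opened by a single with statement instead of two nested ones.
with open(in_path, encoding="utf-8") as fin, open(out_path, "w") as fout:
    for data in fin:
        for kb in kb_ids:
            if kb in data:
                fout.write(kb + "\n")
                check = False

    if check:
        fout.write("No faulty Windows Updates found!\n")

with open(out_path) as f:
    print(f.read(), end="")  # KB2993651
```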

I added and fixed what you were missing; check the two comments to see what I mean. This worked for me, so it should work for you. Have a great day!

with open("WinUpdates.txt", "r") as f:  # "r" (read, text mode) is the default, so spelling it out is optional
    data = f.read()  # no need to put the data into a list; a single string will do fine

with open("checked.txt", "w") as f:
    check=True
    if "KB2982791" in data:
        f.write("KB2982791\n")
        check=False
    if "KB2970228" in data:
        f.write("KB2970228\n")
        check=False
    if "KB2918614" in data:
        f.write("KB2918614\n")
        check=False
    if "KB2993651" in data:
        f.write("KB2993651\n")
        check=False
    if "KB2975719" in data:
        f.write("KB2975719\n")
        check=False
    if "KB2975331" in data:
        f.write("KB2975331\n")
        check=False
    if "KB2506212" in data:
        f.write("KB2506212\n")
        check=False
    if "KB3004394" in data:
        f.write("KB3004394\n")
        check=False
    if "KB3114409" in data:
        f.write("KB3114409\n")
        check=False
    if "KB3114570" in data:
        f.write("KB3114570\n")
        check=False

    if check:
        f.write("No faulty Windows Updates found!")

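Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM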