Finding strings in a large text file in Python
The following is my code:
with open("WinUpdates.txt") as f:
    data=[]
    for elem in f:
        data.append(elem)
with open("checked.txt", "w") as f:
    check=True
    for item in data:
        if "KB2982791" in item:
            f.write("KB2982791\n")
            check=False
        if "KB2970228" in item:
            f.write("KB2970228\n")
            check=False
        if "KB2918614" in item:
            f.write("KB2918614\n")
            check=False
        if "KB2993651" in item:
            f.write("KB2993651\n")
            check=False
        if "KB2975719" in item:
            f.write("KB2975719\n")
            check=False
        if "KB2975331" in item:
            f.write("KB2975331\n")
            check=False
        if "KB2506212" in item:
            f.write("KB2506212\n")
            check=False
        if "KB3004394" in item:
            f.write("KB3004394\n")
            check=False
        if "KB3114409" in item:
            f.write("KB3114409\n")
            check=False
        if "KB3114570" in item:
            f.write("KB3114570\n")
            check=False
    if check:
        f.write("No faulty Windows Updates found!")
The "WinUpdates.txt" file contains a lot of lines like these:
http://support.microsoft.com/?kbid=2980245 RECHTS Update KB2980245 NT-AUTORITÄT\\SYSTEM 8/18/2014
http://support.microsoft.com/?kbid=2981580 RECHTS Update KB2981580 NT-AUTORITÄT\\SYSTEM 8/18/2014
http://support.microsoft.com/?kbid=2982378 RECHTS Security Update KB2982378 NT-AUTORITÄT\\SYSTEM 9/12/2014
http://support.microsoft.com/?kbid=2984972 RECHTS Security Update KB2984972 NT-AUTORITÄT\\SYSTEM 10/17/2014
http://support.microsoft.com/?kbid=2984976 RECHTS Security Update KB2984976 NT-AUTORITÄT\\SYSTEM 10/17/2014
http://support.microsoft.com/?kbid=2984981 RECHTS Security Update KB2984981 NT-AUTORITÄT\\SYSTEM 10/16/2014
http://support.microsoft.com/?kbid=2985461 RECHTS Update KB2985461 NT-AUTORITÄT\\SYSTEM 9/12/2014
http://support.microsoft.com/?kbid=2987107 RECHTS Security Update KB2987107 NT-AUTORITÄT\\SYSTEM 10/17/2014
http://support.microsoft.com/?kbid=2990214 RECHTS Update KB2990214 NT-AUTORITÄT\\SYSTEM 4/16/2015
http://support.microsoft.com/?kbid=2991963 RECHTS Security Update KB2991963 NT-AUTORITÄT\\SYSTEM 11/14/2014
http://support.microsoft.com/?kbid=2992611 RECHTS Security Update KB2992611 NT-AUTORITÄT\\SYSTEM 11/14/2014
http://support.microsoft.com/?kbid=2993651 RECHTS Update KB2993651 NT-AUTORITÄT\\SYSTEM 8/29/2014
http://support.microsoft.com/?kbid=2993958 RECHTS Security Update KB2993958 NT-AUTORITÄT\\SYSTEM 11/14/2014
But when I execute my code, it says that it has not found any of those updates, even though I know that it should find 4. I wrote the "data" list into a new text file, and everything looks all right there.
Why do you think my code does not work?
FWIW, your code can be written in a more compact way that doesn't require a zillion if statements. Also, since the (new) data file is only 63342 bytes, you can read the whole thing into a single string, rather than into a list of strings.
kb_ids = (
    "KB2982791",
    "KB2970228",
    "KB2918614",
    "KB2993651",
    "KB2975719",
    "KB2975331",
    "KB2506212",
    "KB3004394",
    "KB3114409",
    "KB3114570",
)
with open("WinUpdates.txt") as f:
    data = f.read()
check = True
with open("checked.txt", "w") as f:
    for kb in kb_ids:
        if kb in data:
            f.write(kb + "\n")
            check = False
    # Still inside the with block, so the output file is open here.
    if check:
        f.write("No faulty Windows Updates found!\n")
Contents of checked.txt, using the linked data:
KB2970228
KB2918614
KB2993651
KB2506212
KB3004394
Note that this code writes the KB IDs it finds in the order in which they are defined in kb_ids, rather than the order in which they occur in "WinUpdates.txt".
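If the order of appearance in the file matters, one possible approach is to scan the text once with a regular expression and keep only the matches that belong to kb_ids. This is a minimal sketch rather than part of the original answer; the helper name find_in_file_order and the shortened sample text are made up for illustration.

```python
import re

kb_ids = {
    "KB2982791", "KB2970228", "KB2918614", "KB2993651", "KB2975719",
    "KB2975331", "KB2506212", "KB3004394", "KB3114409", "KB3114570",
}


def find_in_file_order(text, wanted):
    """Return the wanted KB IDs in the order they first appear in text."""
    found = []
    seen = set()
    # "KB" followed by digits; the lowercase "kbid=" in the URLs does not match.
    for match in re.finditer(r"KB\d+", text):
        kb = match.group()
        if kb in wanted and kb not in seen:
            seen.add(kb)
            found.append(kb)
    return found


# Hypothetical, shortened stand-in for the contents of WinUpdates.txt.
sample = (
    "http://support.microsoft.com/?kbid=2993651 RECHTS Update KB2993651\n"
    "http://support.microsoft.com/?kbid=2980245 RECHTS Update KB2980245\n"
    "http://support.microsoft.com/?kbid=2970228 RECHTS Update KB2970228\n"
)

print(find_in_file_order(sample, kb_ids))
```

Unlike the substring test, the regular expression compares whole IDs, so a longer ID that merely starts with one of the listed IDs would not be reported.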
Searching the whole file as a single string once per KB ID is probably not a good idea if the file is large, e.g., more than a megabyte or so; you might want to run some timing tests (using timeit) to see which strategy works best on your data.
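A timing test along those lines could look roughly like the sketch below. The generated lines are synthetic stand-ins for "WinUpdates.txt", and the function names search_whole_string and search_line_by_line are invented for this example, so the numbers it prints say nothing about the real file.

```python
import timeit

kb_ids = (
    "KB2982791", "KB2970228", "KB2918614", "KB2993651", "KB2975719",
    "KB2975331", "KB2506212", "KB3004394", "KB3114409", "KB3114570",
)

# Synthetic data (an assumption): 3000 lines that match nothing,
# plus one line that contains a listed KB ID.
lines = [
    "http://support.microsoft.com/?kbid=%d RECHTS Update KB%d\n" % (n, n)
    for n in range(2900000, 2903000)
]
lines.append("http://support.microsoft.com/?kbid=2993651 RECHTS Update KB2993651\n")
text = "".join(lines)


def search_whole_string():
    """Search the whole text once per KB ID."""
    return [kb for kb in kb_ids if kb in text]


def search_line_by_line():
    """Check every KB ID against every line."""
    found = []
    for line in lines:
        for kb in kb_ids:
            if kb in line:
                found.append(kb)
    return found


for func in (search_whole_string, search_line_by_line):
    seconds = timeit.timeit(func, number=20)
    print("%s: %.4f s for 20 runs" % (func.__name__, seconds))
```

To get meaningful results, replace the synthetic lines with the contents of the actual file and repeat the measurement a few times.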
If you want to read a file into a list there's no need to use a for loop; you can just do this:
with open("WinUpdates.txt") as f:
    data = f.readlines()
Alternatively, you can process the file line by line without reading it into a list:
kb_ids = (
    "KB2982791",
    "KB2970228",
    "KB2918614",
    "KB2993651",
    "KB2975719",
    "KB2975331",
    "KB2506212",
    "KB3004394",
    "KB3114409",
    "KB3114570",
)
check = True
with open("WinUpdates.txt") as fin:
    with open("checked.txt", "w") as fout:
        for data in fin:
            for kb in kb_ids:
                if kb in data:
                    fout.write(kb + "\n")
                    check = False
        if check:
            fout.write("No faulty Windows Updates found!\n")
On more modern versions of Python (2.7, and 3.1 or later) the two with statements can be combined into a single line.
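As an illustration, the combined form might look like the sketch below. To keep it runnable anywhere, it writes a small made-up input file into a temporary directory first; the shortened kb_ids tuple, the paths, and the sample line are assumptions for the example only.

```python
import os
import tempfile

kb_ids = ("KB2993651", "KB2970228")  # shortened list for the example

# Create a tiny input file so the sketch does not depend on a real WinUpdates.txt.
workdir = tempfile.mkdtemp()
in_path = os.path.join(workdir, "WinUpdates.txt")
out_path = os.path.join(workdir, "checked.txt")
with open(in_path, "w") as f:
    f.write("http://support.microsoft.com/?kbid=2993651 RECHTS Update KB2993651\n")

check = True
# One with statement opens both files; both are closed when the block ends.
with open(in_path) as fin, open(out_path, "w") as fout:
    for line in fin:
        for kb in kb_ids:
            if kb in line:
                fout.write(kb + "\n")
                check = False
    if check:
        fout.write("No faulty Windows Updates found!\n")

with open(out_path) as f:
    result = f.read()
print(result)
```

The behaviour is the same as with the nested form; the only gain is one less level of indentation.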
I added and fixed what you were missing; check the two comments to see what I mean. This worked for me, so it should work for you. Have a great day!
with open("WinUpdates.txt", "r") as f:  # "r" (read) is the default mode, so passing it explicitly is optional
    data = f.read()  # no need to put the data into a list; a single string will do fine
with open("checked.txt", "w") as f:
    check=True
    if "KB2982791" in data:
        f.write("KB2982791\n")
        check=False
    if "KB2970228" in data:
        f.write("KB2970228\n")
        check=False
    if "KB2918614" in data:
        f.write("KB2918614\n")
        check=False
    if "KB2993651" in data:
        f.write("KB2993651\n")
        check=False
    if "KB2975719" in data:
        f.write("KB2975719\n")
        check=False
    if "KB2975331" in data:
        f.write("KB2975331\n")
        check=False
    if "KB2506212" in data:
        f.write("KB2506212\n")
        check=False
    if "KB3004394" in data:
        f.write("KB3004394\n")
        check=False
    if "KB3114409" in data:
        f.write("KB3114409\n")
        check=False
    if "KB3114570" in data:
        f.write("KB3114570\n")
        check=False
    if check:
        f.write("No faulty Windows Updates found!")