Remove certain links from a textfile by reading textfile
I have whitelist.txt, which contains some links, and scrapedlist.txt, which contains other links as well as the links that are in whitelist.txt.
I'm trying to open and read whitelist.txt, then open and read scrapedlist.txt, and write a new file, updatedlist2.txt, that contains everything in scrapedlist.txt minus the contents of whitelist.txt.
I'm pretty new to Python, so still learning. I've searched for answers, and this is what I came up with:
def whitelist_file_func():
    with open("whitelist.txt", "r") as whitelist_read:
        whitelist_read.readlines()
    whitelist_read.close()
    unique2 = set()
    with open("scrapedlist.txt", "r") as scrapedlist_read:
        scrapedlist_lines = scrapedlist_read.readlines()
    scrapedlist_read.close()
    unique3 = set()
    with open("updatedlist2.txt", "w") as whitelist_write2:
        for line in scrapedlist_lines:
            if unique2 not in line and line not in unique3:
                whitelist_write2.write(line)
                unique3.add(line)
I get this error, and I'm also not sure whether I'm doing it the right way:
if unique2 not in line and line not in unique3:
TypeError: 'in <string>' requires string as left operand, not set
What should I do to achieve the above, and is my code right?
EDIT:
whitelist.txt:
KUWAIT
ISRAEL
FRANCE
scrapedlist.txt:
USA
CANADA
GERMANY
KUWAIT
ISRAEL
FRANCE
updatedlist2.txt (this is how it should be):
USA
CANADA
GERMANY
Based on your description, I made some changes to your code:
- readlines() is replaced with read().splitlines(). Both read the whole file and turn each line into a list item, but readlines() keeps the \n at the end of each item, while splitlines() drops it.
- unique2 and unique3 are removed. I couldn't find a use for them.
- whitelist_lines and scrapedlist_lines are two lists that contain the links. Based on your description, we need the lines of scrapedlist_lines that are not in whitelist_lines, so the condition if unique2 not in line and line not in unique3: is changed to if line not in whitelist_lines:.
- whitelist_write2.close() is called after writing to the file. (The with statement already closes the file when its block ends, so the explicit close() calls are harmless but not strictly required.)
The final code is:
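# Read both files into lists of lines, without the trailing newlines.
with open("whitelist.txt", "r") as whitelist_read:
    whitelist_lines = whitelist_read.read().splitlines()
whitelist_read.close()  # no-op: the with block has already closed the file
with open("scrapedlist.txt", "r") as scrapedlist_read:
    scrapedlist_lines = scrapedlist_read.read().splitlines()
scrapedlist_read.close()
# Write every scraped line that does not appear in the whitelist.
with open("updatedlist2.txt", "w") as whitelist_write2:
    for line in scrapedlist_lines:
        if line not in whitelist_lines:
            # splitlines() removed the newline, so add it back.
            whitelist_write2.write(line + "\n")
whitelist_write2.close()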
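As for the error itself: unique2 not in line asks whether a set is a substring of a string, and Python only accepts a string on the left of in when the right-hand side is a string, hence the TypeError. If the two sets in the question were meant to hold the whitelist and to skip repeated lines, a variation along the following lines might be closer to that intent. This is only a sketch under assumptions: the function name filter_scraped, its parameters, and the decision to strip whitespace and drop duplicate or blank lines are mine, not part of the question or the answer above.

```python
from pathlib import Path


def filter_scraped(whitelist_path, scraped_path, output_path):
    """Write the lines of scraped_path that are not in whitelist_path.

    Hypothetical helper: the name and the de-duplication are assumptions.
    """
    # A set gives fast membership tests; strip() ignores stray whitespace.
    whitelist = {
        line.strip()
        for line in Path(whitelist_path).read_text().splitlines()
        if line.strip()
    }
    seen = set()  # entries already written, so repeats are skipped
    kept = []     # preserves the order of first appearance
    for line in Path(scraped_path).read_text().splitlines():
        entry = line.strip()
        if entry and entry not in whitelist and entry not in seen:
            seen.add(entry)
            kept.append(entry)
    # Write one entry per line, each terminated by a newline.
    Path(output_path).write_text("".join(entry + "\n" for entry in kept))
    return kept
```

Calling filter_scraped("whitelist.txt", "scrapedlist.txt", "updatedlist2.txt") on the sample files from the question should leave USA, CANADA and GERMANY in updatedlist2.txt. Using a set for the whitelist mainly pays off when the files are large; for a handful of lines the list version above behaves the same.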