Remove certain links from a text file by reading another text file
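So I have whitelist.txt, which contains some links, and scrapedlist.txt, which contains other links plus the links that are in whitelist.txt.
I'm trying to open and read whitelist.txt, then open and read scrapedlist.txt, and write a new file, updatedlist2.txt, that contains everything in scrapedlist.txt minus whatever is in whitelist.txt.
I'm still pretty new to Python, so I'm still learning. I searched for answers, and this is what I came up with: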
def whitelist_file_func():
    with open("whitelist.txt", "r") as whitelist_read:
        whitelist_read.readlines()
        whitelist_read.close()

    unique2 = set()

    with open("scrapedlist.txt", "r") as scrapedlist_read:
        scrapedlist_lines = scrapedlist_read.readlines()
        scrapedlist_read.close()

    unique3 = set()

    with open("updatedlist2.txt", "w") as whitelist_write2:
        for line in scrapedlist_lines:
            if unique2 not in line and line not in unique3:
                whitelist_write2.write(line)
                unique3.add(line)
I get this error, and I'm also not sure whether I'm going about this the right way:
if unique2 not in line and line not in unique3:
TypeError: 'in <string>' requires string as left operand, not set
What should I do to achieve the above, and is my code correct?
Edit:
whitelist.txt:
KUWAIT
ISRAEL
FRANCE
scrapedlist.txt:
USA
CANADA
GERMANY
KUWAIT
ISRAEL
FRANCE
updatedlist2.txt (what it should look like):
USA
CANADA
GERMANY
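The TypeError in the question is about operand order. When the right-hand side of in is a string, Python performs a substring test and requires the left-hand side to be a string as well; unique2 is a set, so unique2 not in line fails before any comparison happens. A minimal reproduction, using a made-up sample line and a hypothetical whitelist set:

```python
# Hypothetical sample values mirroring the question's variables.
unique2 = set()
line = "KUWAIT\n"

try:
    # With a str on the right, `in` is a substring test, which only
    # accepts a str on the left. A set on the left raises TypeError.
    unique2 not in line
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
# 'in <string>' requires string as left operand, not set

# The intended check puts the line on the left and the collection on the right.
whitelist = {"KUWAIT", "ISRAEL", "FRANCE"}  # hypothetical whitelist
is_new = line.strip() not in whitelist
print(is_new)  # False, because KUWAIT is whitelisted
```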
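Based on your description, I made a few changes to your code:
- The readlines() method was replaced with read().splitlines(). Both read the whole file and turn each line into a list item; the difference is that readlines() keeps the trailing \n on each line that ends with one.
- unique2 and unique3 were removed, since I couldn't see what they were used for.
- whitelist_lines and scrapedlist_lines are the two lists of links. From your description, we want the lines of scrapedlist_lines that are not in whitelist_lines, so the condition if unique2 not in line and line not in unique3: was changed to if line not in whitelist_lines:.
- whitelist_write2.close() was added.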
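The first change matters for the comparison, not just for tidiness. This sketch uses io.StringIO in place of the real files and assumes, hypothetically, that whitelist.txt has no newline after its last line (FRANCE), which is common for hand-edited files. With readlines() that entry is "FRANCE" while the scraped one is "FRANCE\n", so they never match:

```python
import io

# Hypothetical file contents: the whitelist has no newline after its last
# line (FRANCE), while the scraped list ends with one.
whitelist_text = "KUWAIT\nISRAEL\nFRANCE"
scraped_text = "USA\nCANADA\nGERMANY\nKUWAIT\nISRAEL\nFRANCE\n"

# readlines() keeps the "\n" on every line that ends with one.
print(io.StringIO(whitelist_text).readlines())
# ['KUWAIT\n', 'ISRAEL\n', 'FRANCE']

# read().splitlines() drops the line endings.
print(io.StringIO(whitelist_text).read().splitlines())
# ['KUWAIT', 'ISRAEL', 'FRANCE']

# Filtering with readlines(): "FRANCE\n" != "FRANCE", so FRANCE slips through.
white_raw = io.StringIO(whitelist_text).readlines()
kept_raw = [line for line in io.StringIO(scraped_text).readlines()
            if line not in white_raw]
print(kept_raw)  # ['USA\n', 'CANADA\n', 'GERMANY\n', 'FRANCE\n']

# Filtering with read().splitlines(): only the non-whitelisted entries remain.
white = io.StringIO(whitelist_text).read().splitlines()
kept = [line for line in io.StringIO(scraped_text).read().splitlines()
        if line not in white]
print(kept)  # ['USA', 'CANADA', 'GERMANY']
```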
The final code is:
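with open("whitelist.txt", "r") as whitelist_read:
    # read().splitlines() gives one list item per line, without the trailing "\n"
    whitelist_lines = whitelist_read.read().splitlines()
    whitelist_read.close()  # redundant: the with block closes the file anyway

with open("scrapedlist.txt", "r") as scrapedlist_read:
    scrapedlist_lines = scrapedlist_read.read().splitlines()
    scrapedlist_read.close()

with open("updatedlist2.txt", "w") as whitelist_write2:
    for line in scrapedlist_lines:
        # keep only the links that are not in the whitelist
        if line not in whitelist_lines:
            whitelist_write2.write(line + "\n")  # restore the newline that splitlines() removed
    whitelist_write2.close()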
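If the lists get long, a set is a better fit for the whitelist than a list, because a membership test on a set takes constant time on average instead of scanning every element. Below is a minimal sketch along those lines, wrapped in a function like the one in the question. The function name write_filtered_list and its parameters are hypothetical, and it assumes that surrounding whitespace and blank lines should be ignored and that a link repeated in the scraped list should be written only once (which seems to be what unique3 was meant for).

```python
def write_filtered_list(whitelist_path="whitelist.txt",
                        scraped_path="scrapedlist.txt",
                        output_path="updatedlist2.txt"):
    """Write the lines of scraped_path that are not in whitelist_path.

    Sketch under stated assumptions: surrounding whitespace is ignored,
    blank lines are skipped, and repeated links are written only once.
    Returns the links that were written, in their original order.
    """
    with open(whitelist_path, "r") as whitelist_read:
        # A set makes each membership test O(1) on average, unlike a list.
        whitelist = {line.strip() for line in whitelist_read if line.strip()}

    written = []
    seen = set()  # links already written (what unique3 seems to have been for)
    with open(scraped_path, "r") as scraped_read:
        with open(output_path, "w") as output_write:
            for line in scraped_read:
                link = line.strip()
                if link and link not in whitelist and link not in seen:
                    output_write.write(link + "\n")
                    seen.add(link)
                    written.append(link)
    # No close() calls needed: each with block closes its file on exit.
    return written
```

Calling write_filtered_list() with no arguments uses the same three file names as above; with the sample files from the question it writes USA, CANADA and GERMANY to updatedlist2.txt.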