Remove certain links from a textfile by reading textfile
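I have whitelist.txt, which contains some links, and scrapedlist.txt, which contains other links as well as the links in whitelist.txt.
I am trying to open and read whitelist.txt, then open and read scrapedlist.txt, and write a new file, updatedlist2.txt, that contains everything in scrapedlist.txt minus the contents of whitelist.txt.
I am still quite new to Python, so I am still learning. I searched for answers, and this is what I came up with: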
def whitelist_file_func():
    with open("whitelist.txt", "r") as whitelist_read:
        whitelist_read.readlines()
    whitelist_read.close()
    unique2 = set()
    with open("scrapedlist.txt", "r") as scrapedlist_read:
        scrapedlist_lines = scrapedlist_read.readlines()
    scrapedlist_read.close()
    unique3 = set()
    with open("updatedlist2.txt", "w") as whitelist_write2:
        for line in scrapedlist_lines:
            if unique2 not in line and line not in unique3:
                whitelist_write2.write(line)
                unique3.add(line)
I get this error, and I am also not sure whether I am going about this the right way:
if unique2 not in line and line not in unique3:
TypeError: 'in <string>' requires string as left operand, not set
What should I do to achieve the above, and is my code correct?
Edit:
whitelist.txt:
KUWAIT
ISRAEL
FRANCE
scrapedlist.txt:
USA
CANADA
GERMANY
KUWAIT
ISRAEL
FRANCE
updatedlist2.txt (should look like this):
USA
CANADA
GERMANY
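Based on your description, I made a few changes to your code.
- readlines() is replaced with read().splitlines(). Both read the whole file and turn each line into a list item. The difference is that readlines() keeps the trailing \n on each item.
- unique2 and unique3 are removed. I could not find any use for them.
- whitelist_lines and scrapedlist_lines are two lists containing the links. Based on your description, we need the lines of scrapedlist_lines that are not in the whitelist_lines list, so the condition if unique2 not in line and line not in unique3: is changed to if line not in whitelist_lines:. The original condition raised the TypeError because unique2 not in line tests a set against a string, and in on a string only accepts a string as its left operand.
- whitelist_write2.close() appears after the last with block in the code below. The with statement already closes the file on exit, so the call is redundant but harmless.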
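The first and third points can be checked in isolation. The snippet below is a small sketch, not part of the original answer: it uses io.StringIO as a stand-in for a file (the three sample lines are an assumption chosen for illustration) to show the newline difference, and then reproduces the TypeError from the question.

```python
import io

# In-memory stand-in for a text file; the contents are illustrative only.
sample = io.StringIO("USA\nCANADA\nKUWAIT\n")

# readlines() keeps the trailing newline on every item.
with_newlines = sample.readlines()

# Rewind and read again; splitlines() drops the line endings.
sample.seek(0)
without_newlines = sample.read().splitlines()

print(with_newlines)     # ['USA\n', 'CANADA\n', 'KUWAIT\n']
print(without_newlines)  # ['USA', 'CANADA', 'KUWAIT']

# Putting a set on the left of `in` with a string on the right
# is what triggered the error in the question.
error_message = None
try:
    set() in "KUWAIT\n"
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Because readlines() keeps the \n, a line such as "FRANCE" at the very end of a file without a final newline would not compare equal to "FRANCE\n" read from another file, which is one reason to prefer splitlines() for this kind of comparison.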
The final code is:
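# Read the whitelist entries, one per line, without trailing newlines.
with open("whitelist.txt", "r") as whitelist_read:
    whitelist_lines = whitelist_read.read().splitlines()
whitelist_read.close()  # redundant: the with block has already closed the file
# Read the scraped entries the same way.
with open("scrapedlist.txt", "r") as scrapedlist_read:
    scrapedlist_lines = scrapedlist_read.read().splitlines()
scrapedlist_read.close()
# Write every scraped line that does not appear in the whitelist.
with open("updatedlist2.txt", "w") as whitelist_write2:
    for line in scrapedlist_lines:
        if line not in whitelist_lines:
            whitelist_write2.write(line + "\n")
whitelist_write2.close()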
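As a possible variation, the same filtering can be wrapped in a function that keeps the whitelist in a set, so each membership check does not scan a list. This is only a sketch under a few assumptions that go beyond the answer above: the function name filter_scraped and its parameters are hypothetical, surrounding whitespace is stripped, blank lines are skipped, and repeated scraped lines are written once (which seems to be what unique3 was meant to do in the question). The demo writes the sample data from the question into a temporary directory so it does not touch real files.

```python
import tempfile
from pathlib import Path


def filter_scraped(whitelist_path, scraped_path, output_path):
    """Write scraped lines that are not whitelisted, skipping duplicates.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    # Build a set of whitelist entries for fast membership checks.
    whitelist = set()
    for line in Path(whitelist_path).read_text().splitlines():
        entry = line.strip()
        if entry:
            whitelist.add(entry)

    seen = set()  # entries already written, to avoid duplicates
    kept = []     # preserves the original order of scrapedlist.txt
    for line in Path(scraped_path).read_text().splitlines():
        entry = line.strip()
        if not entry or entry in whitelist or entry in seen:
            continue
        seen.add(entry)
        kept.append(entry)

    Path(output_path).write_text("".join(entry + "\n" for entry in kept))
    return kept


# Demo with the sample data from the question, in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    (tmp_dir / "whitelist.txt").write_text("KUWAIT\nISRAEL\nFRANCE\n")
    (tmp_dir / "scrapedlist.txt").write_text(
        "USA\nCANADA\nGERMANY\nKUWAIT\nISRAEL\nFRANCE\n"
    )
    result = filter_scraped(
        tmp_dir / "whitelist.txt",
        tmp_dir / "scrapedlist.txt",
        tmp_dir / "updatedlist2.txt",
    )
    written = (tmp_dir / "updatedlist2.txt").read_text()

print(result)  # ['USA', 'CANADA', 'GERMANY']
```

For a handful of links the list-based version is just as good. The set mainly matters when both files grow large, since line not in whitelist_lines on a list has to compare against every whitelist entry in turn.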