
Remove certain links from a text file by reading another text file

I have whitelist.txt, which contains some links, and scrapedlist.txt, which contains other links as well as the links that are in whitelist.txt.

I'm trying to open and read whitelist.txt, then open and read scrapedlist.txt, and write a new file, updatedlist2.txt, that has all the contents of scrapedlist.txt minus the contents of whitelist.txt.

I'm pretty new to Python, so I'm still learning. I've searched for answers, and this is what I came up with:

def whitelist_file_func():
    with open("whitelist.txt", "r") as whitelist_read:
        whitelist_read.readlines()
    whitelist_read.close()

    unique2 = set()

    with open("scrapedlist.txt", "r") as scrapedlist_read:
        scrapedlist_lines = scrapedlist_read.readlines()
    scrapedlist_read.close()

    unique3 = set()

    with open("updatedlist2.txt", "w") as whitelist_write2:
   
        for line in scrapedlist_lines:
            if unique2 not in line and line not in unique3:
                whitelist_write2.write(line)
                unique3.add(line)

I get this error, and I'm also not sure whether I'm doing it the right way:

if unique2 not in line and line not in unique3:
TypeError: 'in <string>' requires string as left operand, not set

What should I do to achieve the above, and is my code right?

EDIT:

whitelist.txt:

KUWAIT
ISRAEL
FRANCE

scrapedlist.txt:

USA
CANADA
GERMANY
KUWAIT
ISRAEL
FRANCE

updatedlist2.txt (this is how it should be):

USA
CANADA
GERMANY

Based on your description, I applied some changes to your code.

  1. readlines() is replaced with read().splitlines(). Both read the whole file and turn each line into a list item. The difference is that readlines() keeps the \n at the end of each item.
  2. unique2 and unique3 are removed. I couldn't find a use for them.
  3. After the first two parts, whitelist_lines and scrapedlist_lines are two lists that contain the links. Based on your description, we need the lines of scrapedlist_lines that are not in whitelist_lines, so the condition if unique2 not in line and line not in unique3: is changed to if line not in whitelist_lines:.
  4. whitelist_write2.close() is called after writing to the file. Inside a with block this is optional, because the file is closed automatically when the block exits.
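
To make points 1 and 3 concrete, here is a small sketch. It uses io.StringIO as a stand-in for the real file (an assumption, so the snippet runs without any files on disk), and the sample text mirrors whitelist.txt from the question, written without a newline after the last line.

```python
import io

# Stand-in for whitelist.txt; the last line has no trailing newline.
sample = "KUWAIT\nISRAEL\nFRANCE"

with_newlines = io.StringIO(sample).readlines()
without_newlines = io.StringIO(sample).read().splitlines()

print(with_newlines)     # ['KUWAIT\n', 'ISRAEL\n', 'FRANCE']
print(without_newlines)  # ['KUWAIT', 'ISRAEL', 'FRANCE']

# The original condition put a set on the left of `in` and a str on the
# right, which is what raised the TypeError.
try:
    set() in "KUWAIT\n"
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Because a line returned by readlines() may or may not end in \n depending on where it sits in the file, comparing lines with the newline already stripped is the more predictable choice.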

The final code is:最终代码是:

# Read the whitelist as a list of lines without trailing newlines.
with open("whitelist.txt", "r") as whitelist_read:
    whitelist_lines = whitelist_read.read().splitlines()
    whitelist_read.close()

# Read the scraped links the same way.
with open("scrapedlist.txt", "r") as scrapedlist_read:
    scrapedlist_lines = scrapedlist_read.read().splitlines()
    scrapedlist_read.close()

# Write every scraped line that does not appear in the whitelist.
with open("updatedlist2.txt", "w") as whitelist_write2:
    for line in scrapedlist_lines:
        if line not in whitelist_lines:
            whitelist_write2.write(line + "\n")
    whitelist_write2.close()
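
As a variation on the code above, the whitelist can be held in a set, which keeps each membership test fast when the lists grow large. The sketch below is one possible arrangement, not the only one: the function name filter_links and its parameters are made up for illustration, and the seen set (which skips repeated lines in the scraped file) is an assumption about what unique3 in the question was meant to do.

```python
def filter_links(whitelist_path, scraped_path, output_path):
    """Write the lines of scraped_path that are not in whitelist_path."""
    # Each with block closes its file when the block ends.
    with open(whitelist_path, "r") as whitelist_file:
        whitelist = set(whitelist_file.read().splitlines())

    with open(scraped_path, "r") as scraped_file:
        scraped_lines = scraped_file.read().splitlines()

    seen = set()  # assumption: repeated scraped lines are written only once
    kept = []
    with open(output_path, "w") as output_file:
        for line in scraped_lines:
            if line not in whitelist and line not in seen:
                output_file.write(line + "\n")
                seen.add(line)
                kept.append(line)
    return kept
```

Calling filter_links("whitelist.txt", "scrapedlist.txt", "updatedlist2.txt") with the sample files from the question should leave USA, CANADA and GERMANY in updatedlist2.txt, one per line, in their original order.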
