
Python 3.6: How can I open a text file into two sets and check intersections

I am currently starting out with Python 3.6 and using the IDLE development area. I have been looking online for a solution to the following problem:

I have two text files. The first is my input list, the second is my blacklist. I want to check for any instances where a line in my input list is also in my blacklist. The end goal will be to create a new list that contains all the intersections.

I am currently doing the following:

input_list=set(line.strip() for line in open("input_list.txt",'r'))
black_list=set(line.strip() for line in open("black_list.txt",'r'))

print("Input List")
print(input_list)
print("Black List")
print(black_list)
print("Intersection")
print(input_list.intersection(black_list))

I will explain my reasoning so hopefully people can correct my logic with their answers, not just provide a solution.

input_list=set(line.strip() for line in open("input_list.txt",'r'))
black_list=set(line.strip() for line in open("black_list.txt",'r'))

With the above two lines of code I am opening two sets. Each one strips out all of the \n values and leaves me with just the text from each line.
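
A minimal sketch of the same loading step, assuming it is acceptable to wrap it in a helper. The names load_lines and find_blacklisted and the utf-8 encoding are assumptions for illustration, not part of the original code. The with-statement closes each file once it has been read, and blank lines are skipped so that an empty string never ends up in either set.

```python
def load_lines(path):
    """Return the non-empty, whitespace-stripped lines of a text file as a set."""
    # The with-statement closes the file as soon as the block ends.
    # encoding="utf-8" is an assumption; adjust it to match the real files.
    with open(path, "r", encoding="utf-8") as handle:
        # str.strip() removes leading and trailing whitespace, including "\n".
        stripped = (line.strip() for line in handle)
        # Skip blank lines so the empty string never becomes a set member.
        return {line for line in stripped if line}


def find_blacklisted(input_path, blacklist_path):
    """Return the entries that appear in both files."""
    return load_lines(input_path) & load_lines(blacklist_path)
```

Calling find_blacklisted("input_list.txt", "black_list.txt") would then give the same result as the intersection call in the question, provided both files exist in the working directory.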

print("Input List")
print(input_list)
print("Black List")
print(black_list)

This section is simply for me to check my text files have been opened, and all the values are present in the set. There is a header above each section for clarity.

print("Intersection")
print(input_list.intersection(black_list))

In this piece of code I start with a header. I then try to print any intersection values that are found. Currently the result that I get in my shell looks like this:

Input List
{'value1', 'value2', 'value3'}
Black List
{'valueA', 'valueB', 'valueC'}
Intersection
set()

I got my information about intersection from the following link:

https://docs.python.org/3/tutorial/datastructures.html

I got my file opening into sets from this article:

Python: load words from file into a set

I have been reading The Python Manual, Volume 33 from the Black Dog i-Tech Series. I used this to learn the basics of Python. Whilst it covers basic opening, reading, and writing of files, it does not cover more complex features.

I'm mostly designing this to create a tool for inputting domains, and checking against a blacklist of bad domains. This is to be used for SEO purposes and help me to quickly produce a disavow file. Aside from being practical for my work, this is also just a personal project to help me explore, learn, and develop my understanding of Python.
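
For the domain use case described above, a rough sketch might look like the following. It assumes that domains should be compared case-insensitively and that the disavow file uses one "domain:" entry per line. Both are assumptions rather than anything stated in the question, and the helper names normalise and disavow_lines, along with the sample domains, are hypothetical.

```python
def normalise(domain):
    """Lowercase a domain and trim surrounding whitespace (hypothetical helper)."""
    return domain.strip().lower()


def disavow_lines(input_domains, blacklist):
    """Return sorted 'domain:' entries for input domains found in the blacklist."""
    # Blank lines are dropped before normalising.
    wanted = {normalise(d) for d in input_domains if d.strip()}
    blocked = {normalise(d) for d in blacklist if d.strip()}
    # The "domain:" prefix is an assumed output format; sorting keeps the
    # result stable between runs, since sets have no defined order.
    return ["domain:" + d for d in sorted(wanted & blocked)]


# Made-up sample data standing in for the lines of the two text files.
lines = disavow_lines(
    ["Example.com\n", "good-site.org\n", "spam.example\n"],
    ["spam.example\n", "example.com\n", "other-bad.example\n"],
)
print("\n".join(lines))
# domain:example.com
# domain:spam.example
```

Normalising before the comparison matters because sets match strings exactly: "Example.com" and "example.com" would otherwise count as different values and never show up in the intersection.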

Indeed your code appears to accomplish your goal. The intersection between the set

{'value1', 'value2', 'value3'}

and

{'valueA', 'valueB', 'valueC'}

is indeed the empty set. Python represents an empty set as set(). If you were expecting {} to represent the empty set, note that {} is actually an empty dictionary.
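
A short illustration of that distinction, using made-up values in the style of the question. The & operator used here is equivalent to calling intersection() on two sets.

```python
empty_set = set()
empty_dict = {}

print(type(empty_set))   # <class 'set'>
print(type(empty_dict))  # <class 'dict'>

# Two sets with nothing in common intersect to the empty set.
no_overlap = {"value1", "value2"} & {"valueA", "valueB"}
print(no_overlap)         # set()

# An empty set is falsy, which gives a simple "nothing matched" check.
print(not no_overlap)     # True
```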

I should have been clearer: value1, value2, value3, valueA, valueB, and valueC are just example values. The list that I am using to test is significantly longer, and posting it here would have been inappropriate.

In retrospect, I've realised that I made a mistake: I never double-checked that the two lists actually shared any values. Once people confirmed that the code was correct, that turned out to be the whole problem.

This is now resolved, I'm an idiot.

