
How to delete all strings from file2 that exist in file1 with Python?

I am building a lookup table of decrypted hashes. Because this takes time, I sometimes have to stop, so I want to delete the hashes that are already decrypted from the file of hashes still to find. How can I do this? The problem is described below the code.

So far I have managed to write this code:

'''
=============== FILES DESCRIPTION: ===============
"file_with_hashes_found" contains hashes and decrypted passwords:

441371ab04ce22aa7b0b1c250442e30c:saxon
2208639860dda3f5c6bf627bbe3657c7:saran
97ad856de10a64018f15e8e325ab1d0d:sonne
8cb4f88ffd80dac9c59859dcea8e2ae4:merde
65a3798f40db95c4ab443f3e2b7f4994:socha
ab1991b4286f7e79720fe0d4011789c8:spade
a6f600d2f555aab92b47b313e8766fe9:sawer
a1b01e734b573fca08eb1a65e6df9a38:style
9de839cbf1d8428014a70fc4e5a62154:sauer
3ddaeb82fbba964fb3461d4e4f1342eb:smile
02b73734f0da6d19e063841c5098cf7a:sugus
d59b909e85ac96bab7d68c281a97d222:sahon
dc71ab14bc5b8f02ed878df90dd7e7af:suppe

and many more. The file is about 14 MB in size.

"file_with_hashes_to_find" contains only hashes, without decrypted passwords:

bf21a8c9bdc8accb02b208ccaa9b52da
4a30eca8171819a68a3ba4377c1b5a99
ac81c05b9a4da23f062e29fe2f5847a6
5f4dcc3b5aa765d61d8327deb882cf99
a0e7b2a565119c0a7ec3126a16016113
6b3dbad99021ee09d3671128d251bf30
d44cebb9ee0b292283d931c2fc038f2c
87faadbcb6348c11c3262c756e5080e4
0517268dcb2115e4c4126b31bd3f565b
f77dc2c6bd38fc5c7a7e15484ae183d5
36cdf8b887a5cffc78dcd5c08991b993
d7afe0276a6d0096cc4bb28b37064f7b
788d26c7cb3aa98cf8fb0ca8ab390b75
faeafa84672fd43e94043981e520db25
c9edbcdae9fa95f0bf05e689373fef43
484a80fdd3e6dde787979b8846d59a49
41a94251d25f42bcd2429c267aa9ce74
07771f73fd28cf3c72a88e5bafb9eaa0
b665814e724b8ac7d72468dc8fc7dea7
9b79ab85f62eeeab4ce9a30f3cf49b8d
65a3798f40db95c4ab443f3e2b7f4994
ab1991b4286f7e79720fe0d4011789c8
a6f600d2f555aab92b47b313e8766fe9
a1b01e734b573fca08eb1a65e6df9a38
9de839cbf1d8428014a70fc4e5a62154
3ddaeb82fbba964fb3461d4e4f1342eb
02b73734f0da6d19e063841c5098cf7a
d59b909e85ac96bab7d68c281a97d222
dc71ab14bc5b8f02ed878df90dd7e7af
02b73734f0da6d19e063841c5098cf7a
d59b909e85ac96bab7d68c281a97d222
dc71ab14bc5b8f02ed878df90dd7e7af

and many more; there are more hashes here than decrypted hashes.
The file is about 21 MB in size because there are still hashes to decrypt.

'''


file_with_hashes_found = "hashcat_test.potfile"
file_with_hashes_to_find = "hashes_test.txt"
hashes_to_delete_list = [] # here we will store all decrypted hashes, but without passwords decrypted
hashes_count = 0

print(f"DEV MESSAGE: Hello. Opening {file_with_hashes_found}")  # hello message

# ==================== OPENING FILE WITH DECRYPTED HASHES AND STRIPPING PLAIN PASSWORD FROM A STRING

with open(file_with_hashes_found, encoding = "latin1") as file_to_open, open(file_with_hashes_to_find, "w+") as file_without_plains:
    print("DEV MESSAGE: Ok. File " + file_with_hashes_found + " is open.")
    decrypted_hash = [line.split()[0] for line in file_to_open]

    how_many_hashes_in_list = len(decrypted_hash)  # get how many elements are in list "hashes"
    print(f'DEV MESSAGE: how_many_hashes_in_list value is {how_many_hashes_in_list}')

    # check every item in list
    for index in range(0, how_many_hashes_in_list):
        current_hash_list = str(decrypted_hash[index])  # here we store current hash to check (including decrypted plain password) === result: a6f600d2f555aab92b47b313e8766fe9:sawer
        current_hash_string = str(current_hash_list).split(":",1)[0] # removing decrypted plain password and a colon, === result: a6f600d2f555aab92b47b313e8766fe9
        hashes_to_delete_list.append(current_hash_string) # here is the list of decrypted hashes but without plain password

    print(f'DEV MESSAGE: Added {len(hashes_to_delete_list)} hashes to delete')

# now I want to check whether each decrypted hash (stored in hashes_to_delete_list) is in the file file_with_hashes_to_find
# if so, delete this hash (the whole line, as there is one hash per line)
    for line in file_with_hashes_found:
        for word in hashes_to_delete_list:
            #print(word)
            if (word == hashes_to_delete_list):
                line = line.replace(word,"")
                hashes_count += 1
                file_without_plains.write(line)


file_to_open.close()
file_without_plains.close()
print(f'DEV MESSAGE: Deleted {hashes_count} hashes from file.')

Result of this code is:

DEV MESSAGE: Hello. Opening hashcat_test.potfile

DEV MESSAGE: Ok. File hashcat_test.potfile is open.

DEV MESSAGE: how_many_hashes_in_list value is 347389

DEV MESSAGE: Added 347389 hashes to delete

DEV MESSAGE: Deleted 0 hashes from file.

Afterwards, file_with_hashes_to_find is completely empty (0 bytes).

Thanks for all answers.

Mode w+ truncates the file as soon as it is opened, as noted in the open() documentation, which is why file_with_hashes_to_find ends up at 0 bytes. Mode r+ is the one that allows rewriting an existing file without truncation.
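The difference between the two modes can be checked on a throwaway file. The following is a minimal sketch; the helper name size_after_open is made up for this illustration, and it only opens and closes the file without reading or writing anything.

```python
import os
import tempfile


def size_after_open(mode):
    """Create a small file, reopen it with `mode`, and return its size on disk."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        # Put one hash into the file so there is something to lose.
        with os.fdopen(fd, "w") as f:
            f.write("bf21a8c9bdc8accb02b208ccaa9b52da\n")
        # Opening alone is enough; nothing is read or written here.
        with open(path, mode):
            pass
        return os.path.getsize(path)
    finally:
        os.remove(path)


print(size_after_open("w+"))  # 0: the file was emptied on open
print(size_after_open("r+"))  # non-zero: the contents are still there
```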

Another issue with using a single file object to both read and write is that it has only one file position, so seek() calls are required when switching between reading and writing. That is possible, but the easier alternative is to use two file objects (a.txt is the main file and b.txt holds what to remove).

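# Collect the strings to remove; a set gives fast membership tests.
with open('b.txt') as infile:
    seen_set = set(x.strip() for x in infile)

# Two handles on the same file: the writer overwrites from the start
# while the reader moves ahead of it.
with open('a.txt', 'r') as reader, open('a.txt', 'r+') as writer:
    writer.seek(0)
    for line in reader:
        line = line.strip()
        if line not in seen_set:
            writer.write(line + '\n')
    writer.truncate()  # cut off the leftover tail of the old contents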
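In the question, the file of found hashes holds "hash:password" lines rather than bare hashes, so the set has to be built from the part before the first colon. The following is a rough sketch of that step only; the function names load_found_hashes and remaining_hashes are hypothetical, and the latin1 encoding is taken from the question's own open() call.

```python
def load_found_hashes(potfile_path):
    """Return the hash part of every non-empty 'hash:password' line as a set."""
    found = set()
    with open(potfile_path, encoding="latin1") as potfile:
        for line in potfile:
            line = line.strip()
            if line:
                # Split on the first colon only, so a password that
                # itself contains ':' does not affect the hash part.
                found.add(line.split(":", 1)[0])
    return found


def remaining_hashes(lines, found):
    """Yield each stripped, non-empty hash from `lines` that is not in `found`."""
    for line in lines:
        current_hash = line.strip()
        if current_hash and current_hash not in found:
            yield current_hash
```

remaining_hashes accepts any iterable of lines, so it works with an open file object as well as with a list.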

(Note that overwriting a file incrementally has the inherent weakness that a crash or interrupt in the middle of the process could leave the file in an inconsistent state. Alternatives are to handle the subtraction in-memory and then fully overwrite the file, or write to a temp file and move it over the original.)
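The temp-file alternative mentioned in the note could look roughly like this. It is a sketch under stated assumptions, not a definitive implementation: the name rewrite_without is made up, the temp file is created in the same directory as the original so that os.replace stays on one filesystem, and the replaced file gets the temp file's permissions rather than the original's.

```python
import os
import tempfile


def rewrite_without(path, to_remove):
    """Rewrite `path` without the lines found in `to_remove`; return how many were dropped."""
    removed = 0
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp, open(path) as src:
            for line in src:
                if line.strip() in to_remove:
                    removed += 1
                else:
                    tmp.write(line)  # keep the line exactly as it was
        # Swap the finished copy over the original in a single step.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure or interrupt the original file is left untouched.
        os.remove(tmp_path)
        raise
    return removed
```

If the process is interrupted before os.replace runs, the original file still has its full contents, which is the point of this approach.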
