
How to search a text file table in Python?

I am creating a rainbow table with strings and hashes separated by spaces in a table. The rainbow table looks like this:

j)O 3be44b195706cdd25e29d2b01a0e88d4
j)P a83079350701398672677a9ffe07108c
j)Q 2952c4654c127f2bb1086b75d8f1f986
j)R 6621ec6e1ba3c3669259894db8cde339
j)S 0442a2ee045e1913cd2eb094e8945399

I want to know how to write a Python program that looks up the hash for a given string, or vice versa.

I have made it search the whole document, but I want it to only search a specific column.

I used pandas and can now search a specific column, but I want it to find only exact matches:

import pandas as pd
from termcolor import colored  # assumption: colored() comes from termcolor
working_table = pd.read_csv('rainbow_table_md5.txt', sep = ' ', names=["string", "hash"])
print(working_table['hash'].where(working_table['string'] == input(colored("String: ", 'cyan'))))

The code right now outputs this:

String: a
0           0cc175b9c0f1b6a831c399e269772661
1                                        NaN
2                                        NaN

                          ...               
14094701                                 NaN
14094702                                 NaN

Name: hash, Length: 14094731, dtype: object

I don't need any of the lines other than the match in row 0.

Ideally I only need the hash as the output.

You want "lookup" rather than "search", since only an exact match matters. Pandas might be overkill for this application. A pair of dictionaries suffices:

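class Rainbow:

    def __init__(self, infile, k=20):
        # Forward map: string -> full hex digest.
        self.s_to_hash = {s: digest
                          for s, digest in self._read_tuples(infile)}
        # Reverse map: first k hex characters of the digest -> string.
        self.hash_to_s = {digest[:k]: s
                          for s, digest in self.s_to_hash.items()}
        self.k = k

    @staticmethod
    def _read_tuples(infile):
        with open(infile) as fin:
            for line in fin:
                if not line.strip():
                    continue  # skip blank lines
                # Split on the last run of whitespace, so the string
                # itself may contain spaces.
                s, digest = line.rstrip().rsplit(None, 1)
                yield s, digest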

Choosing k < 32 is an attempt to save some memory, at the (small) risk of having hashes collide based on their common prefix. Tune it up or down to taste, based on your memory, table size, and appetite for collision risk. Consider writing a getter function and then making hash_to_s private.
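The getter suggestion could look roughly like the sketch below. The class name RainbowLookup, the method names hash_for and string_for, and the underscore-prefixed attributes are illustrative assumptions, not part of the code above. To stay runnable without a file, it is built from an iterable of (string, hash) pairs rather than from infile.

```python
class RainbowLookup:
    """Two-way exact-match lookup between strings and hex digests (sketch)."""

    def __init__(self, pairs, k=20):
        self._k = k
        self._s_to_hash = dict(pairs)
        # Reverse map keyed on the first k hex characters only.
        self._hash_to_s = {digest[:k]: s
                           for s, digest in self._s_to_hash.items()}

    def hash_for(self, s):
        """Return the digest stored for s, or None if s is not in the table."""
        return self._s_to_hash.get(s)

    def string_for(self, digest):
        """Return the string stored for digest, or None if there is no match."""
        s = self._hash_to_s.get(digest[:self._k])
        # Check the full digest so a shared prefix is never reported as a match.
        if s is not None and self._s_to_hash[s] == digest:
            return s
        return None


table = RainbowLookup([
    ("j)O", "3be44b195706cdd25e29d2b01a0e88d4"),
    ("j)P", "a83079350701398672677a9ffe07108c"),
])
print(table.hash_for("j)O"))                                # 3be44b195706cdd25e29d2b01a0e88d4
print(table.string_for("a83079350701398672677a9ffe07108c"))  # j)P
```

Because string_for compares the full digest before returning, a prefix collision shows up as a missed lookup (None) rather than a wrong answer. Only the hash is returned, which is the output the question asks for.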

Storing the raw digest bytes would take half the space of storing ASCII hex nybbles: an MD5 digest is 16 bytes, but 32 characters when written as hex.
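As a rough illustration of the bytes idea, the sketch below converts a hex digest to raw bytes with bytes.fromhex and keys a reverse table on them. The names raw, hash_to_s, and string_for are assumptions made for this example. In CPython each object also carries fixed overhead, so the real memory saving per entry will be somewhat less than half.

```python
hex_digest = "3be44b195706cdd25e29d2b01a0e88d4"

# 32 hex characters become 16 raw bytes.
raw = bytes.fromhex(hex_digest)
print(len(hex_digest), len(raw))   # 32 16
print(raw.hex() == hex_digest)     # True: the conversion round-trips

# Reverse table keyed on raw bytes instead of hex text.
hash_to_s = {raw: "j)O"}


def string_for(digest_hex):
    """Look up a string by its hex digest, converting to bytes first."""
    return hash_to_s.get(bytes.fromhex(digest_hex))


print(string_for(hex_digest))      # j)O
```

The same prefix trick still applies: slicing raw[:10] keeps the first 10 bytes, which corresponds to k=20 hex characters.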
