I am trying to write a program that matches the cracked password hashes in found.txt against a CSV file containing hashes and emails. I want to get the "Email" from ex.csv and the "Pass" from found.txt wherever the hash values coincide. But I am getting an error:
raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
My code:
import pandas as pd
import numpy as np
ex = pd.read_csv("ex.csv",delimiter=",")
found = pd.read_csv("found.txt",delimiter=":")
temp = ex[["Hash","Email"]]
te = found[["Hash","Pass"]]
for index,row in te.iterrows(): #Looping through file
    if temp.loc[temp['Hash'] == row['Hash'][index]]: # If pandas can't locate Hash string inside a first file, list is empty. And I am comparing that here
        print(temp['Email'][index]) # If successful, print out the
        print(te['Pass'][index]) # found values in the console
Samples from ex.csv:
Hash Email
0 210ac64b3c5a570e177b26bb8d1e3e93f72081fd example@example.com
1 707a1b7b7d9a12112738bcef3acc22aa09e8c915 example@example.com
2 24529d87ea25b05daba92c2b7d219a470c3ff3a0 example@example.com
Samples from found.txt:
Hash Pass
0 f8fa3b3da3fc71e1eaf6c18e4afef626e1fc7fc1 pass1
1 ecdc5a7c21b2eb84dfe498657039a4296cbad3f4 pass2
2 f61946739c01cff69974093452057c90c3e0ba14 pass3
Or is there a better way to iterate through the rows and check whether a row contains a string from a row of the other file? ;)
import pandas as pd
import numpy as np
ex = pd.read_csv("c.csv",delimiter=",")
found = pd.read_csv("d.csv",delimiter=",")
print(ex)
print(found)
temp = ex[['Hash','Email']]
te = found[['Hash','Pass']]
for temp1, temp2 in zip(te.iterrows(), temp.iterrows()):
    if temp2[1]['Hash'][temp2[0]] == temp1[1]['Hash'][temp1[0]]:
        print(temp['Email'][temp2[0]])
        print(te['Pass'][temp1[0]])
I have stored the values like this:
1) c.csv
Hash,Email
210ac64b3c5a570e177b26bb8d1e3e93f72081fd,example@example.com
707a1b7b7d9a12112738bcef3acc22aa09e8c915,example@example.com
24529d87ea25b05daba92c2b7d219a470c3ff3a0,example@example.com
2) d.csv
Hash,Pass
f8fa3b3da3fc71e1eaf6c18e4afef626e1fc7fc1,pass1
ecdc5a7c21b2eb84dfe498657039a4296cbad3f4,pass2
f61946739c01cff69974093452057c90c3e0ba14,pass3
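As a quick check on these sample files, a minimal sketch using Series.isin can tell, without any explicit loop, which hashes from d.csv also occur in c.csv. The file contents are inlined through io.StringIO only so the snippet runs on its own; with the real files you would pass the file names to pd.read_csv as in the code above. With exactly the rows listed here, the two files share no hash, so nothing would be printed by any matching loop.

```python
import io

import pandas as pd

# Contents of c.csv and d.csv as listed above, inlined so the sketch is self-contained
C_CSV = """Hash,Email
210ac64b3c5a570e177b26bb8d1e3e93f72081fd,example@example.com
707a1b7b7d9a12112738bcef3acc22aa09e8c915,example@example.com
24529d87ea25b05daba92c2b7d219a470c3ff3a0,example@example.com
"""
D_CSV = """Hash,Pass
f8fa3b3da3fc71e1eaf6c18e4afef626e1fc7fc1,pass1
ecdc5a7c21b2eb84dfe498657039a4296cbad3f4,pass2
f61946739c01cff69974093452057c90c3e0ba14,pass3
"""

ex = pd.read_csv(io.StringIO(C_CSV))
found = pd.read_csv(io.StringIO(D_CSV))

# Boolean Series over found's rows: True where that hash also appears in ex
in_both = found["Hash"].isin(ex["Hash"])
shared = found.loc[in_both, "Hash"].tolist()
print(shared)  # [] : these sample files have no hash in common
```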
To print matches, use the following code:
for _, row in te.iterrows():
    rowHash = row.Hash
    matches = temp.Hash == rowHash  # boolean mask
    if matches.any():
        mails = temp[matches].Email.tolist()
        print(f'Found: {rowHash} / {row.Pass} / {", ".join(mails)}')
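To try this loop without the files, the sketch below builds temp and te in memory with the same columns. The second row of te is hypothetical (a hash copied from c.csv paired with a made-up password), added only so that at least one match is found, since the sample files share no hash. The results list is also an addition, used only to inspect the matches afterwards.

```python
import pandas as pd

# Same layout as c.csv / d.csv; te's second row is a HYPOTHETICAL match
temp = pd.DataFrame({
    "Hash": ["210ac64b3c5a570e177b26bb8d1e3e93f72081fd",
             "707a1b7b7d9a12112738bcef3acc22aa09e8c915"],
    "Email": ["example@example.com", "example@example.com"],
})
te = pd.DataFrame({
    "Hash": ["f8fa3b3da3fc71e1eaf6c18e4afef626e1fc7fc1",
             "707a1b7b7d9a12112738bcef3acc22aa09e8c915"],
    "Pass": ["pass1", "hypothetical_pass"],
})

results = []  # (hash, password, list of emails) for every match
for _, row in te.iterrows():
    matches = temp["Hash"] == row["Hash"]  # boolean mask over temp's rows
    if matches.any():                       # a single True/False, so `if` is safe
        mails = temp.loc[matches, "Email"].tolist()
        results.append((row["Hash"], row["Pass"], mails))
        print(f'Found: {row["Hash"]} / {row["Pass"]} / {", ".join(mails)}')
```

Only the second row of te is reported, because pass1's hash does not occur in temp.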
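Compare my code with yours carefully; the comparison should show you what was wrong in your code.
You didn't say exactly where the error occurred, but I suppose it was in the if statement (my version is different). temp.loc[...] returns a DataFrame, and if needs a single True/False value, which pandas refuses to derive from a whole DataFrame. In addition, row['Hash'] is already a string, so row['Hash'][index] is just one character of the hash.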
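The two problems in the question's if line can be reproduced in isolation. This sketch uses two hashes from c.csv: filtering with .loc and a boolean mask returns a DataFrame (even for a single matching row), putting that DataFrame directly in an if raises the ValueError from the question, and .empty gives an explicit yes/no answer instead. The last lines show that indexing a hash string with an integer returns a single character.

```python
import pandas as pd

temp = pd.DataFrame({
    "Hash": ["210ac64b3c5a570e177b26bb8d1e3e93f72081fd",
             "707a1b7b7d9a12112738bcef3acc22aa09e8c915"],
    "Email": ["example@example.com", "example@example.com"],
})
h = temp["Hash"][0]

# .loc with a boolean mask returns a DataFrame, even when only one row matches
subset = temp.loc[temp["Hash"] == h]

error_message = None
try:
    if subset:  # Python must reduce the whole DataFrame to True/False here
        pass
except ValueError as err:
    error_message = str(err)
print(error_message)

# Ask an explicit question instead: "is the filtered result non-empty?"
has_match = not subset.empty
print(has_match)  # True

# Second problem: a hash is a plain string, so [index] picks one character
first_char = h[0]
print(first_char)  # 2
```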
You can also try another approach. Since it looks rows up by index, it should run considerably faster than the loop above.
# Set 'Hash' column as the index in both DataFrames
temp2 = temp.set_index('Hash')
te2 = te.set_index('Hash')
# Loop over rows in 'te2', index (Hash) in 'teHash'
for teHash, row in te2.iterrows():
    try:
        res = temp2.loc[teHash]  # Attempt to find corresponding row(s) in 'temp2'
        if isinstance(res, pd.Series):  # Single match found
            mails = res.Email
        else:  # Multiple matches found
            mails = ', '.join(res.Email)
        print(f'Found: {teHash} / {row.Pass} / {mails}')
    except KeyError:
        pass  # Not found
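A further option, not part of this answer, is to let pandas do all the matching in one step with DataFrame.merge, which performs an inner join on the Hash column. The sketch below assumes the same Hash/Email and Hash/Pass layout; the matching row in te and the two extra emails are hypothetical, chosen so that one cracked hash belongs to two emails. A join yields one row per (hash, email) pair, so groupby then collapses them into one line per hash, like the loops above.

```python
import pandas as pd

# HYPOTHETICAL data: one hash shared by two emails, and one matching cracked hash
temp = pd.DataFrame({
    "Hash": ["210ac64b3c5a570e177b26bb8d1e3e93f72081fd",
             "707a1b7b7d9a12112738bcef3acc22aa09e8c915",
             "707a1b7b7d9a12112738bcef3acc22aa09e8c915"],
    "Email": ["example@example.com", "first@example.com", "second@example.com"],
})
te = pd.DataFrame({
    "Hash": ["f8fa3b3da3fc71e1eaf6c18e4afef626e1fc7fc1",
             "707a1b7b7d9a12112738bcef3acc22aa09e8c915"],
    "Pass": ["pass1", "hypothetical_pass"],
})

# Inner join: keeps only the hashes present in both DataFrames
matched = te.merge(temp, on="Hash", how="inner")

# One row per cracked hash, with all of its emails joined into one string
summary = (
    matched.groupby(["Hash", "Pass"], sort=False)["Email"]
    .agg(", ".join)
    .reset_index()
)
for r in summary.itertuples(index=False):
    print(f"Found: {r.Hash} / {r.Pass} / {r.Email}")
```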