
Python pandas checking if row contains a string

I am trying to write a program that sorts found password hashes using a CSV file that contains hashes and emails. I want to get the "Email" from ex.csv and the "Pass" from found.txt wherever the hash values coincide. But I get an error:

raise ValueError(
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

My code:

import pandas as pd
import numpy as np

ex = pd.read_csv("ex.csv",delimiter=",")
found = pd.read_csv("found.txt",delimiter=":")

temp = ex[["Hash","Email"]]
te = found[["Hash","Pass"]]

for index,row in te.iterrows(): #Looping through file
    if temp.loc[temp['Hash'] == row['Hash'][index]]: # If pandas can't locate Hash string inside a first file, list is empty. And I am comparing that here
        print(temp['Email'][index]) # If successful, print out the
        print(te['Pass'][index])    # found values in the console

Samples from ex.csv:

                                          Hash                    Email
0     210ac64b3c5a570e177b26bb8d1e3e93f72081fd  example@example.com
1     707a1b7b7d9a12112738bcef3acc22aa09e8c915  example@example.com
2     24529d87ea25b05daba92c2b7d219a470c3ff3a0  example@example.com

Samples from found.txt:

                                         Hash         Pass
0    f8fa3b3da3fc71e1eaf6c18e4afef626e1fc7fc1     pass1
1    ecdc5a7c21b2eb84dfe498657039a4296cbad3f4     pass2
2    f61946739c01cff69974093452057c90c3e0ba14     pass3

Or maybe there is a better way to iterate through rows and check whether a row contains a string from a row of another file? ;)

import pandas as pd

ex = pd.read_csv("c.csv", delimiter=",")
found = pd.read_csv("d.csv", delimiter=",")

print(ex)
print(found)

temp = ex[['Hash', 'Email']]
te = found[['Hash', 'Pass']]

# zip() pairs the rows by position: row i of d.csv is compared only with row i of c.csv
for (_, te_row), (_, temp_row) in zip(te.iterrows(), temp.iterrows()):
    # Compare the whole hash strings; indexing a string with [i] would pick a single character
    if temp_row['Hash'] == te_row['Hash']:
        print(temp_row['Email'])
        print(te_row['Pass'])

I stored the values like this:

1) c.csv

Hash,Email
210ac64b3c5a570e177b26bb8d1e3e93f72081fd,example@example.com
707a1b7b7d9a12112738bcef3acc22aa09e8c915,example@example.com
24529d87ea25b05daba92c2b7d219a470c3ff3a0,example@example.com

2) d.csv

Hash,Pass
f8fa3b3da3fc71e1eaf6c18e4afef626e1fc7fc1,pass1
ecdc5a7c21b2eb84dfe498657039a4296cbad3f4,pass2
f61946739c01cff69974093452057c90c3e0ba14,pass3
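Because zip() pairs rows by position, a hash in row 0 of d.csv is never compared with, say, row 2 of c.csv. A minimal sketch that compares every found hash with every stored hash through a plain dict lookup could look like this. The file contents are inlined with io.StringIO so it runs without files, and the fourth d.csv row is a hypothetical addition, since the sample files above share no hash; find_matches is a hypothetical helper name.

```python
import io

import pandas as pd

# Contents of c.csv and d.csv, inlined so the sketch runs without files.
# The last d.csv row is HYPOTHETICAL: the sample files share no hash,
# so without it nothing would match.
C_CSV = """Hash,Email
210ac64b3c5a570e177b26bb8d1e3e93f72081fd,example@example.com
707a1b7b7d9a12112738bcef3acc22aa09e8c915,example@example.com
24529d87ea25b05daba92c2b7d219a470c3ff3a0,example@example.com
"""
D_CSV = """Hash,Pass
f8fa3b3da3fc71e1eaf6c18e4afef626e1fc7fc1,pass1
ecdc5a7c21b2eb84dfe498657039a4296cbad3f4,pass2
f61946739c01cff69974093452057c90c3e0ba14,pass3
707a1b7b7d9a12112738bcef3acc22aa09e8c915,pass4
"""

temp = pd.read_csv(io.StringIO(C_CSV))
te = pd.read_csv(io.StringIO(D_CSV))


def find_matches(emails: pd.DataFrame, cracked: pd.DataFrame) -> list[tuple[str, str, str]]:
    """Hypothetical helper: check every cracked hash against every stored hash."""
    # Map each hash to its email(s); a hash may appear more than once in c.csv
    by_hash: dict[str, list[str]] = {}
    for h, mail in zip(emails['Hash'], emails['Email']):
        by_hash.setdefault(h, []).append(mail)
    # One dict lookup per cracked hash, independent of row positions
    result = []
    for h, pw in zip(cracked['Hash'], cracked['Pass']):
        for mail in by_hash.get(h, []):
            result.append((h, pw, mail))
    return result


matches = find_matches(temp, te)
for h, pw, mail in matches:
    print(f'Found: {h} / {pw} / {mail}')
```

The dict makes each lookup roughly constant time, so the work grows with the sum of the two file sizes rather than their product.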

To print the matches, use the following code:

for _, row in te.iterrows():
    rowHash = row.Hash
    matches = temp.Hash == rowHash  # boolean mask
    if matches.any():
        mails = temp[matches].Email.tolist()
        print(f'Found:  {rowHash} / {row.Pass} / {", ".join(mails)}')

Compare my code with yours carefully. Such a comparison should let you find what was wrong in your code.

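You didn't say exactly where it was raised, but I suppose your error occurred in the if statement (my version is different). temp.loc[...] returns a DataFrame, and if cannot reduce a whole DataFrame to a single True or False, hence the ValueError. Note also that row['Hash'][index] takes a single character of the hash string, not the whole hash.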
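A small sketch reproducing the failure and the explicit checks pandas suggests. The DataFrame literal is a hypothetical stand-in for two rows of ex.csv, and row_hash is one hash taken from found.txt.

```python
import pandas as pd

# Hypothetical stand-in for two rows of ex.csv
temp = pd.DataFrame({
    'Hash': ['210ac64b3c5a570e177b26bb8d1e3e93f72081fd',
             '707a1b7b7d9a12112738bcef3acc22aa09e8c915'],
    'Email': ['example@example.com', 'example@example.com'],
})
row_hash = 'f8fa3b3da3fc71e1eaf6c18e4afef626e1fc7fc1'  # a hash from found.txt

char = row_hash[0]  # indexing a str returns one character ('f'), as row['Hash'][index] does

subset = temp.loc[temp['Hash'] == row_hash]  # always a DataFrame, here an empty one

# Using a DataFrame directly in `if` raises ValueError, empty or not
try:
    if subset:
        pass
    raised = False
except ValueError:
    raised = True

# Ask an explicit question instead
has_match = not subset.empty                        # are there any matching rows?
has_match_mask = (temp['Hash'] == row_hash).any()   # same question on the boolean mask
print(raised, has_match, has_match_mask)
```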

Edit

You can also try another approach. Because it looks rows up by index, it should run considerably faster than the loop above.

# Set 'Hash' column as the index in both DataFrames
temp2 = temp.set_index('Hash')
te2 = te.set_index('Hash')
# Loop over rows in 'te2', index (Hash) in 'teHash'
for teHash, row in te2.iterrows():
    try:
        res = temp2.loc[teHash]  # Attempt to find corresponding row(s) in 'temp2'
        if isinstance(res, pd.Series):  # Single match found
            mails = res.Email
        else:                           # Multiple matches found
            mails = ', '.join(res.Email)
        print(f'Found: {teHash} / {row.Pass} / {mails}')
    except KeyError:
        pass      # Not found
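A fully vectorized alternative, not taken from the answers above, is an inner merge on the Hash column, which does the matching inside pandas without any Python-level loop over the lookups. This is a sketch assuming found.txt has a Hash:Pass header line (the question's code selects those column names); the inlined rows are hypothetical and chosen so that one hash matches.

```python
import io

import pandas as pd

# Hypothetical inlined data; found.txt is assumed to be "Hash:Pass" with a header line
ex = pd.read_csv(io.StringIO(
    "Hash,Email\n"
    "210ac64b3c5a570e177b26bb8d1e3e93f72081fd,example@example.com\n"
    "707a1b7b7d9a12112738bcef3acc22aa09e8c915,example@example.com\n"
))
found = pd.read_csv(io.StringIO(
    "Hash:Pass\n"
    "f8fa3b3da3fc71e1eaf6c18e4afef626e1fc7fc1:pass1\n"
    "707a1b7b7d9a12112738bcef3acc22aa09e8c915:pass4\n"
), delimiter=":")

# Inner join on 'Hash': keeps only hashes present in both tables and
# yields one row per (email, password) pair
matched = ex[['Hash', 'Email']].merge(found[['Hash', 'Pass']], on='Hash', how='inner')

for rec in matched.itertuples(index=False):
    print(f'Found: {rec.Hash} / {rec.Pass} / {rec.Email}')
```

If the same hash occurs several times in either file, the merge returns every combination, much like the multiple-match branch of the set_index loop.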
