How to compare data from 2 different dataframes
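I am trying to compare data from 2 pandas DataFrames in Python. They have one column in common, but it has a different name in each: in the first it is called 'File', and in the second it is called 'Código da transação'. I wrote this function to compare the data, but I get an error on the marked lines. Why does this happen?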
def checar_valor(a,b):
    for i in range(len(a)):
        if b.isin([a['File'][i]]): #ERROR
            print("O valor %s está presente nos dois dataframes" % a['File'][i])
        else:
            print("O valor %s está presente apenas no dataframe %s" % (a['File'][i], "a"))
    for q in range(len(b)):
        if a.isin([b['Código da transação'][q]]): #ERROR
            print("O valor %s está presente nos dois dataframes" % b['Código da transação'][q])
        else:
            print("O valor %s está presente apenas no dataframe %s" % (b['Código da transação'][q], "b"))
Traceback (most recent call last):
  File "C:/Users/nick/PycharmProjects/WebCrawler/Extranet/testezin.py", line 75, in <module>
    checar_valor(rs, ga)
  File "C:/Users/nick/PycharmProjects/WebCrawler/Extranet/testezin.py", line 64, in checar_valor
    if b.isin([a['File'][i]]): #ERRO
  File "C:\Users\nick\PycharmProjects\WebCrawler\venv\lib\site-packages\pandas\core\generic.py", line 1576, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
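Using pd.DataFrame.where, you can get a DataFrame that keeps only the values that are equal and at the same position in both DataFrames (this assumes both have the same shape; cells that differ become NaN):
df1.where(df1.values==df2.values)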
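As a small illustration of the line above, here is a sketch with two made-up frames of identical shape. The names `df1` and `df2` follow the snippet; the column name and values are assumptions chosen only for the example.

```python
import pandas as pd

# Hypothetical example frames with the same shape and column name.
df1 = pd.DataFrame({"Number": [4, 2, 3]})
df2 = pd.DataFrame({"Number": [4, 5, 3]})

# Keep df1's value where both frames hold the same value at the same
# position; every other cell becomes NaN.
same = df1.where(df1.values == df2.values)
print(same)
```

Because this comparison is positional, it does not tell you whether a value appears anywhere in the other frame, only whether the same cell matches.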
Edit: following your comment, this should work:
import pandas as pd

A = pd.DataFrame([4,2,3], columns = ['Number'])
B = pd.DataFrame([2,5,6], columns = ['Number'])
a = set(A['Number'])
b = set(B['Number'])
my_set = a | b  # put every value in a set, so that you don't check each value twice
for i in my_set:
    if i in A['Number'].values:
        if i in B['Number'].values:
            print(str(i) + ' is in both DataFrames')
        else:
            print(str(i) + ' is in A but not in B')
    else:  # if the value is not in A, it is obviously in B
        print(str(i) + ' is in B but not in A')
Output:
2 is in both DataFrames
3 is in A but not in B
4 is in A but not in B
5 is in B but not in A
6 is in B but not in A
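On why the original function fails: `isin` called on a whole DataFrame returns another DataFrame of booleans, and `if` cannot reduce that to a single True/False, hence the ValueError. Below is one possible sketch of the comparison that uses `Series.isin` on the two columns instead. It assumes the column names from the question ('File' and 'Código da transação'); the function name `comparar_colunas` and the sample values are made up for illustration.

```python
import pandas as pd

def comparar_colunas(a, b, col_a="File", col_b="Código da transação"):
    """Split the values of two columns into: both, only in a, only in b."""
    # Series.isin returns one boolean per row, so it can be used as a mask.
    in_both = a.loc[a[col_a].isin(b[col_b]), col_a]
    only_a = a.loc[~a[col_a].isin(b[col_b]), col_a]
    only_b = b.loc[~b[col_b].isin(a[col_a]), col_b]
    return sorted(set(in_both)), sorted(set(only_a)), sorted(set(only_b))

# Hypothetical sample data, not taken from the question.
rs = pd.DataFrame({"File": ["T1", "T2", "T3"]})
ga = pd.DataFrame({"Código da transação": ["T2", "T4"]})

both, only_rs, only_ga = comparar_colunas(rs, ga)
print("In both DataFrames:", both)
print("Only in a:", only_rs)
print("Only in b:", only_ga)
```

If a single yes/no answer is needed inside an `if`, the boolean result has to be reduced explicitly, for example with `.any()`, as the error message suggests.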