Compare 2 Pandas dataframes, row by row, cell by cell
I have 2 dataframes, df1 and df2, and want to do the following, storing the results in df3:
for each row in df1:
    for each row in df2:
        create a new row in df3 (called "df1-1, df2-1" or whatever) to store results
        for each cell (column) in df1:
            for the cell in df2 whose column name is the same as for the cell in df1:
                compare the cells (using some comparing function func(a, b)) and,
                depending on the result of the comparison, write the result into the
                appropriate column of the "df1-1, df2-1" row of df3
For example, something like:
df1
A    B     C       D
foo  bar   foobar  7
gee  whiz  herp    10

df2
A    B    C       D
zoo  car  foobar  8

df3
df1-df2  A              B               C                    D
foo-zoo  func(foo,zoo)  func(bar,car)   func(foobar,foobar)  func(7,8)
gee-zoo  func(gee,zoo)  func(whiz,car)  func(herp,foobar)    func(10,8)
I started with this:
for r1 in df1.iterrows():
    for r2 in df2.iterrows():
        for c1 in r1:
            for c2 in r2:
But I'm not sure what to do with it, and would appreciate some help.
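So, to continue the discussion from the comments: you can use vectorization, which is one of the selling points of a library like pandas or numpy. Ideally, you shouldn't ever need to call iterrows(). To be a little more explicit about my suggestion: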
# with df1 and df2 provided as above, an example
df3 = df1['A'] * 3 + df2['A']
# recall that df2 only has the one row, so pandas aligns on the index
# and fills the row that has no match in df2 with NaN
df3
0    foofoofoozoo
1             NaN
Name: A, dtype: object
# more generally
# we know that df1 and df2 share column names, so we can initialize df3 with those names
df3 = pd.DataFrame(columns=df1.columns)  # be sure to import pandas as pd
for colName in df1:
    df3[colName] = func(df1[colName], df2[colName])
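Note that the loop above lines rows up by index, so it compares row 0 with row 0, row 1 with row 1, and so on. If you want every row of df1 paired with every row of df2, as in the question, one option is to do a cross join first and then compare column by column. This is only a minimal sketch, assuming pandas 1.2 or later (for how="cross"); the equality-based func and the "_1"/"_2" suffixes are placeholders made up for illustration, not part of the question.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["foo", "gee"], "B": ["bar", "whiz"],
                    "C": ["foobar", "herp"], "D": [7, 10]})
df2 = pd.DataFrame({"A": ["zoo"], "B": ["car"], "C": ["foobar"], "D": [8]})


def func(a, b):
    # Hypothetical comparison: element-wise equality of two aligned Series.
    return a == b


# Pair every row of df1 with every row of df2 (needs pandas >= 1.2).
# All column names overlap, so each one gets a "_1" or "_2" suffix.
pairs = df1.merge(df2, how="cross", suffixes=("_1", "_2"))

# Build row labels such as "foo-zoo" from the two A columns.
labels = (pairs["A_1"] + "-" + pairs["A_2"]).to_numpy()
df3 = pd.DataFrame(index=pd.Index(labels, name="df1-df2"))

# Apply func to each pair of same-named columns.
for col in df1.columns:
    df3[col] = func(pairs[col + "_1"], pairs[col + "_2"]).to_numpy()

print(df3)
```

The cross join materializes len(df1) * len(df2) rows, so this trades memory for not having to loop over rows in Python.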
Now, you can even apply different functions to different columns by creating lambda functions and then zipping them with the column names:
# some example functions
colAFunc = lambda x, y: x + y
colBFunc = lambda x, y: x - y
....
columnFunctions = [colAFunc, colBFunc, ...]
# initialize df3 as above
df3 = pd.DataFrame(columns=df1.columns)
for func, colName in zip(columnFunctions, df1.columns):
    df3[colName] = func(df1[colName], df2[colName])
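For something you can paste and run, here is a rough variant of the same idea that keys the functions by column name in a dict, so the pairing does not depend on list order. The particular functions are made up for illustration. Since df2 has only one row in the question, the sketch also assumes you want that single row compared against every row of df1, so it passes df2's values as scalars instead of relying on index alignment.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["foo", "gee"], "B": ["bar", "whiz"],
                    "C": ["foobar", "herp"], "D": [7, 10]})
df2 = pd.DataFrame({"A": ["zoo"], "B": ["car"], "C": ["foobar"], "D": [8]})

# Hypothetical per-column functions, keyed by column name.
column_functions = {
    "A": lambda x, y: x + y,   # string concatenation
    "B": lambda x, y: x + y,   # string concatenation
    "C": lambda x, y: x == y,  # equality check
    "D": lambda x, y: x - y,   # numeric difference
}

# Assumption: df2's single row is compared against every row of df1.
row = df2.iloc[0]

# Each function receives a whole df1 column and one scalar from df2.
df3 = pd.DataFrame(
    {col: f(df1[col], row[col]) for col, f in column_functions.items()}
)

print(df3)
```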
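The only "gotcha" that comes to mind is that you need to be sure your function is applicable to the data in your columns. For instance, if you were to do something like df1['A'] - df2['A'] (with df1 and df2 as you have provided them), it would raise an error (a TypeError in current pandas versions), because subtraction of two strings is undefined. Just something to be aware of.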
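One way to guard against that, sketched here under my own assumptions, is to look at the column's dtype before choosing the operation. The rule used below (subtract numeric columns, test everything else for equality) is purely illustrative, and df2's single row is again passed as scalars.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df1 = pd.DataFrame({"A": ["foo", "gee"], "B": ["bar", "whiz"],
                    "C": ["foobar", "herp"], "D": [7, 10]})
df2 = pd.DataFrame({"A": ["zoo"], "B": ["car"], "C": ["foobar"], "D": [8]})

row = df2.iloc[0]  # assumption: compare df2's only row with each df1 row

results = {}
for col in df1.columns:
    if is_numeric_dtype(df1[col]):
        # Subtraction is well defined for numeric columns.
        results[col] = df1[col] - row[col]
    else:
        # Strings cannot be subtracted, so fall back to an equality test.
        results[col] = df1[col] == row[col]

df3 = pd.DataFrame(results)
print(df3)
```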
Edit, re: your comment: that's doable too. Iterate over whichever dfX.columns is larger, so that you don't run into a KeyError, and throw an if statement in there:
# all the other jazz
# let's say df1 is [['A', 'B', 'C']] and df2 is [['A', 'B', 'C', 'D']]
# so iterate over df2 columns
for colName in df2:
    if colName not in df1:
        df3[colName] = np.nan  # be sure to import numpy as np
    else:
        df3[colName] = func(df1[colName], df2[colName])
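A self-contained version of that loop might look like the sketch below. The numeric sample data and the difference-based func are invented for illustration. One assumption worth calling out: df3 is initialized with df2's index, so a scalar NaN assignment fills every row even if the unmatched column happens to come first.

```python
import numpy as np
import pandas as pd

# Made-up sample data: df2 has one column (D) that df1 lacks.
df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 6]})
df2 = pd.DataFrame({"A": [10, 20], "B": [30, 40],
                    "C": [50, 60], "D": [70, 80]})


def func(a, b):
    # Hypothetical comparison: element-wise difference.
    return b - a


# Start from df2's index so scalar assignments broadcast to every row.
df3 = pd.DataFrame(index=df2.index)

# Iterate over the frame with more columns to avoid a KeyError.
for col_name in df2.columns:
    if col_name not in df1.columns:
        df3[col_name] = np.nan  # nothing in df1 to compare against
    else:
        df3[col_name] = func(df1[col_name], df2[col_name])

print(df3)
```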