Pandas: two dataframes, want the average of the element-wise divisions between each pair of groups
I have two dataframes like this:
import pandas as pd

dataA = [["A1", "t1", 5], ["A1", "t2", 8], ["A1", "t3", 7],
         ["A1", "t4", 4], ["A1", "t5", 2], ["A1", "t6", 2],
         ["A2", "t1", 15], ["A2", "t2", 6], ["A2", "t3", 1],
         ["A2", "t4", 11], ["A2", "t5", 12], ["A2", "t6", 7],
         ["A3", "t1", 12], ["A3", "t2", 8], ["A3", "t3", 3],
         ["A3", "t4", 7], ["A3", "t5", 15], ["A3", "t6", 14]]
dataB = [["B1", "t1", 2], ["B1", "t2", 9], ["B1", "t3", 17],
         ["B1", "t4", 14], ["B1", "t5", 32], ["B1", "t6", 3],
         ["B2", "t1", 44], ["B2", "t2", 36], ["B2", "t3", 51],
         ["B2", "t4", 81], ["B2", "t5", 82]]
data1 = pd.DataFrame(data=dataA, columns=["An", "colA", "Val"])
data2 = pd.DataFrame(data=dataB, columns=["Bm", "colA", "Val"])
How can I get this result:
| GroupA | GroupB | result |
|--------|--------|--------|
| A1     | B1     | val_11 |
| A1     | B2     | val_12 |
| A2     | B1     | val_21 |
| A2     | B2     | val_22 |
| A3     | B1     | val_31 |
| A3     | B2     | val_32 |
| ...    | ...    | ...    |
| An     | Bm     | val_nm |
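val_nm is computed as follows. val_11 is the mean of the element-wise ratios between the values of A1 and the values of B1: divide each A1 value by the B1 value at the same colA position, take the reciprocal of any ratio greater than 1, then average the results. So whether A1 is divided by B1 or B1 by A1, the result is the same.
Computing val probably needs a custom function. Every Val is greater than 0, so there is no division by zero.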
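Written as a formula, the rule above might be stated as follows. This is only a restatement: $T$ (the set of colA labels present for both groups) and the shorthand $a_{n,t}$, $b_{m,t}$ (the Val of An and Bm at label $t$) are notation introduced here, not part of the question.

```latex
\mathrm{val}_{nm} = \frac{1}{|T|} \sum_{t \in T} \frac{\min(a_{n,t},\, b_{m,t})}{\max(a_{n,t},\, b_{m,t})}
```

For positive values, dividing the smaller number by the larger is the same as taking the reciprocal whenever the ratio exceeds 1, which is why swapping the two groups does not change the result.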
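Taking val_11 as an example:
A1 = [5, 8, 7, 4, 2, 2], B1 = [2, 9, 17, 14, 32, 3]
val_11 = avg(A1/B1) = avg(2/5, 8/9, 7/17, 4/14, 2/32, 2/3)   (5/2 is greater than 1, so 2/5 is used)
= 0.4526
So whether it is A1/B1 or B1/A1, the result is the same.
Please help me compute the result.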
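The worked example for val_11 can be checked with a few lines of NumPy. This is a minimal sketch, not code from the thread: the function name `symmetric_ratio_mean` is made up for illustration, and it assumes both inputs are strictly positive and aligned position by position.

```python
import numpy as np

def symmetric_ratio_mean(a, b):
    """Mean of element-wise ratios, each taken as smaller value / larger value."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # min/max is equivalent to "take the reciprocal if the ratio is above 1"
    # as long as every value is positive (assumption stated in the question)
    return float((np.minimum(a, b) / np.maximum(a, b)).mean())

A1 = [5, 8, 7, 4, 2, 2]
B1 = [2, 9, 17, 14, 32, 3]

val_11 = symmetric_ratio_mean(A1, B1)
print(round(val_11, 6))  # 0.452589
```

Swapping the arguments gives the same number, which is the symmetry the question asks for.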
Define directly what is to be calculated: use pivot() to create the tables, then merge() to do a Cartesian product between them.
import numpy as np

def meanofdiv(dfa):
    # columns pivoted from data1 carry the "_A" suffix, those from data2 "_B"
    a = dfa.loc[:, [c for c in dfa.columns if "_A" in c]].values
    b = dfa.loc[:, [c for c in dfa.columns if "_B" in c]].values
    # use the reciprocal where the ratio exceeds 1, then average across each row
    return np.where((a/b) > 1, b/a, a/b).mean(axis=1)

# pivot key/val pair data to tables
# Cartesian product of tables
# simple calculation of columns from A and columns from B
dfr = pd.merge(
    data1.pivot(index="An", columns="colA", values="Val").reset_index().assign(foo=1),
    data2.pivot(index="Bm", columns="colA", values="Val").reset_index().assign(foo=1),
    on="foo",
    suffixes=("_A", "_B")
).assign(resname=lambda dfa: dfa["An"] + dfa["Bm"],
         res=meanofdiv)
dfr.loc[:, ["An", "Bm", "res"]]
 | An | Bm | res |
---|---|---|---|
0 | A1 | B1 | 0.452589 |
1 | A1 | B2 | 0.202259 |
2 | A2 | B1 | 0.408018 |
3 | A2 | B2 | 0.206316 |
4 | A3 | B1 | 0.40251 |
5 | A3 | B2 | 0.172901 |
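In pandas 1.2 and later, merge() also accepts how="cross", which builds the same Cartesian product without the helper foo column. The sketch below is an illustration on a cut-down subset of the question's data; the names `left`, `right`, `wide_a`, `wide_b` and `pairs` are made up here and are not from the answer.

```python
import pandas as pd

# Small frames in the same key/value layout as the question (subset of its data)
left = pd.DataFrame([["A1", "t1", 5], ["A1", "t2", 8],
                     ["A2", "t1", 15], ["A2", "t2", 6]],
                    columns=["An", "colA", "Val"])
right = pd.DataFrame([["B1", "t1", 2], ["B1", "t2", 9]],
                     columns=["Bm", "colA", "Val"])

# One row per group, one column per colA label
wide_a = left.pivot(index="An", columns="colA", values="Val").reset_index()
wide_b = right.pivot(index="Bm", columns="colA", values="Val").reset_index()

# how="cross" pairs every row of wide_a with every row of wide_b;
# the overlapping t1/t2 columns receive the "_A" and "_B" suffixes
pairs = pd.merge(wide_a, wide_b, how="cross", suffixes=("_A", "_B"))
print(pairs)
```

The resulting frame has the same "_A"/"_B" column layout that meanofdiv expects, so it should be usable as a drop-in replacement for the on="foo" merge.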
Alternatively, use apply(axis=1) so that positions where the B value is missing (NaN) are dropped before averaging:
def meanofdiv(dfa):
    # apply(axis=1) passes one row as a Series; turn it back into a one-row frame
    dfa = dfa.to_frame().T
    a = dfa.loc[:, [c for c in dfa.columns if "_A" in c]].astype(float).values[0]
    b = dfa.loc[:, [c for c in dfa.columns if "_B" in c]].astype(float).values[0]
    # keep only the positions where B has a value
    a = a[~np.isnan(b)]
    b = b[~np.isnan(b)]
    return np.where((a/b) > 1, b/a, a/b).mean()

# pivot key/val pair data to tables
# Cartesian product of tables
# simple calculation of columns from A and columns from B
dfr = pd.merge(
    data1.pivot(index="An", columns="colA", values="Val").reset_index().assign(foo=1),
    data2.pivot(index="Bm", columns="colA", values="Val").reset_index().assign(foo=1),
    on="foo",
    suffixes=("_A", "_B")
).assign(resname=lambda dfa: dfa["An"] + dfa["Bm"],
         res=lambda dfa: dfa.apply(meanofdiv, axis=1))
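A row-wise apply can be slow on large frames. One possible vectorised variant relies on np.nanmean to skip missing positions. This is a sketch under assumptions, not the answer's code: `meanofdiv_nan` is a hypothetical name, the "_A" and "_B" columns are assumed to appear in the same label order, and unlike the version above it also skips positions where the A value is missing.

```python
import numpy as np
import pandas as pd

def meanofdiv_nan(dfa):
    """Row-wise mean of smaller/larger ratios, ignoring positions with NaN."""
    a = dfa[[c for c in dfa.columns if c.endswith("_A")]].to_numpy(dtype=float)
    b = dfa[[c for c in dfa.columns if c.endswith("_B")]].to_numpy(dtype=float)
    # np.minimum / np.maximum propagate NaN, so a missing value yields a NaN ratio
    ratio = np.minimum(a, b) / np.maximum(a, b)
    # np.nanmean averages only the non-NaN ratios in each row
    return np.nanmean(ratio, axis=1)

# Two already-merged rows using the question's t5/t6 values; B2 has no t6
pairs = pd.DataFrame({
    "An": ["A1", "A1"], "Bm": ["B1", "B2"],
    "t5_A": [2, 2], "t6_A": [2, 2],
    "t5_B": [32, 82], "t6_B": [3, np.nan],
})
res = meanofdiv_nan(pairs)
print(res)
```

For the A1/B2 row only the t5 ratio contributes, so the missing t6 value does not turn the whole result into NaN.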