I have two dataframes like this:
import pandas as pd

dataA = [["A1", "t1", 5], ["A1", "t2", 8], ["A1", "t3", 7],
         ["A1", "t4", 4], ["A1", "t5", 2], ["A1", "t6", 2],
         ["A2", "t1", 15], ["A2", "t2", 6], ["A2", "t3", 1],
         ["A2", "t4", 11], ["A2", "t5", 12], ["A2", "t6", 7],
         ["A3", "t1", 12], ["A3", "t2", 8], ["A3", "t3", 3],
         ["A3", "t4", 7], ["A3", "t5", 15], ["A3", "t6", 14]]
dataB = [["B1", "t1", 2], ["B1", "t2", 9], ["B1", "t3", 17],
         ["B1", "t4", 14], ["B1", "t5", 32], ["B1", "t6", 3],
         ["B2", "t1", 44], ["B2", "t2", 36], ["B2", "t3", 51],
         ["B2", "t4", 81], ["B2", "t5", 82]]
data1 = pd.DataFrame(data=dataA, columns=["An", "colA", "Val"])
data2 = pd.DataFrame(data=dataB, columns=["Bm", "colA", "Val"])
How can I get this result?
| GroupA | GroupB | result |
|---|---|---|
| A1 | B1 | val_11 |
| A1 | B2 | val_12 |
| A2 | B1 | val_21 |
| A2 | B2 | val_22 |
| A3 | B1 | val_31 |
| A3 | B2 | val_32 |
| ... | ... | ... |
| An | Bm | val_nm |
val_nm is calculated as follows. For val_11, divide each value of A1 by the corresponding value of B1 (matched on colA). Wherever a quotient is greater than 1, take its reciprocal instead. Then average the results. This way the result is the same whether A1 is divided by B1 or B1 is divided by A1.
It may be necessary to define a function to calculate val. All values are greater than 0, so there will be no division by zero.
Taking val_11 as an example:
A1 = [5, 8, 7, 4, 2, 2], B1 = [2, 9, 17, 14, 32, 3]
val_11 = avg(A1/B1) = (2/5 + 8/9 + 7/17 + 4/14 + 2/32 + 2/3) / 6, where 5/2 is greater than 1 and is therefore replaced by 2/5
≈ 0.4526
So whether it is A1/B1 or B1/A1, the result will be the same.
Please help me calculate the result.
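The arithmetic of this example can be checked with a few lines of NumPy. This is only a sketch: the names a1, b1, folded and val_11_sym are introduced here for illustration. It also shows why the order of division does not matter, since taking the reciprocal of any quotient above 1 is the same as dividing the smaller value by the larger one.

```python
import numpy as np

# Values copied from the example (A1 and B1, matched by t1..t6)
a1 = np.array([5, 8, 7, 4, 2, 2], dtype=float)
b1 = np.array([2, 9, 17, 14, 32, 3], dtype=float)

# Divide element-wise, then take the reciprocal wherever the quotient exceeds 1
ratio = a1 / b1
folded = np.where(ratio > 1, 1 / ratio, ratio)
val_11 = folded.mean()

# Equivalent form: smaller value over larger value, independent of the order
val_11_sym = (np.minimum(a1, b1) / np.maximum(a1, b1)).mean()

print(round(val_11, 6))  # 0.452589
```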
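Taking the straight definition of what you want to calculate: pivot() each dataframe into a table, then merge() the two tables on a synthetic column foo to form their Cartesian product.
import numpy as np

def meanofdiv(dfa):
    # columns suffixed _A come from data1, columns suffixed _B from data2
    a = dfa.loc[:,[c for c in dfa.columns if "_A" in c]].values
    b = dfa.loc[:,[c for c in dfa.columns if "_B" in c]].values
    # take the reciprocal wherever a/b exceeds 1, then average across each row
    return np.where((a/b)>1, b/a, a/b).mean(axis=1)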
# pivot key/val pair data to tables
# Cartesian product of tables
# simple calculation of columns from A and columns from B
dfr = pd.merge(
    data1.pivot(index="An", columns="colA", values="Val").reset_index().assign(foo=1),
    data2.pivot(index="Bm", columns="colA", values="Val").reset_index().assign(foo=1),
    on="foo",
    suffixes=("_A","_B")
).assign(resname=lambda dfa: dfa["An"]+dfa["Bm"],
         res=meanofdiv)
dfr.loc[:,["An","Bm","res"]]
|   | An | Bm | res |
|---|---|---|---|
| 0 | A1 | B1 | 0.452589 |
| 1 | A1 | B2 | 0.202259 |
| 2 | A2 | B1 | 0.408018 |
| 3 | A2 | B2 | 0.206316 |
| 4 | A3 | B1 | 0.40251 |
| 5 | A3 | B2 | 0.172901 |
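On pandas 1.2 or later the synthetic foo column is not strictly needed, because merge() accepts how="cross". The following is a sketch of that variant, not the code used above. The helper to_long and the switch to np.nanmean are assumptions added here. np.nanmean is used so that a pair with a missing time point (B2 has no t6 in the data as posted) is averaged over the values that do exist.

```python
import numpy as np
import pandas as pd

# The question's data, written compactly; to_long is a hypothetical helper
groups_a = {"A1": [5, 8, 7, 4, 2, 2], "A2": [15, 6, 1, 11, 12, 7],
            "A3": [12, 8, 3, 7, 15, 14]}
groups_b = {"B1": [2, 9, 17, 14, 32, 3], "B2": [44, 36, 51, 81, 82]}  # B2 has no t6

def to_long(groups, key):
    """Build the long key/colA/Val layout used in the question."""
    rows = [[name, f"t{i}", v]
            for name, vals in groups.items()
            for i, v in enumerate(vals, start=1)]
    return pd.DataFrame(rows, columns=[key, "colA", "Val"])

data1 = to_long(groups_a, "An")
data2 = to_long(groups_b, "Bm")

wide_a = data1.pivot(index="An", columns="colA", values="Val").reset_index()
wide_b = data2.pivot(index="Bm", columns="colA", values="Val").reset_index()

# Cartesian product without a helper column (requires pandas >= 1.2)
cross = wide_a.merge(wide_b, how="cross", suffixes=("_A", "_B"))

a = cross.filter(like="_A").to_numpy(dtype=float)
b = cross.filter(like="_B").to_numpy(dtype=float)

# Smaller over larger equals "take the reciprocal if the quotient exceeds 1";
# missing pairs stay NaN and are skipped by nanmean
ratio = np.minimum(a, b) / np.maximum(a, b)
cross["res"] = np.nanmean(ratio, axis=1)

print(cross[["An", "Bm", "res"]])
```

Working on the whole array at once keeps the calculation vectorised, which matters mainly when there are many An and Bm groups.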
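If some groups are missing values (B2 has no t6 in the data above), the pivoted tables contain NaN. This variant uses apply(axis=1) and drops those pairs row by row before averaging: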
def meanofdiv(dfa):
    dfa = dfa.to_frame().T
    a = dfa.loc[:,[c for c in dfa.columns if "_A" in c]].astype(float).values[0]
    b = dfa.loc[:,[c for c in dfa.columns if "_B" in c]].astype(float).values[0]
    a = a[~np.isnan(b)]
    b = b[~np.isnan(b)]
    return np.where((a/b)>1, b/a, a/b).mean()
# pivot key/val pair data to tables
# Cartesian product of tables
# simple calculation of columns from A and columns from B
dfr = pd.merge(
    data1.pivot(index="An", columns="colA", values="Val").reset_index().assign(foo=1),
    data2.pivot(index="Bm", columns="colA", values="Val").reset_index().assign(foo=1),
    on="foo",
    suffixes=("_A","_B")
).assign(resname=lambda dfa: dfa["An"]+dfa["Bm"],
         res=lambda dfa: dfa.apply(meanofdiv, axis=1))
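A different route, offered only as an alternative sketch and not as part of the approach above, is to skip pivot() and stay in the long format. Joining the two frames on colA pairs every An group with every Bm group at the same time point, and the default inner join leaves out time points that one side lacks. The column names Val_A, Val_B and ratio come from the suffixes and assignments chosen here for illustration, and the data is a reduced copy of the question's (A1 and A2 against B1 and B2).

```python
import numpy as np
import pandas as pd

times = [f"t{i}" for i in range(1, 7)]
data1 = pd.DataFrame({
    "An": ["A1"] * 6 + ["A2"] * 6,
    "colA": times * 2,
    "Val": [5, 8, 7, 4, 2, 2, 15, 6, 1, 11, 12, 7],
})
data2 = pd.DataFrame({
    "Bm": ["B1"] * 6 + ["B2"] * 5,
    "colA": times + times[:5],  # B2 has no t6
    "Val": [2, 9, 17, 14, 32, 3, 44, 36, 51, 81, 82],
})

# One row per (An, Bm, colA) combination present in both frames
pairs = data1.merge(data2, on="colA", suffixes=("_A", "_B"))

# Smaller value over larger value, so A/B and B/A give the same number
pairs["ratio"] = (np.minimum(pairs["Val_A"], pairs["Val_B"])
                  / np.maximum(pairs["Val_A"], pairs["Val_B"]))

result = (pairs.groupby(["An", "Bm"], as_index=False)["ratio"]
               .mean()
               .rename(columns={"ratio": "res"}))
print(result)
```

This version never builds wide tables, so no NaN filtering is required, at the cost of a larger intermediate frame (one row per pair and time point).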