Pandas: two dataframes, want the average of the element-wise divisions between each pair of groups

Question

I have two dataframes like this:

import pandas as pd

dataA = [["A1", "t1", 5], ["A1", "t2", 8], ["A1", "t3", 7],
         ["A1", "t4", 4], ["A1", "t5", 2], ["A1", "t6", 2],
         ["A2", "t1", 15], ["A2", "t2", 6], ["A2", "t3", 1],
         ["A2", "t4", 11], ["A2", "t5", 12], ["A2", "t6", 7],
         ["A3", "t1", 12], ["A3", "t2", 8], ["A3", "t3", 3],
         ["A3", "t4", 7], ["A3", "t5", 15], ["A3", "t6", 14]]

dataB = [["B1", "t1", 2], ["B1", "t2", 9], ["B1", "t3", 17],
         ["B1", "t4", 14], ["B1", "t5", 32], ["B1", "t6", 3],
         ["B2", "t1", 44], ["B2", "t2", 36], ["B2", "t3", 51],
         ["B2", "t4", 81], ["B2", "t5", 82]]  # B2 has no t6 value

data1 = pd.DataFrame(data=dataA, columns=["An", "colA", "Val"])

data2 = pd.DataFrame(data=dataB, columns=["Bm", "colA", "Val"])

How can I get this result?

| GroupA | GroupB | result |
|--------|--------|--------|
| A1     | B1     | val_11 |
| A1     | B2     | val_12 |
| A2     | B1     | val_21 |
| A2     | B2     | val_22 |
| A3     | B1     | val_31 |
| A3     | B2     | val_32 |
| ...    | ...    | ...    |
| An     | Bm     | val_nm |

val_nm is calculated as follows. Take the values of An and Bm that share the same colA (t1, t2, ...) and divide each value of An by the corresponding value of Bm. If a quotient is greater than 1, take its reciprocal instead. val_nm is the mean of these results. So whether A1 is divided by B1 or B1 by A1, the result must be the same.
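The same rule written as a formula. The notation is ours, not the asker's: a_i and b_i are the Val entries of groups An and Bm for the same colA value, and k is the number of such matched pairs (six for A1 and B1).

```latex
\mathrm{val}_{nm} = \frac{1}{k} \sum_{i=1}^{k} \min\!\left(\frac{a_i}{b_i},\ \frac{b_i}{a_i}\right)
```

For positive x, min(x, 1/x) is exactly "take the reciprocal if x > 1", and it does not change when x is replaced by 1/x, so swapping the roles of An and Bm leaves val_nm unchanged.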

To calculate val, it may be necessary to define a function. Every Val is greater than 0, so there is no division by 0.
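A minimal sketch of such a function, assuming the two groups are already equal-length sequences of positive numbers aligned by colA. `folded_ratio_mean` is a hypothetical name, and `np.minimum(r, 1 / r)` stands in for "take the reciprocal when greater than 1", which is equivalent for positive values.

```python
import numpy as np


def folded_ratio_mean(a, b):
    """Mean of min(a_i / b_i, b_i / a_i) over paired positive values.

    Hypothetical helper: assumes a and b have the same length and that
    position i of each refers to the same colA value (t1, t2, ...).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    ratio = a / b
    # Fold every ratio into (0, 1]: use the reciprocal when it exceeds 1.
    folded = np.minimum(ratio, 1.0 / ratio)
    return float(folded.mean())


A1 = [5, 8, 7, 4, 2, 2]
B1 = [2, 9, 17, 14, 32, 3]
print(round(folded_ratio_mean(A1, B1), 6))  # 0.452589
```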

Taking val_11 as an example:

A1 = [5, 8, 7, 4, 2, 2]
B1 = [2, 9, 17, 14, 32, 3]

val_11 = avg(A1/B1) = (2/5 + 8/9 + 7/17 + 4/14 + 2/32 + 2/3) / 6    (5/2 > 1, so 2/5 is used instead)

≈ 0.4526

So whether A1/B1 or B1/A1 is used, the result is the same.
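A quick numeric check of that symmetry for A1 and B1, applying the "if greater than 1, take the reciprocal" rule literally in both directions with NumPy. The values are copied from the example; the variable names are illustrative.

```python
import numpy as np

A1 = np.array([5, 8, 7, 4, 2, 2], dtype=float)
B1 = np.array([2, 9, 17, 14, 32, 3], dtype=float)

# Rule applied to A1/B1: where the quotient exceeds 1, use B1/A1 instead.
a_over_b = np.where(A1 / B1 > 1, B1 / A1, A1 / B1)
# Rule applied to B1/A1: where the quotient exceeds 1, use A1/B1 instead.
b_over_a = np.where(B1 / A1 > 1, A1 / B1, B1 / A1)

# Both hold the folded terms 2/5, 8/9, 7/17, 4/14, 2/32, 2/3.
print(a_over_b.mean(), b_over_a.mean())
```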

Please help me calculate the result.

Answer 1

Taking the straightforward definition of what you want to calculate:

1. Shape the data frames first: the data is key/value pairs, so create tables using pivot().
2. Do a Cartesian product between the two tables with merge() on a synthetic column foo.
3. Complete the calculation you specified.
4. Filter down the columns to get to the required output.

import numpy as np
import pandas as pd

def meanofdiv(dfa):
    # t1_A..t6_A line up with t1_B..t6_B because pivot() sorts the colA values
    a = dfa.loc[:, [c for c in dfa.columns if "_A" in c]].values
    b = dfa.loc[:, [c for c in dfa.columns if "_B" in c]].values
    # fold each ratio into (0, 1]: take the reciprocal when it is greater than 1
    return np.where((a/b) > 1, b/a, a/b).mean(axis=1)

# pivot key/value pair data to tables
# Cartesian product of the tables via the synthetic key foo
# compute the result from the A columns and the matching B columns
dfr = pd.merge(
    data1.pivot(index="An", columns="colA", values="Val").reset_index().assign(foo=1),
    data2.pivot(index="Bm", columns="colA", values="Val").reset_index().assign(foo=1),
    on="foo",
    suffixes=("_A", "_B")
).assign(resname=lambda dfa: dfa["An"] + dfa["Bm"],
         res=meanofdiv)

dfr.loc[:, ["An", "Bm", "res"]]

	An	Bm	res
0	A1	B1	0.452589
1	A1	B2	0.202259
2	A2	B1	0.408018
3	A2	B2	0.206316
4	A3	B1	0.40251
5	A3	B2	0.172901
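On pandas 1.2 or later, `merge(how="cross")` produces the Cartesian product directly, so the synthetic foo column is not needed. A sketch under that assumption, which also vectorizes the fold with `np.minimum` and uses `np.nanmean` so that a missing B value (B2 has no t6) is skipped rather than turning the whole mean into NaN. `groups_a`, `groups_b`, `to_long` and `wide` are illustrative names, and the data is rebuilt from the question's values to keep the block self-contained.

```python
import numpy as np
import pandas as pd

# The question's values, rebuilt compactly (illustrative names).
groups_a = {"A1": [5, 8, 7, 4, 2, 2],
            "A2": [15, 6, 1, 11, 12, 7],
            "A3": [12, 8, 3, 7, 15, 14]}
groups_b = {"B1": [2, 9, 17, 14, 32, 3],
            "B2": [44, 36, 51, 81, 82]}  # B2 has no t6 value


def to_long(groups, key):
    """Turn {group: [v1, v2, ...]} into key/colA/Val rows with colA = t1, t2, ..."""
    return pd.DataFrame(
        [(g, f"t{i + 1}", v) for g, vals in groups.items() for i, v in enumerate(vals)],
        columns=[key, "colA", "Val"],
    )


data1 = to_long(groups_a, "An")
data2 = to_long(groups_b, "Bm")

# One row per group, then every A row paired with every B row (pandas >= 1.2).
wide = pd.merge(
    data1.pivot(index="An", columns="colA", values="Val").reset_index(),
    data2.pivot(index="Bm", columns="colA", values="Val").reset_index(),
    how="cross",
    suffixes=("_A", "_B"),
)

a = wide.filter(regex="_A$").to_numpy(dtype=float)
b = wide.filter(regex="_B$").to_numpy(dtype=float)
ratio = a / b
# Fold into (0, 1]; a missing B value stays NaN and nanmean ignores it.
wide["res"] = np.nanmean(np.minimum(ratio, 1.0 / ratio), axis=1)
print(wide[["An", "Bm", "res"]])
```

Row order follows the cross join (A1-B1, A1-B2, A2-B1, ...). For the B1 pairs this gives the same values as the table above.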

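Ragged data sets

With the B2 data above, which has no t6 value, the first version returns NaN for every B2 pair. This version handles A and B groups of different lengths by stopping the calculation at the last B observation:
- the calculation is now row by row, using apply(axis=1)
- the arrays are trimmed to the same size by dropping the positions where B is NaN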

import numpy as np
import pandas as pd

def meanofdiv(dfa):
    # dfa is a single row (a Series); make it a one-row frame to select columns
    dfa = dfa.to_frame().T
    a = dfa.loc[:, [c for c in dfa.columns if "_A" in c]].astype(float).values[0]
    b = dfa.loc[:, [c for c in dfa.columns if "_B" in c]].astype(float).values[0]
    # keep only the positions where B has an observation
    a = a[~np.isnan(b)]
    b = b[~np.isnan(b)]
    return np.where((a/b) > 1, b/a, a/b).mean()

# pivot key/value pair data to tables
# Cartesian product of the tables via the synthetic key foo
# compute the result row by row from the A columns and the matching B columns
dfr = pd.merge(
    data1.pivot(index="An", columns="colA", values="Val").reset_index().assign(foo=1),
    data2.pivot(index="Bm", columns="colA", values="Val").reset_index().assign(foo=1),
    on="foo",
    suffixes=("_A", "_B")
).assign(resname=lambda dfa: dfa["An"] + dfa["Bm"],
         res=lambda dfa: dfa.apply(meanofdiv, axis=1))

dfr.loc[:, ["An", "Bm", "res"]]
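A different technique that avoids pivoting altogether, offered as an alternative sketch rather than the answer's method: join the long-format frames on colA, so each row pairs an A value with the B value for the same t, then fold and average per (An, Bm) group. A t missing from either group simply produces no row, so ragged groups need no special handling. `groups_a`, `groups_b`, `to_long` and `pairs` are illustrative names, and the data is rebuilt from the question's values.

```python
import numpy as np
import pandas as pd

# The question's values, rebuilt compactly (illustrative names).
groups_a = {"A1": [5, 8, 7, 4, 2, 2],
            "A2": [15, 6, 1, 11, 12, 7],
            "A3": [12, 8, 3, 7, 15, 14]}
groups_b = {"B1": [2, 9, 17, 14, 32, 3],
            "B2": [44, 36, 51, 81, 82]}  # B2 has no t6 value


def to_long(groups, key):
    """Turn {group: [v1, v2, ...]} into key/colA/Val rows with colA = t1, t2, ..."""
    return pd.DataFrame(
        [(g, f"t{i + 1}", v) for g, vals in groups.items() for i, v in enumerate(vals)],
        columns=[key, "colA", "Val"],
    )


data1 = to_long(groups_a, "An")
data2 = to_long(groups_b, "Bm")

# Inner join on colA: every A row meets every B row with the same t.
pairs = data1.merge(data2, on="colA", suffixes=("_A", "_B"))
ratio = pairs["Val_A"] / pairs["Val_B"]
# Fold each quotient into (0, 1], then average per (An, Bm) pair.
pairs["folded"] = np.minimum(ratio, 1 / ratio)
result = pairs.groupby(["An", "Bm"], as_index=False)["folded"].mean()
print(result)
```

One difference from the apply version: that version drops only the positions where B is NaN, whereas the inner join drops a t missing from either side. With the question's data (only B2 is ragged) both give the same numbers, e.g. about 0.109377 for A1 and B2.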

solution1
1 ACCEPTED 2021-02-18 09:06:05
