Solving multiple linear equations using Pandas
I have what I think is a very interesting problem here, but I have little idea how to go about solving it computationally, or whether a pandas dataframe is appropriate for this purpose. I have data like so:
   SuperGroup Group Code  Weight  Income
8          E1  E012    a     0.5    1000
9          E1  E012    b     0.2    1000
10         E1  E013    b     0.2    1000
11         E1  E013    c     0.3    1000
Effectively, 'Code' has a one-to-one relationship with 'Weight'.
'SuperGroup' has a one-to-one relationship with 'Income'.
A SuperGroup is composed of many Groups and a Group has many Codes.
I am attempting to distribute the income according to the combined weights of the codes within each group, so for E012 this is (0.5*0.2 = 0.1) and for E013 this is (0.2*0.3 = 0.06).
As a proportion of their total, E012 becomes 0.625 (0.1/(0.1+0.06)) and E013 becomes 0.375 (0.06/(0.1+0.06)).
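The arithmetic above can be reproduced in pandas. This is only a minimal sketch, assuming the four sample rows from the question; the variable names df, combined and share are illustrative and do not come from the original post.

```python
import pandas as pd

# Sample rows from the question (original index values 8-11 kept).
df = pd.DataFrame(
    {
        "SuperGroup": ["E1", "E1", "E1", "E1"],
        "Group": ["E012", "E012", "E013", "E013"],
        "Code": ["a", "b", "b", "c"],
        "Weight": [0.5, 0.2, 0.2, 0.3],
        "Income": [1000, 1000, 1000, 1000],
    },
    index=[8, 9, 10, 11],
)

# Combined weight of each Group: product of the weights of its Codes.
combined = df.groupby(["SuperGroup", "Group"])["Weight"].prod()

# Share of each Group within its SuperGroup (shares sum to 1 per SuperGroup).
share = combined / combined.groupby(level="SuperGroup").transform("sum")
print(share.round(3))
```

The shares are normalised per SuperGroup, so the same two lines should carry over to data with several SuperGroups.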
The dataframe can be collapsed and re-written as:
   SuperGroup Group Code  CombinedWeight  Income
8          E1  E012  a,b           0.625    1000
10         E1  E013  b,c           0.375    1000
I am capable of producing the above dataframe, but my next step is to apply the weights to the income so that it is distributed in a way that still averages to 1000 but reflects the size of the weight of the group it is associated with.
Letting x and y be the redistributed incomes of E012 and E013, the weights 0.625 and 0.375 give x = 1.67y (since 0.625/0.375 ≈ 1.67).
Additionally, (x+y)/2 = 1000.
Note: my data often has several groups in a supergroup, so there could be more than two unknowns, resulting in a system of linear equations if my understanding is correct.
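For reference, the system described above can be written out and solved with NumPy for any number of groups. This is a sketch under stated assumptions: the shares p and the mean income come from the example, the matrix layout (n-1 proportionality rows plus one averaging row) is just one way to encode the constraints, and the names A, b, x and closed_form are illustrative.

```python
import numpy as np

p = np.array([0.625, 0.375])  # shares of E012 and E013 from the question
mean_income = 1000.0          # incomes must still average to this value
n = len(p)

A = np.zeros((n, n))
b = np.zeros(n)

# Rows 0..n-2: p[0] * x[i] - p[i] * x[0] = 0, i.e. incomes are
# proportional to the shares.
for i in range(1, n):
    A[i - 1, 0] = -p[i]
    A[i - 1, i] = p[0]

# Last row: (x[0] + ... + x[n-1]) / n = mean_income.
A[n - 1, :] = 1.0 / n
b[n - 1] = mean_income

x = np.linalg.solve(A, b)

# The same system has a closed form, so no solver is strictly needed:
# x[i] = n * mean_income * p[i] / sum(p)
closed_form = n * mean_income * p / p.sum()
print(x, closed_form)
```

Because the closed form exists, the problem reduces to a per-SuperGroup rescaling rather than a general linear solve.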
Solving simultaneously produces 1250 and 750 as the weighted incomes. The dataframe can be re-written as:
   SuperGroup Group Code  Income
8          E1  E012  a,b    1250
10         E1  E013  b,c     750
which is effectively how I need it. Any guidance is warmly appreciated.
First we agg the DataFrame on ['SuperGroup', 'Group']:
res = (df.groupby(['SuperGroup', 'Group'])
         .agg({'Weight': lambda x: x.cumprod().iloc[-1],  # product of the weights
               'Code': ','.join,
               'Income': 'first'}))
Then we re-adjust the Income within each SuperGroup with the help of transform:
s = res.groupby(level='SuperGroup')
res['Income'] = s.Income.transform('sum')*res.Weight/s.Weight.transform('sum')
                  Weight Code  Income
SuperGroup Group
E1         E012     0.10  a,b  1250.0
           E013     0.06  b,c   750.0
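The two steps above can be put together into one self-contained script. This is a sketch, assuming the sample rows from the question; the wrapper function redistribute_income is a hypothetical name added for illustration and is not part of the original answer.

```python
import pandas as pd


def redistribute_income(df):
    """Collapse Codes per Group, then rescale Income within each SuperGroup."""
    res = (df.groupby(['SuperGroup', 'Group'])
             .agg({'Weight': lambda x: x.cumprod().iloc[-1],  # product of weights
                   'Code': ','.join,
                   'Income': 'first'}))
    s = res.groupby(level='SuperGroup')
    # Total income of the SuperGroup, split in proportion to combined weight.
    res['Income'] = (s.Income.transform('sum') * res.Weight
                     / s.Weight.transform('sum'))
    return res


# Sample rows from the question.
df = pd.DataFrame({
    'SuperGroup': ['E1', 'E1', 'E1', 'E1'],
    'Group': ['E012', 'E012', 'E013', 'E013'],
    'Code': ['a', 'b', 'b', 'c'],
    'Weight': [0.5, 0.2, 0.2, 0.3],
    'Income': [1000, 1000, 1000, 1000],
}, index=[8, 9, 10, 11])

res = redistribute_income(df)
print(res)

# Sanity check: the mean income per SuperGroup is unchanged.
print(res.groupby(level='SuperGroup')['Income'].mean())
```

Since each SuperGroup's total is only rescaled by weights that sum to one, the per-SuperGroup average stays at the original Income.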