Solving multiple linear equations using Pandas

I have what I think is a very interesting problem here, but I have little idea how to go about solving it computationally, or whether a pandas DataFrame is appropriate for this purpose. I have data like so:

    SuperGroup   Group  Code  Weight Income
8   E1           E012   a     0.5    1000
9   E1           E012   b     0.2    1000
10  E1           E013   b     0.2    1000
11  E1           E013   c     0.3    1000

Effectively, 'Code' has a one-to-one relationship with 'Weight'.

'SuperGroup' has a one-to-one relationship with 'Income'.

A SuperGroup is composed of many Groups, and a Group has many Codes.

I am attempting to distribute the income according to the combined weights of the codes within each group, so for E012 this is 0.5*0.2 = 0.1 and for E013 this is 0.2*0.3 = 0.06. As a proportion of their total, E012's becomes 0.625 (0.1/(0.1+0.06)) and E013's becomes 0.375 (0.06/(0.1+0.06)).
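The arithmetic above can be reproduced in pandas. This is only a minimal sketch, assuming the sample rows shown earlier; the variable names `combined` and `share` are illustrative and not part of the original data.

```python
import pandas as pd

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "SuperGroup": ["E1", "E1", "E1", "E1"],
    "Group": ["E012", "E012", "E013", "E013"],
    "Code": ["a", "b", "b", "c"],
    "Weight": [0.5, 0.2, 0.2, 0.3],
    "Income": [1000, 1000, 1000, 1000],
}, index=[8, 9, 10, 11])

# Product of the code weights inside each group (0.1 for E012, 0.06 for E013).
combined = df.groupby(["SuperGroup", "Group"])["Weight"].prod()

# Each group's share of its supergroup's total combined weight.
share = combined / combined.groupby(level="SuperGroup").transform("sum")
print(share)
```

Here `share` holds roughly 0.625 for E012 and 0.375 for E013, matching the hand calculation.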

The dataframe can be collapsed and rewritten as:

    SuperGroup   Group  Code  CombinedWeight Income
8   E1           E012   a,b   0.625          1000
10  E1           E013   b,c   0.375          1000

I am able to produce the above dataframe, but my next step is to apply the weights to the income so that it is redistributed across the groups: the incomes should still average 1000, but each should reflect the weight of the group it is associated with.

Letting x and y be the incomes of E012 and E013, the weights 0.625 and 0.375 give x/y = 0.625/0.375, so x = 1.67y (approximately).

Additionally, (x+y)/2 = 1000. Note: my data often has several groups in a supergroup, so there could be more than two unknowns, resulting in a system of linear equations, if my understanding is correct.
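For a supergroup with any number of groups, the two conditions above (incomes in the same ratio as the weights, and a fixed mean) can be written as a square linear system and handed to `numpy.linalg.solve`. The function `weighted_incomes` below is a hypothetical helper written for illustration, and it assumes the first weight is non-zero.

```python
import numpy as np

def weighted_incomes(weights, mean_income):
    """Solve for incomes x with x[i] / x[0] == w[i] / w[0] and mean(x) == mean_income."""
    w = np.asarray(weights, dtype=float)
    n = len(w)
    A = np.zeros((n, n))
    b = np.zeros(n)
    # Rows 0..n-2 encode the ratio constraints: w[0] * x[i] - w[i] * x[0] = 0.
    for i in range(1, n):
        A[i - 1, 0] = -w[i]
        A[i - 1, i] = w[0]
    # Last row encodes the mean constraint: (x[0] + ... + x[n-1]) / n = mean_income.
    A[n - 1, :] = 1.0 / n
    b[n - 1] = mean_income
    return np.linalg.solve(A, b)

incomes = weighted_incomes([0.625, 0.375], 1000)
print(incomes)
```

The system also has a closed form, x[i] = n * mean_income * w[i] / sum(w), so an explicit solver is not strictly required.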

Solving these simultaneously produces 1250 and 750 as the weighted incomes. The dataframe can then be rewritten as:

    SuperGroup   Group  Code  Income
8   E1           E012   a,b   1250
10  E1           E013   b,c   750

which is effectively how I need it. Any guidance is warmly appreciated.

First we aggregate the DataFrame on ['SuperGroup', 'Group']:

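# Collapse each (SuperGroup, Group) to one row: multiply the weights,
# join the codes, and keep the income shared by the supergroup.
res = (df.groupby(['SuperGroup', 'Group'])
          .agg({'Weight': 'prod',
                'Code': ','.join,
                'Income': 'first'}))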

Then we re-adjust the Income within each SuperGroup with the help of transform:

# Split each SuperGroup's total income in proportion to the group weights.
s = res.groupby(level='SuperGroup')
res['Income'] = s.Income.transform('sum') * res.Weight / s.Weight.transform('sum')

                  Weight Code  Income
SuperGroup Group                     
E1         E012     0.10  a,b  1250.0
           E013     0.06  b,c   750.0
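The two steps above can be combined into one self-contained script that also checks the requirement from the question, namely that the incomes in each SuperGroup still average 1000. This is a sketch built on the question's sample rows; `redistribute_income` and `mean_check` are illustrative names, not part of the original answer.

```python
import pandas as pd

def redistribute_income(df):
    """Collapse to one row per (SuperGroup, Group) and reweight Income."""
    res = (df.groupby(["SuperGroup", "Group"])
             .agg({"Weight": "prod", "Code": ",".join, "Income": "first"}))
    by_super = res.groupby(level="SuperGroup")
    total_income = by_super["Income"].transform("sum")
    total_weight = by_super["Weight"].transform("sum")
    # Each group receives its weight share of the supergroup's total income.
    res["Income"] = total_income * res["Weight"] / total_weight
    return res

# Sample data copied from the table in the question.
df = pd.DataFrame({
    "SuperGroup": ["E1", "E1", "E1", "E1"],
    "Group": ["E012", "E012", "E013", "E013"],
    "Code": ["a", "b", "b", "c"],
    "Weight": [0.5, 0.2, 0.2, 0.3],
    "Income": [1000, 1000, 1000, 1000],
}, index=[8, 9, 10, 11])

res = redistribute_income(df)

# The mean income per supergroup should be unchanged by the redistribution.
mean_check = res.groupby(level="SuperGroup")["Income"].mean()
print(res)
print(mean_check)
```

Because the total income of a supergroup is preserved and the number of groups does not change, the mean is preserved as well, which is why no explicit equation solver is needed here.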
