How to get the percentage of each value in a row, based on the row total, in Python
I have the below data:
id hours class
1 67.91 V
1 65.56 V
1 51.14 V
1 41.51 V
1 33.55 V
1 26.45 G
1 26.09 V
1 25.77 G
1 25.50 P
1 25.13 G
1 24.49 P
1 21.88 B
1 18.57 V
1 17.90 B
...
18 92.2 B
18 81.06 V
18 70.48 V
18 67.10 B
18 62.92 B
18 62.88 V
18 54.36 B
18 52.77 V
18 44.55 V
18 40.61 P
18 40.51 P
18 40.06 V
18 37.67 V
18 33.78 B
I essentially need to get the data into pivot format and calculate the total hours within each class as a percentage of the total hours for each household (id) in the data:
Expected Output:
id B G P V Total
1 8.44% 16.41% 10.60% 64.55% 100.00%
18 39.74% 0.0% 10.39% 49.87% 100.00%
Can someone please help me with this? It has to be done per id (row-wise). The data is in a pandas DataFrame.
I believe you need groupby + sum + unstack, or pivot_table, for the pivoting:
df = df.groupby(['id','class'])['hours'].sum().unstack(fill_value=0)
Or:
df = df.pivot_table(index='id', columns='class', values='hours', aggfunc='sum', fill_value=0)
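As a quick sanity check of the pivot step, here is a self-contained sketch on a tiny made-up subset (the numbers below are invented, not the question's data):

```python
import pandas as pd

# Tiny made-up sample mirroring the structure of the question's data
df = pd.DataFrame({
    'id':    [1, 1, 1, 18, 18, 18],
    'class': ['B', 'G', 'V', 'B', 'P', 'V'],
    'hours': [10.0, 30.0, 60.0, 50.0, 25.0, 25.0],
})

# Sum hours per (id, class), then move class to columns;
# fill_value=0 fills combinations that never occur (e.g. id 1 has no 'P')
wide = df.groupby(['id', 'class'])['hours'].sum().unstack(fill_value=0)
print(wide)
```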
Then divide by the row sums with div, multiply by 100 with mul, round, and finally add a new Total column with assign to check that each row adds up to 100 (thanks Paul H for the idea):
df = df.div(df.sum(1), 0).mul(100).round(2).assign(Total=lambda df: df.sum(axis=1))
print (df)
class B G P V Total
id
1 8.44 16.41 10.60 64.55 100.0
18 39.74 0.00 10.39 49.87 100.0
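The key detail in the line above is the axis arguments: df.sum(1) computes row totals, and the second argument of div (axis=0) aligns that Series with the row index, so each row is divided by its own total. A minimal sketch with invented numbers:

```python
import pandas as pd

wide = pd.DataFrame({'B': [10.0, 50.0], 'V': [90.0, 50.0]}, index=[1, 18])

# axis=0 aligns the row-total Series with the index,
# dividing each row by its own sum
pct = wide.div(wide.sum(axis=1), axis=0).mul(100)
print(pct)
```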
And for percentages, convert to strings and append %:
df1 = df.astype(str) + '%'
print (df1)
class B G P V Total
id
1 8.44% 16.41% 10.6% 64.55% 100.0%
18 39.74% 0.0% 10.39% 49.87% 100.0%
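Note that astype(str) keeps pandas' default float repr, which is why 10.6% above has only one decimal. If a fixed two-decimal format is wanted, one alternative (not from the original answer) is to format each cell explicitly:

```python
import pandas as pd

df = pd.DataFrame({'B': [8.44, 39.74], 'G': [16.41, 0.0],
                   'P': [10.6, 10.39], 'V': [64.55, 49.87],
                   'Total': [100.0, 100.0]}, index=[1, 18])

# Format every cell with exactly two decimals via column-wise Series.map
df1 = df.apply(lambda col: col.map('{:.2f}%'.format))
print(df1)
```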
Timings:
import numpy as np
import pandas as pd

np.random.seed(123)
N = 100000
L = list('BGPV')
df = pd.DataFrame({'class': np.random.choice(L, N),
                   'hours': np.random.rand(N),
                   'id': np.random.randint(20000, size=N)})
print (df)
def dark1(df):
    ndf = df.groupby('id').apply(lambda x : x.groupby('class')['hours'].sum()/x['hours'].sum())\
            .reset_index().pivot(columns='class',index='id')*100
    return ndf.assign(Total=ndf.sum(1)).fillna(0)

def dark2(df):
    one = df.groupby('id')['hours'].sum()
    two = df.pivot_table(index='id',values='hours',columns='class',aggfunc=sum)
    ndf = pd.DataFrame(two.values / one.values[:,None]*100,columns=two.columns)
    return ndf.assign(Total=ndf.sum(1)).fillna(0)

def jez1(df):
    df = df.groupby(['id','class'])['hours'].sum().unstack(fill_value=0)
    return df.div(df.sum(1), 0).mul(100).assign(Total=lambda df: df.sum(axis=1))

def jez2(df):
    df = df.pivot_table(index='id', columns='class', values='hours', aggfunc='sum', fill_value=0)
    return df.div(df.sum(1), 0).mul(100).assign(Total=lambda df: df.sum(axis=1))
print (dark1(df))
print (dark2(df))
print (jez1(df))
print (jez2(df))
In [39]: %timeit (dark1(df))
1 loop, best of 3: 15.4 s per loop
In [40]: %timeit (dark2(df))
10 loops, best of 3: 52.7 ms per loop
In [41]: %timeit (jez1(df))
10 loops, best of 3: 38.8 ms per loop
In [42]: %timeit (jez2(df))
10 loops, best of 3: 44.9 ms per loop
Caveat
These results do not account for the number of groups, which will affect the timings of some of these solutions.
Another way is to use a nested groupby, i.e.
ndf = df.groupby('id').apply(lambda x : x.groupby('class')['hours'].sum()/x['hours'].sum())\
.reset_index().pivot(columns='class',index='id')*100
ndf = ndf.assign(Total=ndf.sum(1)).fillna(0)
           hours                                      Total
class          B          G          P          V
id
1       8.437798  16.406830  10.603457  64.551914    100.0
18     39.741341   0.000000  10.387349  49.871311    100.0
Or:
one = df.groupby('id')['hours'].sum()
two = df.pivot_table(index='id',values='hours',columns='class',aggfunc=sum)
ndf = pd.DataFrame(two.values / one.values[:,None]*100,columns=two.columns)
ndf = ndf.assign(Total=ndf.sum(1)).fillna(0)
class B G P V Total
0 8.437798 16.40683 10.603457 64.551914 100.0
1 39.741341 0.00000 10.387349 49.871311 100.0
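A further alternative not shown in the answers above is pd.crosstab, whose normalize='index' option does the row-wise division directly (a sketch with invented numbers):

```python
import pandas as pd

df = pd.DataFrame({
    'id':    [1, 1, 1, 18, 18, 18],
    'class': ['B', 'G', 'V', 'B', 'P', 'V'],
    'hours': [10.0, 30.0, 60.0, 50.0, 25.0, 25.0],
})

# normalize='index' divides each row by its row total after aggregating;
# combinations that never occur come back as NaN, hence fillna(0)
pct = pd.crosstab(df['id'], df['class'], values=df['hours'],
                  aggfunc='sum', normalize='index').fillna(0).mul(100)
pct = pct.assign(Total=pct.sum(axis=1))
print(pct.round(2))
```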