Can I calculate percentages with groupby and pandas?

Question

I have two questions. First, I have this DataFrame:

data = {
    'Name': ['A', 'B', 'C', 'A', 'D', 'E', 'A', 'C', 'A', 'A', 'A'],
    'Family': ['B1', 'B', 'B', 'B3', 'B', 'B', 'B', 'B1', 'B', 'B3', 'B'],
    'Region': ['North', 'South', 'East', 'West', 'South', 'East', 'West', 'North', 'East', 'West', 'South'],
    'Cod': ['1', '2', '2', '1', '5', '1', '1', '1', '2', '1', '3'],
    'Customer number': ['A111', 'A223', 'A555', 'A333', 'A333', 'A444', 'A222', 'A111', 'A222', 'A333', 'A221'],
    'Sales': [100, 134, 53, 34, 244, 789, 213, 431, 0, 55, 23],
}

I would like to add a column that gives the percentage of sales within a groupby of the other columns, as shown in the image attached to the original question.

Second, if the percentage is 0% (as in the first row), I would like to substitute another result based on a criterion, for example: if A222 is 0%, use the result of A221.

Answer 1

I think this is what you want:

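import pandas as pd

df = pd.DataFrame(data)
# Total Sales for every unique combination of all grouping columns
granular_sum_df = df.groupby(['Name', 'Family', 'Region', 'Cod', 'Customer number'])['Sales'].sum().reset_index()
# Total Sales per (Name, Family) group
family_sum_df = df.groupby(['Name', 'Family'])['Sales'].sum().reset_index()
# Attach the family total to each granular row; the clashing 'Sales' columns
# become Sales_x (granular sum) and Sales_y (family total)
final_df = granular_sum_df.merge(family_sum_df, on=['Name', 'Family'])
# Share of the family total as a fraction (multiply by 100 for a percentage)
final_df['Pct'] = final_df['Sales_x'] / final_df['Sales_y']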
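A shorter variant of the same idea, offered as a sketch rather than the answerer's code: `groupby(...).transform('sum')` broadcasts each (`Name`, `Family`) total back onto the original rows, so no merge is needed. One difference to be aware of: this keeps every original row, whereas the merge version first sums rows that share all five grouping columns (here the two `A`/`B3`/`West`/`1`/`A333` rows), so those rows get separate percentages instead of one combined one. `Pct` here is scaled to 0 to 100.

```python
import pandas as pd

# Sample data from the question
data = {
    'Name': ['A', 'B', 'C', 'A', 'D', 'E', 'A', 'C', 'A', 'A', 'A'],
    'Family': ['B1', 'B', 'B', 'B3', 'B', 'B', 'B', 'B1', 'B', 'B3', 'B'],
    'Region': ['North', 'South', 'East', 'West', 'South', 'East', 'West', 'North', 'East', 'West', 'South'],
    'Cod': ['1', '2', '2', '1', '5', '1', '1', '1', '2', '1', '3'],
    'Customer number': ['A111', 'A223', 'A555', 'A333', 'A333', 'A444', 'A222', 'A111', 'A222', 'A333', 'A221'],
    'Sales': [100, 134, 53, 34, 244, 789, 213, 431, 0, 55, 23],
}
df = pd.DataFrame(data)

# Total Sales per (Name, Family) group, aligned to every original row
group_total = df.groupby(['Name', 'Family'])['Sales'].transform('sum')

# Each row's share of its group total, as a percentage
df['Pct'] = df['Sales'] / group_total * 100
print(df)
```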

Answer 2

An answer to question 1 could be:

# Step 1: import pandas
import pandas as pd

df = pd.DataFrame(data)

# Step 2: print the DataFrame
print(df)

# Step 3: calculate each row's sales as a percentage of total sales
df['percentage of sales'] = (df['Sales'] / df['Sales'].sum()) * 100

# Step 4: join this percentage column to the main DataFrame
# (step 3 already added it to df, so this concat returns a duplicate column)
pd.concat([df, df[['percentage of sales']]], axis=1, sort=False)
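Step 3 above gives each row's share of the grand total. If one percentage per combination of the other columns is wanted instead, one possible sketch (assuming every non-`Sales` column is a grouping key) sums `Sales` per group first and then divides by the grand total:

```python
import pandas as pd

# Sample data from the question
data = {
    'Name': ['A', 'B', 'C', 'A', 'D', 'E', 'A', 'C', 'A', 'A', 'A'],
    'Family': ['B1', 'B', 'B', 'B3', 'B', 'B', 'B', 'B1', 'B', 'B3', 'B'],
    'Region': ['North', 'South', 'East', 'West', 'South', 'East', 'West', 'North', 'East', 'West', 'South'],
    'Cod': ['1', '2', '2', '1', '5', '1', '1', '1', '2', '1', '3'],
    'Customer number': ['A111', 'A223', 'A555', 'A333', 'A333', 'A444', 'A222', 'A111', 'A222', 'A333', 'A221'],
    'Sales': [100, 134, 53, 34, 244, 789, 213, 431, 0, 55, 23],
}
df = pd.DataFrame(data)

# Assumed grouping keys: all columns except Sales
group_cols = ['Name', 'Family', 'Region', 'Cod', 'Customer number']

# Sum Sales within each combination of the grouping columns
grouped = df.groupby(group_cols, as_index=False)['Sales'].sum()

# Each group's sales as a percentage of total sales across all rows
grouped['percentage of sales'] = grouped['Sales'] / df['Sales'].sum() * 100
print(grouped)
```

Because the two `A`/`B3`/`West`/`1`/`A333` rows are combined, `grouped` has 10 rows rather than 11, and its percentages still add up to 100.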

Answer for question 2: it depends on what condition you want to apply.

Assuming the logic:

That is one way.
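For the criterion actually described in question 2 (if `A222` is 0%, use the result of `A221`), a sketch with an explicit rule table may fit better than copying the previous row. The `fallback` dict is a hypothetical name for illustration, the donor's percentage is assumed to come from its first row, and the percent-of-total column from step 3 is used only as an example:

```python
import pandas as pd

# Sample data from the question
data = {
    'Name': ['A', 'B', 'C', 'A', 'D', 'E', 'A', 'C', 'A', 'A', 'A'],
    'Family': ['B1', 'B', 'B', 'B3', 'B', 'B', 'B', 'B1', 'B', 'B3', 'B'],
    'Region': ['North', 'South', 'East', 'West', 'South', 'East', 'West', 'North', 'East', 'West', 'South'],
    'Cod': ['1', '2', '2', '1', '5', '1', '1', '1', '2', '1', '3'],
    'Customer number': ['A111', 'A223', 'A555', 'A333', 'A333', 'A444', 'A222', 'A111', 'A222', 'A333', 'A221'],
    'Sales': [100, 134, 53, 34, 244, 789, 213, 431, 0, 55, 23],
}
df = pd.DataFrame(data)
df['percentage of sales'] = df['Sales'] / df['Sales'].sum() * 100

# Hypothetical rule table: customer whose 0% is replaced -> customer to borrow from
fallback = {'A222': 'A221'}

# Percentage of each customer's first row, indexed by customer number
first_pct = (df.drop_duplicates(subset='Customer number')
               .set_index('Customer number')['percentage of sales'])

# Donor percentage for each row (NaN when the customer has no rule)
replacement = df['Customer number'].map(fallback).map(first_pct)

# Replace only rows that are 0% and have a donor
is_zero = df['percentage of sales'] == 0
df['percentage2'] = df['percentage of sales'].mask(is_zero & replacement.notna(), replacement)
print(df[['Customer number', 'percentage of sales', 'percentage2']])
```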

But the easy way to answer questions 1 and 2 is to convert the DataFrame into a NumPy array, do the operation, and then bring it back into a DataFrame. Check this answer: Add column for percentage of total to Pandas dataframe
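A sketch of the NumPy route, applied to question 1 (percentages within each (`Name`, `Family`) group). It assumes `'|'` never appears in a name or family, since the two columns are joined into a single hypothetical key string:

```python
import numpy as np
import pandas as pd

# Sample data from the question
data = {
    'Name': ['A', 'B', 'C', 'A', 'D', 'E', 'A', 'C', 'A', 'A', 'A'],
    'Family': ['B1', 'B', 'B', 'B3', 'B', 'B', 'B', 'B1', 'B', 'B3', 'B'],
    'Region': ['North', 'South', 'East', 'West', 'South', 'East', 'West', 'North', 'East', 'West', 'South'],
    'Cod': ['1', '2', '2', '1', '5', '1', '1', '1', '2', '1', '3'],
    'Customer number': ['A111', 'A223', 'A555', 'A333', 'A333', 'A444', 'A222', 'A111', 'A222', 'A333', 'A221'],
    'Sales': [100, 134, 53, 34, 244, 789, 213, 431, 0, 55, 23],
}
df = pd.DataFrame(data)

# Combine Name and Family into one key per row ('|' is an assumed separator)
keys = (df['Name'] + '|' + df['Family']).to_numpy()
sales = df['Sales'].to_numpy(dtype=float)

# inverse[i] is the position of row i's key among the sorted unique keys
_, inverse = np.unique(keys, return_inverse=True)

# Sum Sales per group, then look up each row's group total
group_totals = np.bincount(inverse, weights=sales)
df['Pct'] = sales / group_totals[inverse] * 100
print(df)
```

`np.unique(..., return_inverse=True)` gives each row a group index, and `np.bincount` with `weights` sums `Sales` per group index in one pass.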

# Convert the percentage column to a NumPy array (a copy, so df is not modified in place)
npprices = df['percentage of sales'].to_numpy(copy=True)
print(npprices)

# Loop through the rows and replace a zero with the value from the previous row,
# ASSUMING the previous row is not zero. Start at 1 so row 0 does not wrap
# around to the last element through index -1.
for i in range(1, len(npprices)):
    if npprices[i] == 0:
        npprices[i] = npprices[i - 1]

# Convert back into a DataFrame
percentage1 = pd.DataFrame({'percentage2': npprices})

# Join this percentage column to the DataFrame
df2i = pd.concat([df, percentage1[['percentage2']]], axis=1, sort=False)
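The same "copy the previous non-zero value" rule can also be expressed without leaving pandas. This sketch recreates the `percentage of sales` column from step 3 so it runs on its own:

```python
import pandas as pd

# Sample data from the question
data = {
    'Name': ['A', 'B', 'C', 'A', 'D', 'E', 'A', 'C', 'A', 'A', 'A'],
    'Family': ['B1', 'B', 'B', 'B3', 'B', 'B', 'B', 'B1', 'B', 'B3', 'B'],
    'Region': ['North', 'South', 'East', 'West', 'South', 'East', 'West', 'North', 'East', 'West', 'South'],
    'Cod': ['1', '2', '2', '1', '5', '1', '1', '1', '2', '1', '3'],
    'Customer number': ['A111', 'A223', 'A555', 'A333', 'A333', 'A444', 'A222', 'A111', 'A222', 'A333', 'A221'],
    'Sales': [100, 134, 53, 34, 244, 789, 213, 431, 0, 55, 23],
}
df = pd.DataFrame(data)
df['percentage of sales'] = df['Sales'] / df['Sales'].sum() * 100

s = df['percentage of sales']
# Treat zeros as missing, then carry the last non-zero value forward.
# fillna(s) puts back a leading zero that has no earlier value to copy.
df['percentage2'] = s.mask(s == 0).ffill().fillna(s)
print(df[['Customer number', 'percentage of sales', 'percentage2']])
```

`mask` turns zeros into NaN and `ffill` copies the last valid value forward; a zero in the first row is left as 0 rather than being filled from the end of the column.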

Note: I added the column twice by mistake. Of course, there may be other, easier approaches; I hope this helps.

Some answers I used:

Creating a Pandas DataFrame from a Numpy array: How do I specify the index column and column headers?

2 answers

Answer 1 (score 0), 2020-11-01 18:52:50

Answer 2 (score 0, accepted), 2020-11-01 20:21:03
