How to build a pandas dataframe in a recursive function?
I am trying to implement the 'Bottom-Up Computation' algorithm in data mining (https://www.aaai.org/Papers/FLAIRS/2003/Flairs03-050.pdf).
I need to use the 'pandas' library to create a dataframe and pass it to a recursive function, which should also return a dataframe as output. So far I am only able to output the final column, because I cannot figure out how to build a dataframe dynamically.
Here is the Python program:
import pandas as pd

def project_data(df, d):
    return df.iloc[:, d]

def select_data(df, d, val):
    col_name = df.columns[d]
    return df[df[col_name] == val]

def remove_first_dim(df):
    return df.iloc[:, 1:]

def slice_data_dim0(df, v):
    df_temp = select_data(df, 0, v)
    return remove_first_dim(df_temp)

def buc(df):
    dims = df.shape[1]
    if dims == 1:
        input_sum = sum(project_data(df, 0))
        print(input_sum)
    else:
        dim_vals = set(project_data(df, 0).values)
        for dim_val in dim_vals:
            sub_data = slice_data_dim0(df, dim_val)
            buc(sub_data)
        sub_data = remove_first_dim(df)
        buc(sub_data)

data = {'A': [1, 1, 1, 1, 2],
        'B': [1, 1, 2, 3, 1],
        'M': [10, 20, 30, 40, 50]
        }
df = pd.DataFrame(data, columns=['A', 'B', 'M'])
buc(df)
I get the following output:
30
30
40
100
50
50
80
30
40
But what I need is a dataframe like this (the formatting does not matter, as long as it is a dataframe):
A B M
0 1 1 30
1 1 2 30
2 1 3 40
3 1 ALL 100
4 2 1 50
5 2 ALL 50
6 ALL 1 80
7 ALL 2 30
8 ALL 3 40
9 ALL ALL 150
How do I achieve this?
Unfortunately, pandas doesn't have built-in functionality for subtotals, so the trick is to calculate them on the side and concatenate them with the original dataframe.
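from itertools import combinations

import numpy as np
import pandas as pd

# df is the dataframe built in the question.
dim = ['A', 'B']   # dimension columns
vals = ['M']       # measure column(s) to sum

df = (
    pd.concat(
        [df]
        # subtotals: group by every non-empty proper subset of the dimensions
        + [df.groupby(list(gr), as_index=False)[vals].sum()
           for r in range(len(dim) - 1)
           for gr in combinations(dim, r + 1)]
        # total: a constant key puts every row into a single group
        + [df.groupby(np.zeros(len(df)))[vals].sum()]
    )
    .sort_values(dim)
    .reset_index(drop=True)
    .fillna("ALL")
)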
Output:
A B M
0 1 1 10
1 1 1 20
2 1 2 30
3 1 3 40
4 1 ALL 100
5 2 1 50
6 2 ALL 50
7 ALL 1 80
8 ALL 2 30
9 ALL 3 40
10 ALL ALL 150
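
Note that the output above keeps the two raw A=1, B=1 rows (10 and 20) instead of their sum of 30. If the recursive structure from the question should be kept, one possible sketch is to let the recursion collect rows and build the dataframe once at the end. The names `buc_rows` and `prefix` are hypothetical helpers that are not part of the original code, and the sketch assumes the last column is the measure to sum:

```python
import pandas as pd


def buc_rows(df, prefix=()):
    """Recursively collect one tuple per group-by combination.

    `prefix` (hypothetical, not in the original code) holds the dimension
    values fixed so far, with "ALL" standing for a collapsed dimension.
    """
    if df.shape[1] == 1:
        # Only the measure column is left: emit one aggregated row.
        return [prefix + (df.iloc[:, 0].sum(),)]

    rows = []
    first = df.columns[0]
    # Fix each value of the first dimension and recurse on the remaining columns.
    for val in sorted(df[first].unique().tolist()):
        sub = df[df[first] == val].iloc[:, 1:]
        rows += buc_rows(sub, prefix + (val,))
    # Collapse the first dimension into "ALL" and recurse on the remaining columns.
    rows += buc_rows(df.iloc[:, 1:], prefix + ("ALL",))
    return rows


def buc(df):
    """Return the collected rows as a dataframe with the input's column names."""
    return pd.DataFrame(buc_rows(df), columns=df.columns)


data = {'A': [1, 1, 1, 1, 2],
        'B': [1, 1, 2, 3, 1],
        'M': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data, columns=['A', 'B', 'M'])

result = buc(df)
print(result)
```

For the sample data this should give the ten rows of the table requested in the question, from (1, 1, 30) down to (ALL, ALL, 150). Collecting plain tuples and calling `pd.DataFrame` once avoids repeatedly concatenating small dataframes inside the recursion.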