
How to build a pandas dataframe in a recursive function?

I am trying to implement the 'Bottom-Up Computation' algorithm in data mining ( https://www.aaai.org/Papers/FLAIRS/2003/Flairs03-050.pdf ).

I need to use the 'pandas' library to create a dataframe and pass it to a recursive function, which should also return a dataframe as output. So far I can only produce the values of the final column, because I cannot figure out how to build a data frame dynamically.

Here is the Python program:

import pandas as pd

def project_data(df, d):
    return df.iloc[:, d]

def select_data(df, d, val):
    col_name = df.columns[d]
    return df[df[col_name] == val]

def remove_first_dim(df):
    return df.iloc[:, 1:]

def slice_data_dim0(df, v):
    df_temp = select_data(df, 0, v)
    return remove_first_dim(df_temp)

def buc(df):
    dims = df.shape[1]
    if dims == 1:
        input_sum = sum(project_data(df, 0) )
        print(input_sum)
    else:
        dim_vals = set(project_data(df, 0).values)

        for dim_val in dim_vals:
            sub_data = slice_data_dim0(df, dim_val)
            buc(sub_data)
        sub_data = remove_first_dim(df)
        buc(sub_data)


data = {'A':[1,1,1,1,2],
        'B':[1,1,2,3,1],
        'M':[10,20,30,40,50]
        }
    
df = pd.DataFrame(data, columns = ['A','B','M'])
buc(df)

I get the following output:

30
30
40
100
50
50
80
30
40
150

But what I need is a dataframe like this (the formatting does not matter, as long as it is a data frame):

    A   B   M
0   1   1   30
1   1   2   30
2   1   3   40
3   1   ALL 100
4   2   1   50
5   2   ALL 50
6   ALL 1   80
7   ALL 2   30
8   ALL 3   40
9   ALL ALL 150

How do I achieve this?

Unfortunately, pandas has no built-in function for this kind of subtotal, so the trick is to calculate the subtotals separately and concatenate them with the original dataframe.

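from itertools import combinations

import numpy as np
import pandas as pd

dim = ['A', 'B']   # dimension columns
vals = ['M']       # measure column(s) to sum

# df is the dataframe built in the question
df = (
    pd.concat(
        [df]
        # subtotals: one groupby per proper, non-empty subset of the dimensions
        + [df.groupby(list(gr), as_index=False)[vals].sum()
           for r in range(len(dim) - 1)
           for gr in combinations(dim, r + 1)]
        # grand total: a constant key puts every row in a single group
        + [df.groupby(np.zeros(len(df)))[vals].sum()]
    )
    .sort_values(dim)    # NaN (the future "ALL") sorts last
    .reset_index(drop=True)
    .fillna("ALL")       # dimensions absent from a subtotal become "ALL"
)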

Output:

      A    B    M
0     1    1   10
1     1    1   20
2     1    2   30
3     1    3   40
4     1  ALL  100
5     2    1   50
6     2  ALL   50
7   ALL    1   80
8   ALL    2   30
9   ALL    3   40
10  ALL  ALL  150
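
Note that the output above still contains the two unaggregated rows for A=1, B=1 (10 and 20), whereas the question asks for a single row with 30. If the recursive structure of the original buc function should be kept, one possible sketch is to let each call return a list of row tuples and build the dataframe once at the end. The names buc_rows and buc_frame and the measure and prefix parameters are made up for this illustration and do not come from the paper or from pandas; the sketch assumes a single measure column that is summed.

```python
import pandas as pd


def buc_rows(df, measure, prefix=()):
    """Recursively collect one tuple per cube cell (hypothetical helper)."""
    dims = [c for c in df.columns if c != measure]
    if not dims:
        # No dimensions left: emit the fixed values plus the aggregate.
        return [prefix + (df[measure].sum(),)]
    first = dims[0]
    rows = []
    # Fix the first dimension to each of its values and recurse on the rest.
    for val in sorted(df[first].unique().tolist()):
        sub = df[df[first] == val].drop(columns=first)
        rows += buc_rows(sub, measure, prefix + (val,))
    # Then aggregate over the first dimension, marking it as "ALL".
    rows += buc_rows(df.drop(columns=first), measure, prefix + ("ALL",))
    return rows


def buc_frame(df, measure):
    """Build the result dataframe once from the collected rows."""
    dims = [c for c in df.columns if c != measure]
    return pd.DataFrame(buc_rows(df, measure), columns=dims + [measure])


data = {'A': [1, 1, 1, 1, 2],
        'B': [1, 1, 2, 3, 1],
        'M': [10, 20, 30, 40, 50]}
df = pd.DataFrame(data, columns=['A', 'B', 'M'])

result = buc_frame(df, 'M')
print(result)
```

Collecting plain tuples and constructing the dataframe a single time avoids concatenating many small dataframes inside the recursion. Because the dimension columns mix numbers with the string "ALL", they end up with object dtype.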
