python pandas dataframe: aggregate rows

Question

I have a dataframe like this:

   id   K  V
0   1  k1  3
1   1  k2  4
2   1  k2  5
3   1  k1  5
4   2  k1  2
5   2  k1  3
6   2  k2  3
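
For anyone who wants to run the answers, the sample frame above can be rebuilt roughly like this. It is only a sketch: the variable name df is the one the answers assume, and the integer dtype of V is an assumption based on the printed values.

```python
import pandas as pd

# Rebuild the sample frame shown above; the answers refer to it as `df`.
df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2],
    "K":  ["k1", "k2", "k2", "k1", "k1", "k1", "k2"],
    "V":  [3, 4, 5, 5, 2, 3, 3],
})

print(df)
```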

I also have a set of conditions, such as k1 > 1 and k2 < 4.

I want to evaluate the conditions and create a new dataframe with one row per id and one column per condition:

   id  k1_condition  k2_condition
0   1  True          False
1   2  True          True

Answer 1

Try the following:

# function applied to column 'V' of each (id, ki) group
def conditions(g):
    cond_dict = {
        'k1': lambda k1: k1 > 1,
        'k2': lambda k2: k2 < 4
    }
    _ , k = g.name  # g.name = group key = (id, ki)
    return cond_dict[k](g).all()

out = (
    df.groupby(['id', 'K'])['V']
      .apply(conditions) 
      .unstack('K') # turn k1 and k2 into columns 
      .add_suffix('_cond') # add suffix to column names: ki --> ki_cond
      .rename_axis(columns=None) # remove the column axis label (K)
      .reset_index() # make id a column, not the index       
)

Output:

>>> out

   id  k1_cond  k2_cond
0   1     True    False
1   2     True     True

You can easily add more conditions to cond_dict if column K contains values other than k1 and k2.
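
As an illustration of extending cond_dict, here is a sketch with a third key. The key k3, its condition (v == 10), and the sample data are all made up for this example and are not part of the question. Every id has a row for every key here; if an id had no rows for some key, unstack would leave NaN in that cell.

```python
import pandas as pd

# Hypothetical data with a third key, k3, present for every id.
df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "K":  ["k1", "k2", "k3", "k1", "k2", "k3"],
    "V":  [3, 4, 10, 2, 3, 7],
})

cond_dict = {
    "k1": lambda v: v > 1,
    "k2": lambda v: v < 4,
    "k3": lambda v: v == 10,  # hypothetical extra condition
}

def conditions(g):
    _, k = g.name  # group key is (id, K)
    return cond_dict[k](g).all()

out = (
    df.groupby(["id", "K"])["V"]
      .apply(conditions)
      .unstack("K")
      .add_suffix("_cond")
      .rename_axis(columns=None)
      .reset_index()
)
print(out)
```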

Answer 2

DataFrame.apply should work:

# Each comparison needs its own parentheses because & binds tighter than == and >.
df["k1_condition"] = df.apply(lambda x: (x["K"] == "k1") & (x["V"] > 1), axis=1)
df["k2_condition"] = df.apply(lambda x: (x["K"] == "k2") & (x["V"] < 4), axis=1)
df2 = df[["id", "k1_condition", "k2_condition"]].groupby("id").any()
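
The same idea can probably be written without a row-wise apply by using boolean masks on whole columns. This is only a sketch: the sample frame is rebuilt so the snippet runs on its own, the k2 threshold (< 4) is taken from the question, and the intermediate name flags is made up. Note that & binds more tightly than == and >, so each comparison needs its own parentheses. Also note that groupby(...).any() marks a condition True when at least one row of the id satisfies it, which matches the expected output for this data but is not the same as requiring every row to pass.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2],
    "K":  ["k1", "k2", "k2", "k1", "k1", "k1", "k2"],
    "V":  [3, 4, 5, 5, 2, 3, 3],
})

# Row-level flags computed on whole columns instead of row by row.
flags = pd.DataFrame({
    "id": df["id"],
    "k1_condition": (df["K"] == "k1") & (df["V"] > 1),
    "k2_condition": (df["K"] == "k2") & (df["V"] < 4),
})

# True if at least one row of the id satisfies the condition.
df2 = flags.groupby("id").any().reset_index()
print(df2)
```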

Answer 3

Here is one way to do it:

df2=df.pivot_table(index='id',columns='K', values='V', aggfunc=['min','max']).reset_index()
df2.columns = ['_'.join(col) for col in df2.columns ]
df2['k1_condition'] = df2['min_k1'] > 1
df2['k2_condition'] = df2['max_k2'] <4
df2=df2.drop(columns=['min_k1','min_k2','max_k1','max_k2'])
df2

Or:

df2=df.pivot_table(index='id',columns='K', values='V', aggfunc=['min','max']).reset_index()
df2['k1_condition'] = df2['min']['k1'] > 1
df2['k2_condition'] = df2['max']['k2'] <4
df2.drop(columns=['min','max'],level=0,inplace=True)
df2


    id_     k1_condition    k2_condition
0   1       True            False
1   2       True            True
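
The reasoning behind this answer seems to be that "all values > 1" is the same as "the minimum > 1", and "all values < 4" is the same as "the maximum < 4". A sketch that makes this explicit with two separate pivots follows. The names mins and maxs are made up for illustration, and the sample frame is rebuilt so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 2, 2, 2],
    "K":  ["k1", "k2", "k2", "k1", "k1", "k1", "k2"],
    "V":  [3, 4, 5, 5, 2, 3, 3],
})

# One pivot per aggregate: rows are ids, columns are the K values.
mins = df.pivot_table(index="id", columns="K", values="V", aggfunc="min")
maxs = df.pivot_table(index="id", columns="K", values="V", aggfunc="max")

out = pd.DataFrame({
    "k1_condition": mins["k1"] > 1,  # every k1 value > 1  <=>  min > 1
    "k2_condition": maxs["k2"] < 4,  # every k2 value < 4  <=>  max < 4
}).reset_index()
print(out)
```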

Answer 4

You could use pivot_table with a conditions function.

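import pandas as pd

def conditions(x):
    # x holds the V values of one (id, K) cell; look up that cell's K in the global df
    k = df.at[x.index[0], 'K']

    if k == 'k1':
        return (x > 1).all()

    return (x < 4).all()  # any K other than 'k1' gets the k2 condition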

pd.pivot_table(df, index='id', columns='K', aggfunc=conditions) \
    .droplevel(level=0, axis=1).add_suffix('_condition') \
    .rename_axis(None, axis=1).reset_index()

Result:

   id  k1_condition  k2_condition
0   1          True         False
1   2          True          True

4 answers

Answer 1  2  2022-07-02 17:01:15

Answer 2  1  2022-07-02 15:37:24

Answer 3  1  2022-07-02 15:48:00

Answer 4  1  2022-07-02 16:21:22
