How to group by multiple columns in pandas and Python?

Question

I have a DataFrame that I want to group by multiple columns.

If I hard-code the column names, it works.

What I want is to let the user select from the list of columns and get back the groupby result.

When I add this line, the app crashes and displays the error below:

dda = df.groupby([primary_col_pyplot, [selected_column_names__pyplot]]) \
    .size() \
    .reset_index(name="count")

Error:

 ValueError: Grouper and axis must be same length

Code:

import pandas as pd
import streamlit as st

df = pd.DataFrame({"source_number": [11199, 11328, 11287, 32345,
                                     12342, 1232, 12342, 123244, 1235],
                   "location": ["USA", "USA", "USA", "INDIA", "INDIA",
                                "USA", "INDIA", "USA", "INDIA"],
                   "category": ["cat1", "cat2", "cat1", "cat1", "cat2",
                                "cat1", "cat2", "cat1", "cat1"],
                   })
df.head()

all_columns_names = df.columns.tolist()
primary_col_pyplot = st.selectbox("Primary Column To GroupBy", all_columns_names)
selected_column_names__pyplot = st.multiselect("Select Columns", all_columns_names)
dda = df.groupby(["category", "location", "source_number"])\
    .size()\
    .reset_index(name="count")
print(dda)

Expected result:

    category    location    source_number   count
0   cat1         INDIA             1235       1
1   cat1         INDIA             32345      1
2   cat1         USA               1232       1
3   cat1         USA               11199      1
4   cat1         USA               11287      1
5   cat1         USA               123244     1
6   cat2         INDIA             12342      2
7   cat2         USA               11328      1

Answer 1

After looking up Streamlit, I will assume that st.selectbox returns a single string (one column) and st.multiselect returns a list of strings (multiple columns). If that turns out to be wrong, debug it and inspect the values of primary_col_pyplot and selected_column_names__pyplot in different scenarios to be sure.

So we have one string and one list of strings. Wrapping the list in another pair of brackets nests it inside the outer list, which groupby does not read as column names. Instead, put the string in a one-element list and concatenate the two, so groupby receives a single flat list:

dda = df.groupby([primary_col_pyplot] + selected_column_names__pyplot) \
    .size() \
    .reset_index(name="count")
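
To try this without running Streamlit, here is a minimal sketch that uses plain variables in place of the widget return values. The helper name build_group_keys, the stand-in values, and the de-duplication step are my own additions, not part of the original code. The de-duplication assumes the user might pick the primary column again in the multiselect, which would otherwise put the same column name in the key list twice.

```python
import pandas as pd

df = pd.DataFrame({
    "source_number": [11199, 11328, 11287, 32345,
                      12342, 1232, 12342, 123244, 1235],
    "location": ["USA", "USA", "USA", "INDIA", "INDIA",
                 "USA", "INDIA", "USA", "INDIA"],
    "category": ["cat1", "cat2", "cat1", "cat1", "cat2",
                 "cat1", "cat2", "cat1", "cat1"],
})


def build_group_keys(primary, selected):
    """Hypothetical helper: flat list of column names, primary first, no repeats."""
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys([primary] + list(selected)))


# Stand-ins for what st.selectbox (a string) and st.multiselect (a list)
# are assumed to return.
primary_col = "category"
selected_cols = ["location", "source_number", "category"]

keys = build_group_keys(primary_col, selected_cols)
dda = df.groupby(keys).size().reset_index(name="count")
print(dda)
```

In the Streamlit app, primary_col and selected_cols would be replaced by primary_col_pyplot and selected_column_names__pyplot.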
