pandas groupby multiple columns with python and streamlit
I have a groupby call in which I want to group by multiple columns in order to plot a chart later. The dataframe's columns are dynamic: the user selects them from a selectbox and a multiselect widget. The problem is that right now I can only take the first or the last item from the multiselect widget, like so:
some_columns_df = df.loc[:,['gender','country','city','hoby','company','status']]
some_collumns = some_columns_df.columns.tolist()
select_box_var= st.selectbox("Choose X Column",some_collumns)
multiselect_var= st.multiselect("Select Columns To GroupBy",some_collumns)
test_g3 = df.groupby([select_box_var,multiselect_var[0]]).size().reset_index(name='count')
If the user selects more than one item from the multiselect, say four items, I would like it to become something like this:
test_g3 = df.groupby([select_box_var,multiselect_var[0,1,2,3]]).size().reset_index(name='count')
Is this possible?
multiselect_var is a list, while select_box_var is a single variable. Put select_box_var inside a list and add the two lists together.
Try this:
test_g3 = df.groupby([select_box_var] + multiselect_var).size().reset_index(name='count')
According to the Streamlit docs for multiselect, the API always returns a list. Your selectbox returns a string, because its options are a list of strings.
So your code can be modified to:
df.groupby([select_box_var] + multiselect_var).size().reset_index(name='count')
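To try the list concatenation without running a Streamlit app, here is a minimal sketch using plain pandas. The helper name count_by_columns and the sample data are made up for illustration; the string and the list passed in stand in for the values that st.selectbox and st.multiselect would return. Removing repeated keys is an extra precaution not mentioned above, in case the same column is picked in both widgets.

```python
import pandas as pd


def count_by_columns(df, x_column, group_columns):
    """Count rows per combination of x_column and group_columns.

    Hypothetical helper: x_column plays the role of the selectbox value
    (a single string) and group_columns the multiselect value (a list).
    """
    # Wrap the single value in a list and concatenate it with the other list.
    keys = [x_column] + list(group_columns)
    # Drop repeated column names while keeping the original order.
    keys = list(dict.fromkeys(keys))
    return df.groupby(keys).size().reset_index(name="count")


# Made-up sample data using a few of the column names from the question.
df = pd.DataFrame({
    "gender": ["F", "M", "F", "M"],
    "country": ["US", "US", "FR", "US"],
    "city": ["NYC", "NYC", "Paris", "LA"],
})

result = count_by_columns(df, "gender", ["country", "city"])
print(result)
```

If the multiselect is empty, the concatenation leaves just the selectbox column, so the same call still groups by that one column.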