I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'column1_T1': [1, 0, 0, 1, 1], 'column1_issues': ['Comment1', 'abc', 'pqr', 'Comment2', 'Comment1'], 'column2_T2': [0, 0, 1, 0, 1], 'column2_issues': ['OK', 'abc', 'Comment3', 'efg', 'Comment3']})
It looks like this:
   column1_T1  column1_issues  column2_T2  column2_issues
0           1        Comment1           0              OK
1           0             abc           0             abc
2           0             pqr           1        Comment3
3           1        Comment2           0             efg
4           1        Comment1           1        Comment3
Columns with the suffixes T1, T2, and so on contain either 1 or 0.
Columns with the suffix "issues" contain comments describing the corresponding issue. I only need to consider the rows that have a 1 in the T1/T2/... columns, together with the matching comments in column1_issues, column2_issues, and so on.
For each pair, I want to count the number of 1s in column1_T1, column2_T2, ..., list the unique comments in column1_issues, column2_issues, ... that correspond to those 1s, and get the result in the following format:
column_labels   count  issue1    issue2
column1_issues  3      Comment1  Comment2
column2_issues  2      Comment3
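Because the flag columns contain only 0 and 1, the count column is simply the sum of each flag column. A minimal sketch, assuming (as in this example) that every column whose name does not end in _issues is a flag column:

```python
import pandas as pd

df = pd.DataFrame({'column1_T1': [1, 0, 0, 1, 1],
                   'column1_issues': ['Comment1', 'abc', 'pqr', 'Comment2', 'Comment1'],
                   'column2_T2': [0, 0, 1, 0, 1],
                   'column2_issues': ['OK', 'abc', 'Comment3', 'efg', 'Comment3']})

# assumption: every column that does not end in '_issues' is a 0/1 flag column
flag_cols = [c for c in df.columns if not c.endswith('_issues')]

# summing a column that holds only 0s and 1s counts its 1s
counts = df[flag_cols].sum()
print(counts)
```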
I have tried groupby and crosstab, but I can't get it to work:
df3 = df.groupby(['column1_T1', 'column1_issues'])['column1_T1'].count().unstack().fillna(0)
df3['Total'] = df3.loc[[1]].sum(axis=1)
But the result is far from what I want, and I'm stuck.
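For a single flag/issues pair, a boolean mask on the flag column is enough to select the issues that line up with a 1; unique() then drops the repeats while keeping the order of first appearance. A minimal sketch for column1 only (the names mask and issues are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({'column1_T1': [1, 0, 0, 1, 1],
                   'column1_issues': ['Comment1', 'abc', 'pqr', 'Comment2', 'Comment1'],
                   'column2_T2': [0, 0, 1, 0, 1],
                   'column2_issues': ['OK', 'abc', 'Comment3', 'efg', 'Comment3']})

# keep only the rows where the flag is 1
mask = df['column1_T1'] == 1
issues = df.loc[mask, 'column1_issues']

print(int(mask.sum()))           # number of 1s in column1_T1
print(issues.unique().tolist())  # unique issues on those rows, in order of first appearance
```

Repeating this for each pair and collecting the results is essentially what the answer does for all pairs at once.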
One way to do it: first, within each column1/column2/... group, keep the rows where the first (flag) column is 1 and take the values of the second (issues) column, building a two-column DataFrame:
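# pair each flag column with the issues column right after it (assumes the columns
# alternate flag/issues, as here; groupby(..., axis=1) is deprecated since pandas 2.1)
df1 = pd.concat([df.loc[df[flag] == 1, issues].to_frame('issues').assign(lab=issues)
                 for flag, issues in zip(df.columns[::2], df.columns[1::2])], ignore_index=True)
print(df1)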
     issues             lab
0  Comment1  column1_issues
1  Comment2  column1_issues
2  Comment1  column1_issues
3  Comment3  column2_issues
4  Comment3  column2_issues
Then remove the duplicate rows, add a helper column that numbers the unique issues within each label with GroupBy.cumcount, reshape with DataFrame.pivot, and finally use DataFrame.insert to add the count column computed with Series.value_counts:
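df2 = df1.drop_duplicates().copy()
# number the unique issues within each label (1, 2, ...); computed on df2, not df1,
# so the duplicates removed above cannot leave gaps in the numbering
df2['g'] = df2.groupby('lab').cumcount().add(1)
# pandas >= 2.0 requires keyword arguments for pivot
df2 = df2.pivot(index='lab', columns='g', values='issues').add_prefix('issue')
# number of 1s per label = rows per label in df1 (before de-duplication)
df2.insert(0, 'count', df1['lab'].value_counts())
df2 = df2.reset_index().rename_axis(None, axis=1)
print(df2)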
              lab  count    issue1    issue2
0  column1_issues      3  Comment1  Comment2
1  column2_issues      2  Comment3       NaN
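The same result can also be built row by row in a small function that works for any number of flag/issues pairs. This is a minimal sketch, not part of the original answer: the function name summarize_issues is hypothetical, it assumes each flag column is immediately followed by its issues column, and it replaces the cumcount/pivot reshaping with a plain loop that spreads the unique issues into issue1, issue2, ... columns.

```python
import pandas as pd


def summarize_issues(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: for each (flag, issues) column pair, count the 1s
    in the flag column and list the unique issues on those rows.

    Assumes each flag column is immediately followed by its issues column.
    """
    rows = []
    for flag, issues in zip(df.columns[::2], df.columns[1::2]):
        is_one = df[flag] == 1
        # dict.fromkeys drops repeats but keeps the order of first appearance
        unique_issues = list(dict.fromkeys(df.loc[is_one, issues]))
        row = {'column_labels': issues, 'count': int(is_one.sum())}
        row.update({f'issue{i}': text for i, text in enumerate(unique_issues, start=1)})
        rows.append(row)
    # pairs with fewer unique issues get NaN in the extra issueN columns
    return pd.DataFrame(rows)


df = pd.DataFrame({'column1_T1': [1, 0, 0, 1, 1],
                   'column1_issues': ['Comment1', 'abc', 'pqr', 'Comment2', 'Comment1'],
                   'column2_T2': [0, 0, 1, 0, 1],
                   'column2_issues': ['OK', 'abc', 'Comment3', 'efg', 'Comment3']})

result = summarize_issues(df)
print(result)
```

Unlike the pivot version, the count here is taken directly from the flag column, so a pair with no 1s at all still gets a row (with count 0) instead of being dropped.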