Hi I have the dataframe
df_warnings
which captures the warnings from a server log, and looks like the below (first 3 rows shown):
URI code method tid type
date
2017-06-20 URI: /app/faces/pages/oversight/Oversight.jspx ADFC-64001 oracle.adfinternal.controller.state.ControllerState tid: [ACTIVE].ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)' WARNING
2017-06-20 URI: /app/faces/pages/oversight/Oversight.jspx ADFC-64001 oracle.adfinternal.controller.state.ControllerState tid: [ACTIVE].ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)' WARNING
2017-06-20 URI: /app/faces/pages/oversight/Oversight.jspx ADFC-64001 oracle.adfinternal.controller.state.ControllerState tid: [ACTIVE].ExecuteThread: '2' for queue: 'weblogic.kernel.Default (self-tuning)' WARNING
The 'code' & 'method' columns are strings. What I would like to do is :
group the 'method' values by the 'code' value (ie I would like to see the methods and the counts of those methods against each code)
group the counts of each method within each code group in descending order
order the groups (codes) in descending order
only show the top 3 methods & counts in each code group
What is the best way to do this?
EDIT: I have tried
df_warnings['method'].groupby(df_warnings['code']).value_counts()
Which gives me the methods & method counts binned by code; however it doesnt give me the top 3 methods & methods counts in each code bin, and code bins are not ordered in descending order of total count in the bin
EDIT2: output I would like
code method count code1 A 100 B 50 C 5 D 2 code2 A 50 B 10 code3 C 50 D 5
In the above code1 code2 and code3 are sorted in terms of total count in each (157, 60 and 55 respectively) and then the methods & counts are sorted within each group
Thanks in advance!
I think you need groupby
+ value_counts
for count and then SeriesGroupBy.nlargest
:
d = {'method': ['C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'D', 'D', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D', 'D'], 'code': ['code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3']}
df = pd.DataFrame(d)
print (df.head())
code method
0 code1 C
1 code1 C
2 code1 C
3 code1 C
4 code1 C
df2 = df.groupby(['code'])['method'].value_counts()
print (df2)
code method
code1 C 100
A 50
B 5
D 2
code2 C 50
A 10
code3 C 50
D 5
Name: method, dtype: int64
df2 = df.groupby(['code'])['method'].value_counts().sort_index()
print (df2)
code method
code1 A 50
B 5
C 100
D 2
code2 A 10
C 50
code3 C 50
D 5
Name: method, dtype: int64
#in real data change 2 to 3
df2 = df2.groupby(level='code',group_keys=False ).nlargest(2)
print (df2)
code method
code1 C 100
A 50
code2 C 50
A 10
code3 C 50
D 5
Name: method, dtype: int64
EDIT:
I try a bit explain sort_values
by samples (I think this answer it explain better, although it is not pandas.):
d = {'method': ['C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'D', 'D', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'C', 'D', 'D', 'D', 'D', 'D'], 'code': ['code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code1', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code2', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3', 'code3']}
df = pd.DataFrame(d)
#print (df.head())
df3 = df.groupby(['code'])['method'].value_counts().reset_index(name='vals')
#some random shuffle of rows
a = df3.index.values
np.random.seed(88)
np.random.shuffle(a)
df3 = df3.reindex(a).sort_index()
print (df3)
code method vals
0 code3 D 5
1 code2 A 10
2 code2 C 50
3 code1 A 50
4 code1 C 100
5 code1 B 5
6 code1 D 2
7 code3 C 50
print (df3.sort_values(['code']))
code method vals
3 code1 A 50
4 code1 C 100
5 code1 B 5
6 code1 D 2
1 code2 A 10
2 code2 C 50
0 code3 D 5
7 code3 C 50
print (df3.sort_values(['method']))
code method vals
1 code2 A 10
3 code1 A 50
5 code1 B 5
2 code2 C 50
4 code1 C 100
7 code3 C 50
0 code3 D 5
6 code1 D 2
print (df3.sort_values(['vals'], ascending=False))
code method vals
4 code1 C 100
2 code2 C 50
3 code1 A 50
7 code3 C 50
1 code2 A 10
0 code3 D 5
5 code1 B 5
6 code1 D 2
#if sorting by multiples columns it sort all columns separately:
#so first sort all values in df by first column, then sort by second and last by 3. col
print (df3.sort_values(['code','method']))
code method vals
3 code1 A 50
5 code1 B 5
4 code1 C 100
6 code1 D 2
1 code2 A 10
2 code2 C 50
7 code3 C 50
0 code3 D 5
print (df3.sort_values(['code','vals'], ascending=[True, False]))
code method vals
4 code1 C 100
3 code1 A 50
5 code1 B 5
6 code1 D 2
2 code2 C 50
1 code2 A 10
7 code3 C 50
0 code3 D 5
print (df3.sort_values(['method', 'vals'], ascending=[True, False]))
code method vals
3 code1 A 50
1 code2 A 10
5 code1 B 5
4 code1 C 100
2 code2 C 50
7 code3 C 50
0 code3 D 5
6 code1 D 2
print (df3.sort_values(['vals', 'method'], ascending=[False, True]))
code method vals
4 code1 C 100
3 code1 A 50
2 code2 C 50
7 code3 C 50
1 code2 A 10
5 code1 B 5
0 code3 D 5
6 code1 D 2
print (df3.sort_values(['vals', 'method', 'code'], ascending=[True, False, False]))
code method vals
6 code1 D 2
0 code3 D 5
5 code1 B 5
1 code2 A 10
7 code3 C 50
2 code2 C 50
3 code1 A 50
4 code1 C 100
print (df3.sort_values(['code', 'method', 'vals'], ascending=[True, False, True]))
code method vals
6 code1 D 2
4 code1 C 100
5 code1 B 5
3 code1 A 50
2 code2 C 50
1 code2 A 10
0 code3 D 5
7 code3 C 50
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.