
Pandas dataframe slice by sort result

Let's say I have a DataFrame like this:

import pandas as pd

df = pd.DataFrame([[1.1, 1.1, 1.1, 2.6, 2.5, 3.4,2.6,2.6,3.4,3.4,2.6,1.1,1.1,3.3], list('AAABBBBABCBDDD'), [1.1, 1.7, 2.5, 2.6, 3.3, 3.8,4.0,4.2,4.3,4.5,4.6,4.7,4.7,4.8], ['x/y/z','x/y','x/y/z/n','x/u','x','x/u/v','x/y/z','x','x/u/v/b','-','x/y','x/y/z','x','x/u/v/w'],['1','3','3','2','4','2','5','3','6','3','5','1','1','1']]).T
df.columns = ['col1','col2','col3','col4','col5']

df:

   col1 col2 col3     col4 col5
0   1.1    A  1.1    x/y/z    1
1   1.1    A  1.7      x/y    3
2   1.1    A  2.5  x/y/z/n    3
3   2.6    B  2.6      x/u    2
4   2.5    B  3.3        x    4
5   3.4    B  3.8    x/u/v    2
6   2.6    B    4    x/y/z    5
7   2.6    A  4.2        x    3
8   3.4    B  4.3  x/u/v/b    6
9   3.4    C  4.5        -    3
10  2.6    B  4.6      x/y    5
11  1.1    D  4.7    x/y/z    1
12  1.1    D  4.7        x    1
13  3.3    D  4.8  x/u/v/w    1

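I count the rows in each col5 group, sort the counts, and take the three largest:

t = df.groupby('col5').col1.size()
# Series.sort() was removed from pandas; sort_values() returns a sorted copy
t = t.sort_values()
t[-3:]

Output: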

col5
5       2
1       4
3       4

Then I want to get the 'col1' values corresponding to these 'col5' values. I can get them one at a time like this:

df[df['col5'] == '5']['col1'].unique()

But I want to get all three (or any n of them) at once. Is that possible, and if so, how?

Use isin to keep only the rows whose col5 value is among the ones you want:

In [34]: df[df.col5.isin(t[-3:].index)]['col1'].unique()
Out[34]: array([1.1, 2.6, 3.4, 3.3], dtype=object)
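
For reference, the whole flow (count, sort, take the last n, filter with isin) can be sketched as one self-contained script for recent pandas versions. This is only a sketch: the helper name col1_for_top_groups and the small sample frame are made up for illustration, and a stable sort (kind="mergesort") is assumed so that groups of equal size keep their original order.

```python
import pandas as pd

# Small made-up frame that reuses the column names from the question.
df = pd.DataFrame({
    "col1": [1.1, 1.1, 2.6, 3.4, 2.6, 3.3, 2.5],
    "col5": ["1", "1", "5", "3", "5", "1", "4"],
})

def col1_for_top_groups(frame, n):
    """Unique col1 values for the n largest col5 groups (hypothetical helper)."""
    # Count rows per group and sort ascending; mergesort is stable on ties.
    sizes = frame.groupby("col5").size().sort_values(kind="mergesort")
    top_keys = sizes.index[-n:]
    # Keep only rows whose col5 is one of the top keys, then deduplicate col1.
    return frame.loc[frame["col5"].isin(top_keys), "col1"].unique()

result = col1_for_top_groups(df, 2)
print(result)
```

Passing a different n selects a different number of groups, which covers the "or any n" part of the question.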

I'm not 100% sure I understand what you want (which 3 numbers do you need?), but you may want to look into the groups attribute of the groupby object:

In [398]: df.groupby('col5').groups

This returns a dict that maps each group key to the index labels of the rows in that group:

Out[398]: 
{'1': [0L, 11L, 12L, 13L],
 '2': [3L, 5L],
 '3': [1L, 2L, 7L, 9L],
 '4': [4L],
 '5': [6L, 10L],
 '6': [8L]}

From that result you can build any output you want:

In [399]: {col5:df.lookup(ix_list,["col1"]*len(ix_list)) for col5, ix_list in df.groupby('col5').groups.iteritems()}
Out[399]: 
{'1': array([ 1.1,  1.1,  1.1,  3.3]),
 '2': array([ 2.6,  3.4]),
 '3': array([ 1.1,  1.1,  2.6,  3.4]),
 '4': array([ 2.5]),
 '5': array([ 2.6,  2.6]),
 '6': array([ 3.4])}
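
The snippet above was written for Python 2 and an older pandas: dict.iteritems does not exist in Python 3, and DataFrame.lookup was removed in pandas 2.0. A similar per-group mapping can be sketched with groupby directly. The sample frame below is made up for illustration, and this is one possible approach rather than a drop-in replacement for the code above.

```python
import pandas as pd

# Small made-up frame that reuses the column names from the question.
df = pd.DataFrame({
    "col1": [1.1, 2.6, 1.1, 3.4, 2.6],
    "col5": ["1", "2", "1", "2", "5"],
})

# Collect the col1 values of each col5 group into a plain dict of lists.
col1_by_group = df.groupby("col5")["col1"].apply(list).to_dict()
print(col1_by_group)
```

If only the distinct values per group are needed, replacing apply(list) with unique() should give one array of unique col1 values per col5 key.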
