Select elements with only 1 entry in a pandas multi-index DataFrame

I have the following dataframe df

import pandas as pd
df = pd.DataFrame([[1, 1, 2, 2, 2, 3,4,5,5,5,6,6,6,6], 
                   list('AABBBCDEEEFFFF'), 
                   [1, 2, 3, 4, 5, 6,7,8,9,10,11,12,13,14],
                   [1, 2, 3, 4, 5, 6,7,8,9,11,12,11,11,11]]).T
df.columns = ['col1','col2','col3','col4']

df
Out[4]: 
   col1 col2 col3 col4
0     1    A    1    1
1     1    A    2    2
2     2    B    3    3
3     2    B    4    4
4     2    B    5    5
5     3    C    6    6
6     4    D    7    7
7     5    E    8    8
8     5    E    9    9
9     5    E   10   11
10    6    F   11   12
11    6    F   12   11
12    6    F   13   11
13    6    F   14   11

I group it by its columns in the following order:

df.groupby(['col1','col2','col3']).size()

Out[7]: 
col1  col2  col3
1     A     1       1
            2       1
2     B     3       1
            4       1
            5       1
3     C     6       1
4     D     7       1
5     E     8       1
            9       1
            10      1
6     F     11      1
            12      1
            13      1
            14      1

How can I extract the value of col3 for the groups that have only one entry?

df_return
Out[4]: 
   col3
0     6
1     7

You can do this by passing col1 and col2 to .groupby, and then using .filter to keep only the groups whose length (i.e. size) equals 1.

df_return = df.groupby(['col1','col2']).filter(lambda x: len(x) == 1)['col3']

print(df_return)
# 5    6
# 6    7
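
As a possible alternative to .filter with a Python lambda, the same selection can be sketched with .transform('size'), which broadcasts each group's size back onto its rows so that a plain boolean mask can be used. This is only a sketch: the frame is rebuilt here from a dict (an assumption made so the snippet is self-contained and the columns keep integer dtypes), and the names group_sizes and singletons are illustrative, not from the original answer.

```python
import pandas as pd

# Same data as in the question, rebuilt from a dict so the snippet stands alone
# (col4 is omitted because it is not needed for the selection).
df = pd.DataFrame({
    'col1': [1, 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 6, 6],
    'col2': list('AABBBCDEEEFFFF'),
    'col3': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
})

# Size of the (col1, col2) group each row belongs to, aligned with df's index.
group_sizes = df.groupby(['col1', 'col2'])['col3'].transform('size')

# Keep col3 only for rows whose group contains exactly one entry.
singletons = df.loc[group_sizes == 1, 'col3']

# The .filter approach from the answer above, for comparison.
filtered = df.groupby(['col1', 'col2']).filter(lambda g: len(g) == 1)['col3']

print(singletons.tolist())  # [6, 7]
```

On larger frames the transform version may be faster because it avoids calling a Python function once per group, but that depends on the data and is worth measuring.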

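You don't need a groupby here. duplicated with keep=False marks every row whose (col1, col2) pair occurs more than once, so negating it leaves only the single-entry rows: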

df[~df.duplicated(['col1','col2'],keep=False)]
Out[1352]: 
  col1 col2 col3 col4
5    3    C    6    6
6    4    D    7    7

df.loc[~df.duplicated(['col1','col2'],keep=False),'col3']
Out[1353]: 
5    6
6    7
Name: col3, dtype: object

Or use drop_duplicates with keep=False, which drops every row that has a duplicate:

df.drop_duplicates(['col1','col2'],keep=False).col3
Out[1355]: 
5    6
6    7
Name: col3, dtype: object
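
The outputs above keep the original row labels (5 and 6) and come back as a Series, whereas the question asks for a one-column frame indexed 0, 1. A minimal sketch of getting that exact shape is shown below. It assumes the frame is rebuilt from a dict so the snippet is self-contained, and the names mask, via_duplicated and via_drop are illustrative.

```python
import pandas as pd

# Same data as in the question, rebuilt from a dict so the snippet stands alone.
df = pd.DataFrame({
    'col1': [1, 1, 2, 2, 2, 3, 4, 5, 5, 5, 6, 6, 6, 6],
    'col2': list('AABBBCDEEEFFFF'),
    'col3': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14],
})

# True for rows whose (col1, col2) pair appears exactly once.
mask = ~df.duplicated(subset=['col1', 'col2'], keep=False)
via_duplicated = df.loc[mask, 'col3']

# drop_duplicates with keep=False removes every member of a duplicated pair.
via_drop = df.drop_duplicates(subset=['col1', 'col2'], keep=False)['col3']

# Both routes should select the same rows.
assert via_duplicated.equals(via_drop)

# Renumber the index from 0 and turn the Series into a one-column DataFrame.
df_return = via_duplicated.reset_index(drop=True).to_frame()
print(df_return)
#    col3
# 0     6
# 1     7
```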
