
Filter rows from a data frame based on the highest index and values from column

I have the following example. I would like to keep all the rows where ID=5, and where I have multiple rows with ID=3, I would like to keep only the ones with the highest index.

import pandas as pd

data = {'Profession':['Teacher', 'Banker', 'Teacher', 'Judge','lawyer','Teacher'], 'Gender':['Male','Male', 'Female', 'Male','Male','Female'],'Size':['M','M','L','S','S','M'],'ID':['5','3','3','3','5','3']}
data2 = {'Profession':['Doctor', 'Scientist', 'Scientist', 'Banker','Judge','Scientist'], 'Gender':['Male','Male', 'Female','Female','Male','Male'],'Size':['L','M','L','M','L','L'],'ID':['5','3','5','3','3','3']}
data3 = {'Profession':['Banker', 'Banker', 'Doctor', 'Doctor','lawyer','Teacher'], 'Gender':['Male','Male', 'Female', 'Female','Female','Male'],'Size':['S','M','S','M','L','S'],'ID':['5','3','3','3','5','3']}
data4 = {'Profession':['Judge', 'Judge', 'Scientist', 'Banker','Judge','Scientist'], 'Gender':['Female','Female', 'Female','Female','Female','Female'],'Size':['M','S','L','S','M','S'],'ID':['3','5','3','3','5','3']}
df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
df4 = pd.DataFrame(data4)
# Stack the four frames and renumber the rows 0..23
DATA = pd.concat([df, df2, df3, df4])
DATA.reset_index(drop=True, inplace=True)
DATA


I want this: [image of the desired output]

This is just an example. My real data has a very large number of rows, so I would like a piece of code that also works for larger data frames.

You can construct a boolean mask that flags every row with ID 3 that directly follows another row with ID 3, while leaving the first row of each such run unflagged.

The mask tests that

  1. the row's ID is equal to 3
  2. the ID of the row directly above is also equal to 3

If we look at the first few rows with a helper column holding this boolean:

  Profession  Gender Size  ID  bool_
0     Teacher    Male    M   5  False
1      Banker    Male    M   3  False <-- fulfills 1st condition but not 2nd so false.
2     Teacher  Female    L   3   True <-- fulfills condition 1 & 2
3       Judge    Male    S   3   True <-- fulfills condition 1 & 2
4      lawyer    Male    S   5  False
5     Teacher  Female    M   3  False
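
The two conditions can also be checked one at a time. The following is a minimal sketch that rebuilds the bool_ column for the six rows in the table above; it assumes the IDs have already been converted to integers, and the variable names are only illustrative.

```python
import pandas as pd

# IDs of the first six rows of the example, assumed already converted to int
ids = pd.Series([5, 3, 3, 3, 5, 3], name="ID")

is_three = ids.eq(3)                 # condition 1: the row's ID is 3
same_as_above = ids.eq(ids.shift())  # the row above has the same ID
bool_ = is_three & same_as_above     # both together: a 3 directly below a 3

print(pd.DataFrame({"ID": ids, "bool_": bool_}))
```

Because shift() leaves a missing value in the first position, the first row can never be flagged.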

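df = DATA.copy()                  # work on a copy so DATA keeps its string IDs
df['ID'] = df['ID'].astype(int)   # the example stores IDs as strings

# True where the row has ID 3 and the row above has the same ID
m = df['ID'].eq(3) & df['ID'].eq(df['ID'].shift())

# Keep everything that is not flagged
df_new = df[~m]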

   Profession  Gender Size   ID
0     Teacher    Male    M  5.0
1      Banker    Male    M  3.0
4      lawyer    Male    S  5.0
5     Teacher  Female    M  3.0
6      Doctor    Male    L  5.0
7   Scientist    Male    M  3.0
8   Scientist  Female    L  5.0
9      Banker  Female    M  3.0
12     Banker    Male    S  5.0
13     Banker    Male    M  3.0
16     lawyer  Female    L  5.0
17    Teacher    Male    S  3.0
19      Judge  Female    S  5.0
20  Scientist  Female    L  3.0
22      Judge  Female    M  5.0
23  Scientist  Female    S  3.0
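
Since the question asks for something reusable on larger frames, the same mask could be wrapped in a small function. This is only a sketch: the helper name keep_first_in_runs and its parameters are hypothetical, and it compares against the value as given, so no type conversion is needed when the IDs are strings.

```python
import pandas as pd

def keep_first_in_runs(frame, column, value):
    """Drop rows where `column` equals `value` and the row above does too."""
    col = frame[column]
    repeated = col.eq(value) & col.shift().eq(value)
    return frame[~repeated]

# Small sample with string IDs, as in the question's data
sample = pd.DataFrame({'ID': ['5', '3', '3', '3', '5', '3']})
result = keep_first_in_runs(sample, 'ID', '3')
print(result.index.tolist())
```

The mask is built from vectorized column operations only, so it should scale to large frames without a Python-level loop.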

Use:

data_filtered = DATA.loc[~(DATA['ID'].ne(DATA['ID'].shift()).cumsum().duplicated() & 
                           DATA['ID'].eq('3')), :]
print(data_filtered)

   Profession  Gender Size ID
0     Teacher    Male    M  5
1      Banker    Male    M  3
4      lawyer    Male    S  5
5     Teacher  Female    M  3
6      Doctor    Male    L  5
7   Scientist    Male    M  3
8   Scientist  Female    L  5
9      Banker  Female    M  3
12     Banker    Male    S  5
13     Banker    Male    M  3
16     lawyer  Female    L  5
17    Teacher    Male    S  3
19      Judge  Female    S  5
20  Scientist  Female    L  3
22      Judge  Female    M  5
23  Scientist  Female    S  3
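
To see why the one-liner above works, it may help to split it into steps. The sketch below does this on the IDs of the first six rows only; the intermediate variable names are illustrative and not part of the original answer.

```python
import pandas as pd

ids = pd.Series(['5', '3', '3', '3', '5', '3'], name='ID')

# True wherever the value differs from the row above, i.e. a new run starts
run_start = ids.ne(ids.shift())
# The cumulative sum gives every run of equal values its own label
run_id = run_start.cumsum()
# duplicated() is False for the first row of each run and True for the rest
repeat_in_run = run_id.duplicated()

# Drop only the repeats that belong to runs of '3'
drop = repeat_in_run & ids.eq('3')
kept = ids[~drop]
print(kept.index.tolist())
```

Here the rows at positions 2 and 3 are the only repeats inside a run of '3', so they are the only ones dropped.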

You could also use the mask m from @Manakin's answer:

DATA.loc[~m, :]

# Double boolean filter: keep every ID 5 row, plus each ID 3 row directly below an ID 5 row
DATA[(DATA.ID.eq('3') & DATA.ID.shift().eq('5')) | DATA.ID.eq('5')]


   Profession  Gender Size ID
0     Teacher    Male    M  5
1      Banker    Male    M  3
4      lawyer    Male    S  5
5     Teacher  Female    M  3
6      Doctor    Male    L  5
7   Scientist    Male    M  3
8   Scientist  Female    L  5
9      Banker  Female    M  3
12     Banker    Male    S  5
13     Banker    Male    M  3
16     lawyer  Female    L  5
17    Teacher    Male    S  3
19      Judge  Female    S  5
20  Scientist  Female    L  3
22      Judge  Female    M  5
23  Scientist  Female    S  3
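
On the example data this filter gives the same rows as the shift-based mask, but the two are not equivalent in general. The sketch below uses a made-up ID sequence (not from the question) that starts with '3' to show one difference: the double boolean filter keeps a '3' only when the row above is '5', so a leading '3' is dropped.

```python
import pandas as pd

# Hypothetical IDs that begin with a run of '3'
ids = pd.Series(['3', '3', '5', '3', '3'], name='ID')

# Shift-based mask: drop a '3' only when the row above is also '3'
repeated = ids.eq('3') & ids.eq(ids.shift())
kept_by_mask = ids[~repeated].index.tolist()

# Double boolean filter: keep '5' rows and '3' rows directly below a '5'
keep = (ids.eq('3') & ids.shift().eq('5')) | ids.eq('5')
kept_by_filter = ids[keep].index.tolist()

print(kept_by_mask)    # row 0 is kept
print(kept_by_filter)  # row 0 is dropped
```

Which behavior is wanted depends on the data; if the column can hold values other than '3' and '5', the double boolean filter would also drop those rows.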
