Filter rows in a data frame based on the highest index and the value in a column

Question

I have the following example. I would like to keep all the rows where ID=5, and where there are multiple rows with ID=3, I would like to keep only the ones with the highest index.

import pandas as pd

data = {'Profession': ['Teacher', 'Banker', 'Teacher', 'Judge', 'lawyer', 'Teacher'], 'Gender': ['Male', 'Male', 'Female', 'Male', 'Male', 'Female'], 'Size': ['M', 'M', 'L', 'S', 'S', 'M'], 'ID': ['5', '3', '3', '3', '5', '3']}
data2 = {'Profession': ['Doctor', 'Scientist', 'Scientist', 'Banker', 'Judge', 'Scientist'], 'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male'], 'Size': ['L', 'M', 'L', 'M', 'L', 'L'], 'ID': ['5', '3', '5', '3', '3', '3']}
data3 = {'Profession': ['Banker', 'Banker', 'Doctor', 'Doctor', 'lawyer', 'Teacher'], 'Gender': ['Male', 'Male', 'Female', 'Female', 'Female', 'Male'], 'Size': ['S', 'M', 'S', 'M', 'L', 'S'], 'ID': ['5', '3', '3', '3', '5', '3']}
data4 = {'Profession': ['Judge', 'Judge', 'Scientist', 'Banker', 'Judge', 'Scientist'], 'Gender': ['Female', 'Female', 'Female', 'Female', 'Female', 'Female'], 'Size': ['M', 'S', 'L', 'S', 'M', 'S'], 'ID': ['3', '5', '3', '3', '5', '3']}
df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
df4 = pd.DataFrame(data4)
# Stack the four frames and renumber the rows 0..23
DATA = pd.concat([df, df2, df3, df4])
DATA.reset_index(drop=True, inplace=True)
DATA

I want this:

This is just an example. In my real data I have a very large number of rows, so I would like a piece of code that also works for larger data frames.

Answer 1

You can construct a boolean mask that flags every row with ID 3 that directly follows another row with ID 3, and leaves the first row of each such run unflagged.

The mask tests that

the row's ID is equal to 3
the row directly above it also has an ID equal to 3

If we look at the first few rows with a helper column holding this boolean:

  Profession  Gender Size  ID  bool_
0     Teacher    Male    M   5  False
1      Banker    Male    M   3  False <-- fulfills 1st condition but not 2nd so false.
2     Teacher  Female    L   3   True <-- fulfills condition 1 & 2
3       Judge    Male    S   3   True <-- fulfills condition 1 & 2
4      lawyer    Male    S   5  False
5     Teacher  Female    M   3  False
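
A minimal sketch of how such a helper column could be built for the first six rows. The names `demo`, `is_three` and `same_as_previous` are illustrative and not part of the original answer, and the ID column is cast to int here so that the comparison with 3 matches.

```python
import pandas as pd

# First six rows of the question's data (only the columns needed here).
demo = pd.DataFrame({
    'Profession': ['Teacher', 'Banker', 'Teacher', 'Judge', 'lawyer', 'Teacher'],
    'ID': ['5', '3', '3', '3', '5', '3'],
})
demo['ID'] = demo['ID'].astype(int)  # the question stores IDs as strings

# Condition 1: the row's ID is 3.
is_three = demo['ID'].eq(3)
# Condition 2: the row directly above has the same ID
# (shift() moves every value down by one row; the first row gets NaN).
same_as_previous = demo['ID'].eq(demo['ID'].shift())

# Both conditions together give the helper column.
demo['bool_'] = is_three & same_as_previous
print(demo)
```

Only rows 2 and 3 satisfy both conditions, so those are the rows that get dropped once the mask is negated.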

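# Work on a copy so the string IDs in DATA stay untouched
df = DATA.copy()
df['ID'] = df['ID'].astype(int)  # IDs are stored as strings in the question

# True for every 3 that directly follows another 3
m = df['ID'].eq(3) & df['ID'].eq(df['ID'].shift())

# Keep everything that is not flagged
df_new = df[~m]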

   Profession  Gender Size   ID
0     Teacher    Male    M  5.0
1      Banker    Male    M  3.0
4      lawyer    Male    S  5.0
5     Teacher  Female    M  3.0
6      Doctor    Male    L  5.0
7   Scientist    Male    M  3.0
8   Scientist  Female    L  5.0
9      Banker  Female    M  3.0
12     Banker    Male    S  5.0
13     Banker    Male    M  3.0
16     lawyer  Female    L  5.0
17    Teacher    Male    S  3.0
19      Judge  Female    S  5.0
20  Scientist  Female    L  3.0
22      Judge  Female    M  5.0
23  Scientist  Female    S  3.0

Answer 2

Use:

data_filtered = DATA.loc[~(DATA['ID'].ne(DATA['ID'].shift()).cumsum().duplicated() & 
                           DATA['ID'].eq('3')), :]
print(data_filtered)

   Profession  Gender Size ID
0     Teacher    Male    M  5
1      Banker    Male    M  3
4      lawyer    Male    S  5
5     Teacher  Female    M  3
6      Doctor    Male    L  5
7   Scientist    Male    M  3
8   Scientist  Female    L  5
9      Banker  Female    M  3
12     Banker    Male    S  5
13     Banker    Male    M  3
16     lawyer  Female    L  5
17    Teacher    Male    S  3
19      Judge  Female    S  5
20  Scientist  Female    L  3
22      Judge  Female    M  5
23  Scientist  Female    S  3

You could also reuse the mask m from @Manakin's answer:

DATA.loc[~m, :]
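
The chained expression in this answer can be unpacked step by step. The sketch below does so on a six-value Series; the names `ids`, `starts_run`, `run_label`, `repeat_in_run` and `drop` are assumptions made for illustration and do not appear in the answer.

```python
import pandas as pd

ids = pd.Series(['5', '3', '3', '3', '5', '3'], name='ID')

# True wherever the value differs from the row above
# (the first row is compared with NaN, so it always counts as a change).
starts_run = ids.ne(ids.shift())

# Running total of the changes: rows in the same consecutive run share a label.
run_label = starts_run.cumsum()

# duplicated() is False for the first row of each run and True for the others.
repeat_in_run = run_label.duplicated()

# Drop only the repeated rows whose ID is '3'.
drop = repeat_in_run & ids.eq('3')
print(ids[~drop])
```

Because every step is a vectorized Series operation, the same idea should carry over to much larger frames without an explicit Python loop.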

Answer 3

# Two boolean conditions combined into one filter

DATA[DATA.ID.eq('3') & DATA.ID.shift().eq('5') | DATA.ID.eq('5')]

   Profession  Gender Size ID
0     Teacher    Male    M  5
1      Banker    Male    M  3
4      lawyer    Male    S  5
5     Teacher  Female    M  3
6      Doctor    Male    L  5
7   Scientist    Male    M  3
8   Scientist  Female    L  5
9      Banker  Female    M  3
12     Banker    Male    S  5
13     Banker    Male    M  3
16     lawyer  Female    L  5
17    Teacher    Male    S  3
19      Judge  Female    S  5
20  Scientist  Female    L  3
22      Judge  Female    M  5
23  Scientist  Female    S  3
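
The filter above relies on Python's operator precedence, where & binds more tightly than |. A small sketch with the grouping written out explicitly, compared against a run-based mask; the names `ids`, `keep` and `drop` are illustrative only.

```python
import pandas as pd

ids = pd.Series(['5', '3', '3', '3', '5', '3', '3'], name='ID')

# Explicit parentheses showing how the one-line filter is grouped:
# keep a '3' only when the row above is a '5', and keep every '5'.
keep = (ids.eq('3') & ids.shift().eq('5')) | ids.eq('5')

# Run-based mask for comparison: a '3' that directly follows another '3'.
drop = ids.eq('3') & ids.eq(ids.shift())

print(ids[keep].index.tolist())
print(keep.tolist() == (~drop).tolist())
```

The two masks agree on this sample because the only ID values are '3' and '5' and the first row is a '5'. If the frame started with a '3', or contained a third ID value, this filter would drop rows that the run-based mask keeps, so it seems worth checking against the real data before relying on it.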

Answer 1: score 3, accepted, 2020-10-20 08:52:22

Answer 2: score 2, 2020-10-20 08:59:37

Answer 3: score 2, 2020-10-20 09:19:40
