Filter rows from a data frame based on the highest index and values from column

I have the following example. I would like to keep all the rows where ID=5, and where I have multiple consecutive rows with ID=3, I would like to keep only one of them: the one with the highest index.

import pandas as pd

data = {'Profession':['Teacher', 'Banker', 'Teacher', 'Judge','lawyer','Teacher'], 'Gender':['Male','Male', 'Female', 'Male','Male','Female'],'Size':['M','M','L','S','S','M'],'ID':['5','3','3','3','5','3']}
data2 = {'Profession':['Doctor', 'Scientist', 'Scientist', 'Banker','Judge','Scientist'], 'Gender':['Male','Male', 'Female','Female','Male','Male'],'Size':['L','M','L','M','L','L'],'ID':['5','3','5','3','3','3']}
data3 = {'Profession':['Banker', 'Banker', 'Doctor', 'Doctor','lawyer','Teacher'], 'Gender':['Male','Male', 'Female', 'Female','Female','Male'],'Size':['S','M','S','M','L','S'],'ID':['5','3','3','3','5','3']}
data4 = {'Profession':['Judge', 'Judge', 'Scientist', 'Banker','Judge','Scientist'], 'Gender':['Female','Female', 'Female','Female','Female','Female'],'Size':['M','S','L','S','M','S'],'ID':['3','5','3','3','5','3']}
df = pd.DataFrame(data)
df2 = pd.DataFrame(data2)
df3 = pd.DataFrame(data3)
df4 = pd.DataFrame(data4)
DATA = pd.concat([df, df2, df3, df4])
DATA.reset_index(drop=True, inplace=True)  # 24 rows, index 0..23
print(DATA)


I want this result. This is just an example: my real data has a very large number of rows, so I need code that also works for much larger data frames.

[Image: expected output]

You can construct a boolean mask that flags every ID=3 row that follows another ID=3 row, i.e. each run of consecutive 3s except its first row, and then drop the flagged rows.

The mask tests that:

  1. the row's ID is equal to 3, and
  2. the ID in the row directly above is also equal to 3.

Here are the first few rows with a helper column bool_ holding this mask:

  Profession  Gender Size  ID  bool_
0     Teacher    Male    M   5  False
1      Banker    Male    M   3  False <-- fulfills 1st condition but not 2nd so false.
2     Teacher  Female    L   3   True <-- fulfills condition 1 & 2
3       Judge    Male    S   3   True <-- fulfills condition 1 & 2
4      lawyer    Male    S   5  False
5     Teacher  Female    M   3  False
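The bool_ column in the table can be reproduced with a couple of lines. A minimal sketch, rebuilding only the ID column of the first six rows of DATA (already converted to int, as in the code below):

```python
import pandas as pd

# Only the ID column of the first six rows of DATA, converted to int
df = pd.DataFrame({'ID': [5, 3, 3, 3, 5, 3]})

# Condition 1: the row's ID is 3; condition 2: the row above also has ID 3.
# shift() moves values down one row, so row 0 compares against NaN and is False.
df['bool_'] = df['ID'].eq(3) & df['ID'].eq(df['ID'].shift())
print(df)
```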

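df = DATA.copy()
df['ID'] = df['ID'].astype(int)  # IDs are strings in DATA; convert so that .eq(3) matches

# True for an ID=3 row whose previous row is also ID=3, i.e. every row of a run except the first
m = df['ID'].eq(3) & df['ID'].eq(df['ID'].shift())

df_new = df[~m]  # drop the flagged rows
print(df_new)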

   Profession  Gender Size   ID
0     Teacher    Male    M  5.0
1      Banker    Male    M  3.0
4      lawyer    Male    S  5.0
5     Teacher  Female    M  3.0
6      Doctor    Male    L  5.0
7   Scientist    Male    M  3.0
8   Scientist  Female    L  5.0
9      Banker  Female    M  3.0
12     Banker    Male    S  5.0
13     Banker    Male    M  3.0
16     lawyer  Female    L  5.0
17    Teacher    Male    S  3.0
19      Judge  Female    S  5.0
20  Scientist  Female    L  3.0
22      Judge  Female    M  5.0
23  Scientist  Female    S  3.0
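The mask above keeps the first (lowest-index) row of each ID=3 run. If "highest index" in the question is meant literally, i.e. the last row of each run should survive, a minimal variation (an assumption about the intended result, not part of the original answer) compares each row with the row below via shift(-1). Only the ID column of DATA is rebuilt here, already converted to int:

```python
import pandas as pd

# ID column of DATA from the question (24 rows), converted to int
ids = pd.Series([5, 3, 3, 3, 5, 3,
                 5, 3, 5, 3, 3, 3,
                 5, 3, 3, 3, 5, 3,
                 3, 5, 3, 3, 5, 3])

# Flag an ID=3 row when the row *below* is also ID=3,
# i.e. every row of a run except the last; the final row compares against NaN
m_last = ids.eq(3) & ids.eq(ids.shift(-1))

kept = ids[~m_last]
print(kept.index.tolist())
```

Both variants keep the same number of rows (one per run); they differ only in which row of each run is kept.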

Use:

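# Number each run of consecutive identical IDs, flag every row of a run except its first,
# and drop the flagged rows whose ID is '3' (the IDs are strings in DATA)
data_filtered = DATA.loc[~(DATA['ID'].ne(DATA['ID'].shift()).cumsum().duplicated() &
                           DATA['ID'].eq('3')), :]
print(data_filtered)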

   Profession  Gender Size ID
0     Teacher    Male    M  5
1      Banker    Male    M  3
4      lawyer    Male    S  5
5     Teacher  Female    M  3
6      Doctor    Male    L  5
7   Scientist    Male    M  3
8   Scientist  Female    L  5
9      Banker  Female    M  3
12     Banker    Male    S  5
13     Banker    Male    M  3
16     lawyer  Female    L  5
17    Teacher    Male    S  3
19      Judge  Female    S  5
20  Scientist  Female    L  3
22      Judge  Female    M  5
23  Scientist  Female    S  3
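The expression above works in two steps: ne(shift()).cumsum() gives every run of identical consecutive IDs its own number, and duplicated() on those numbers is True for every row of a run except the first. A minimal sketch showing the intermediate values on the first six IDs of DATA (kept as strings, as in the question):

```python
import pandas as pd

ids = pd.Series(['5', '3', '3', '3', '5', '3'])  # first six IDs of DATA

# A new run starts wherever the ID differs from the previous row; cumsum numbers the runs
runs = ids.ne(ids.shift()).cumsum()
print(runs.tolist())

# duplicated() is True for every row of a run except its first row
repeat_in_run = runs.duplicated()
print(repeat_in_run.tolist())

# Only repeats inside ID='3' runs are dropped; ID='5' runs are left untouched
mask = repeat_in_run & ids.eq('3')
print(ids[~mask].index.tolist())
```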

You could also reuse the mask m from @Manakin's answer:

DATA.loc[~m, :]
# Double boolean filter: keep every ID=5 row, plus each ID=3 row directly preceded by an ID=5 row

DATA[(DATA.ID.eq('3') & DATA.ID.shift().eq('5')) | DATA.ID.eq('5')]


   Profession  Gender Size ID
0     Teacher    Male    M  5
1      Banker    Male    M  3
4      lawyer    Male    S  5
5     Teacher  Female    M  3
6      Doctor    Male    L  5
7   Scientist    Male    M  3
8   Scientist  Female    L  5
9      Banker  Female    M  3
12     Banker    Male    S  5
13     Banker    Male    M  3
16     lawyer  Female    L  5
17    Teacher    Male    S  3
19      Judge  Female    S  5
20  Scientist  Female    L  3
22      Judge  Female    M  5
23  Scientist  Female    S  3
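The double boolean filter gives the same result on this data, but it keeps an ID=3 row only when the row directly before it is ID=5. A run of 3s at the very top of the frame (where shift() yields NaN) is therefore dropped entirely, and any ID other than 3 or 5 would also be excluded. A minimal sketch on hypothetical IDs (not from the question) comparing it with the run-based filter:

```python
import pandas as pd

# Hypothetical IDs whose first row is ID=3
ids = pd.Series(['3', '3', '5', '3'])

# Double boolean filter from the answer above
double = ids[(ids.eq('3') & ids.shift().eq('5')) | ids.eq('5')]

# Run-based filter: drop only the repeats inside each run of ID=3
runs = ids.ne(ids.shift()).cumsum()
run_based = ids[~(runs.duplicated() & ids.eq('3'))]

print(double.index.tolist())     # leading run of 3s is lost
print(run_based.index.tolist())  # first row of the leading run is kept
```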
