
Removing rows from a Pandas DataFrame based on a column value

I have a dataframe with three columns: "id", "access" and "folder path". I want to remove every row whose folder path is a subpath of a root folder that already appears with the same id and access. For example, in the dataframe below, the third row should be deleted because abc already has read access to the root folder C:/new:

id       access      folder path
abc      read        C:/new
abc      write       C:/new
abc      read        C:/new/folder
abc      read        C:/new1
def      read        C:/new
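
For reference, the example data can be reproduced with a small DataFrame like this (column names taken from the question; the construction itself is only an illustrative sketch):

    import pandas as pd

    # Rebuild the question's example data
    df = pd.DataFrame({
        'id':          ['abc', 'abc', 'abc', 'abc', 'def'],
        'access':      ['read', 'write', 'read', 'read', 'read'],
        'folder path': ['C:/new', 'C:/new', 'C:/new/folder', 'C:/new1', 'C:/new'],
    })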
  1. Split the folder path column at /

    # Split "folder path" on "/" into at most three parts (n=2 gives up to 3 columns)
    df[['path1','path2','path3']] = df['folder path'].str.split("/",expand=True,n=2)

This produces a dataframe that looks like this:

id   access folder path      path1   path2  path3
abc  read    C:/new           C:      new    None
abc  write   C:/new           C:      new    None
abc  read    C:/new/folder    C:      new   folder
abc  read    C:/new1          C:      new1   None
def  read    C:/new           C:      new    None
  2. Now remove all duplicates from the columns id, access, path2 where the path3 value is None. For that purpose, create two masks: one to identify all None values, another to find duplicates among the required columns.

     # Rows whose path3 is None, i.e. the path has at most two segments (a root folder)
     m1 = df['path3'].isna()
     # Rows that are not duplicates of an earlier (id, access, path2) combination
     m2 = ~df[['id','access','path2']].duplicated()
     df_new = df[m1 & m2]

The final DataFrame will be:

id   access folder path      
abc  read    C:/new        
abc  write   C:/new           
abc  read    C:/new1          
def  read    C:/new 

      
  3. Remove the path1, path2, path3 columns, as shown in the sketch below.
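
A minimal sketch of this last step, assuming df_new is the filtered frame from step 2; DataFrame.drop with columns= returns a copy without the helper columns:

     # Drop the helper columns used for the split, keeping id, access, folder path
     df_new = df_new.drop(columns=['path1', 'path2', 'path3'])
     print(df_new)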
