
Removing rows from a Pandas DataFrame based on a column value

I have a dataframe with three columns: "id", "access" and "folder path". I want to remove every row whose folder path is a subpath of a root folder that already appears with the same id and access. For example, in the dataframe below, the third row should be deleted because abc already has read access to the root folder C:/new:

id       access      folder path
abc      read        C:/new
abc      write       C:/new
abc      read        C:/new/folder
abc      read        C:/new1
def      read        C:/new
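
For reference, the example data can be reproduced with a small DataFrame like this (column names taken from the question; the construction itself is only an illustrative sketch):

    import pandas as pd

    # Rebuild the question's example data
    df = pd.DataFrame({
        'id':          ['abc', 'abc', 'abc', 'abc', 'def'],
        'access':      ['read', 'write', 'read', 'read', 'read'],
        'folder path': ['C:/new', 'C:/new', 'C:/new/folder', 'C:/new1', 'C:/new'],
    })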
  1. Split the folder path column at /

    # Split "folder path" on "/" into at most three parts (n=2 gives up to 3 columns)
    df[['path1','path2','path3']] = df['folder path'].str.split("/",expand=True,n=2)

This produces a dataframe that looks like this:

id   access folder path      path1   path2  path3
abc  read    C:/new           C:      new    None
abc  write   C:/new           C:      new    None
abc  read    C:/new/folder    C:      new   folder
abc  read    C:/new1          C:      new1   None
def  read    C:/new           C:      new    None
  2. Now remove all duplicates from the columns id, access, path2 where the path3 value is None. For that purpose, create two masks: one to identify all None values, another to find duplicates among the required columns.

     # Rows whose path3 is None, i.e. the path has at most two segments (a root folder)
     m1 = df['path3'].isna()
     # Rows that are not duplicates of an earlier (id, access, path2) combination
     m2 = ~df[['id','access','path2']].duplicated()
     df_new = df[m1 & m2]

The final DataFrame will be:

id   access folder path      
abc  read    C:/new        
abc  write   C:/new           
abc  read    C:/new1          
def  read    C:/new 

      
  3. Remove the path1, path2, path3 columns, as shown in the sketch below.
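
A minimal sketch of this last step, assuming df_new is the filtered frame from step 2; DataFrame.drop with columns= returns a copy without the helper columns:

     # Drop the helper columns used for the split, keeping id, access, folder path
     df_new = df_new.drop(columns=['path1', 'path2', 'path3'])
     print(df_new)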
