Remove rows from a Pandas DataFrame based on column values

Question

I have a dataframe with three columns: "id", "access" and "folder path". I want to remove all folders that are subpaths of the root folder and have identical access and id. For example, in the dataframe below, the third row needs to be deleted because abc already has read access to the root folder:

id       access      folder path
abc      read        C:/new
abc      write       C:/new
abc      read        C:/new/folder
abc      read        C:/new1
def      read        C:/new
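
To make the example reproducible, the table above can be built roughly as follows. This is only a sketch: the variable name `df` is an assumption, chosen to match the code in the answer.

```python
import pandas as pd

# Sample data copied from the table above; `df` is an assumed variable name.
df = pd.DataFrame(
    {
        "id": ["abc", "abc", "abc", "abc", "def"],
        "access": ["read", "write", "read", "read", "read"],
        "folder path": ["C:/new", "C:/new", "C:/new/folder", "C:/new1", "C:/new"],
    }
)
print(df)
```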

Answer 1

Split the `folder path` column at `/`:

df[['path1','path2','path3']] = df['folder path'].str.split("/",expand=True,n=2)

It will produce a dataframe that looks like this:

id   access folder path      path1   path2  path3
abc  read    C:/new           C:      new    None
abc  write   C:/new           C:      new    None
abc  read    C:/new/folder    C:      new   folder
abc  read    C:/new1          C:      new1   None
def  read    C:/new           C:      new    None
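
The `n=2` argument caps the split at two cuts, so every path yields at most three pieces and anything deeper stays together in `path3`. A small sketch of that behavior, assuming a hypothetical path `C:/new/folder/sub` that is not part of the original data:

```python
import pandas as pd

# Hypothetical paths, chosen only to show how n=2 behaves.
paths = pd.Series(["C:/new", "C:/new/folder/sub"])

# At most two splits per string, hence at most three columns.
parts = paths.str.split("/", n=2, expand=True)
parts.columns = ["path1", "path2", "path3"]
print(parts)
```

The shorter path gets a missing value in `path3`, which is what the `isna()` check in the next step relies on.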

Now keep only the rows where the `path3` value is None and that are not duplicates on the columns `id`, `access`, `path2`. For that purpose create two masks: one to identify all None values, another to find duplicates among the required columns.
```
# root-level rows (no path3)
m1 = df['path3'].isna()
# first occurrence of each (id, access, path2)
m2 = ~df[['id','access','path2']].duplicated()
df_new = df[m1 & m2]
```

The final DataFrame will be:

id   access folder path      
abc  read    C:/new        
abc  write   C:/new           
abc  read    C:/new1          
def  read    C:/new

Finally, remove the `path1`, `path2` and `path3` columns.
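
Putting the steps together, the whole answer can be sketched end to end as below. The sample data and the helper column names come from the post; chaining `drop(columns=...)` onto the filtered frame is one possible way to remove the helper columns, not necessarily the one the answer had in mind.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame(
    {
        "id": ["abc", "abc", "abc", "abc", "def"],
        "access": ["read", "write", "read", "read", "read"],
        "folder path": ["C:/new", "C:/new", "C:/new/folder", "C:/new1", "C:/new"],
    }
)

# Split each path into drive, top-level folder, and the remainder.
df[["path1", "path2", "path3"]] = df["folder path"].str.split("/", expand=True, n=2)

m1 = df["path3"].isna()                            # root-level rows only
m2 = ~df[["id", "access", "path2"]].duplicated()   # first (id, access, path2) occurrence

# Filter, then drop the helper columns (assumed way of removing them).
df_new = df[m1 & m2].drop(columns=["path1", "path2", "path3"])
print(df_new)
```

Note that `m1` keeps only rows whose `path3` is missing, so this approach drops every deeper path regardless of whether its parent row exists, and it assumes the root folders sit exactly one level below the drive.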

Solution 1
0 2021-05-25 19:12:59
