
Removing rows from a Pandas DataFrame based on a column value

I have a DataFrame with three columns: "id", "access" and "folder path". I want to remove every folder that is a subpath of a root folder and has the same id and access as that root folder. For example, in the DataFrame below, the third row needs to be deleted because abc already has read access to the root folder C:/new:

id       access      folder path
abc      read        C:/new
abc      write       C:/new
abc      read        C:/new/folder
abc      read        C:/new1
def      read        C:/new
  1. Split the "folder path" column at each "/".

    df[['path1','path2','path3']] = df['folder path'].str.split("/",expand=True,n=2)

This produces a DataFrame that looks like this:

id   access folder path      path1   path2  path3
abc  read    C:/new           C:      new    None
abc  write   C:/new           C:      new    None
abc  read    C:/new/folder    C:      new   folder
abc  read    C:/new1          C:      new1   None
def  read    C:/new           C:      new    None
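To make the split step reproducible, here is a minimal sketch that rebuilds the sample data from the question and applies the same split. The DataFrame construction is an assumption based on the table shown in the question. Note that assigning to three named columns relies on at least one path having three components; if no row had a nested folder, the split would likely return only two columns and the assignment would fail.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    "id": ["abc", "abc", "abc", "abc", "def"],
    "access": ["read", "write", "read", "read", "read"],
    "folder path": ["C:/new", "C:/new", "C:/new/folder", "C:/new1", "C:/new"],
})

# n=2 allows at most two splits, so each path yields at most three parts:
# the drive, the root folder, and everything below the root folder.
df[["path1", "path2", "path3"]] = df["folder path"].str.split("/", expand=True, n=2)

print(df)
```

Rows whose path has no third component get a missing value in path3, which is what the next step relies on.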
  2. Now keep only the rows where path3 is None and that are not duplicates across the columns id, access and path2. For that purpose create two masks: one to identify the rows where path3 is None, and another to mark the first occurrence of each combination of the required columns.

    m1 = df['path3'].isna()
    m2 = ~df[['id','access','path2']].duplicated()
    df_new = df[m1 & m2]

The final DataFrame will be:

id   access folder path      
abc  read    C:/new        
abc  write   C:/new           
abc  read    C:/new1          
def  read    C:/new 
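The two masks can be checked against the sample data with a short, self-contained sketch. The sample DataFrame is again reconstructed from the question and is an assumption; the mask logic is the same as in the step above.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    "id": ["abc", "abc", "abc", "abc", "def"],
    "access": ["read", "write", "read", "read", "read"],
    "folder path": ["C:/new", "C:/new", "C:/new/folder", "C:/new1", "C:/new"],
})
df[["path1", "path2", "path3"]] = df["folder path"].str.split("/", expand=True, n=2)

# m1: True for rows that point at a root folder (no third path component).
m1 = df["path3"].isna()

# m2: True for the first occurrence of each (id, access, path2) combination.
m2 = ~df[["id", "access", "path2"]].duplicated()

# Keep rows that are root folders and not repeats of an earlier row.
df_new = df[m1 & m2]

print(df_new[["id", "access", "folder path"]])
```

On this sample only the third row (abc, read, C:/new/folder) is filtered out. Be aware that m1 alone already drops every row with a third path component, whether or not a matching root-folder row exists, so this approach fits data shaped like the example rather than arbitrary folder trees.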

      
  3. Remove the path1, path2 and path3 columns.
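Putting the three steps together, a sketch of the whole procedure might look like the following. The function name drop_subfolder_rows is hypothetical, and the sample data is reconstructed from the question. Unlike the steps above, this version joins the helper columns onto a copy instead of modifying the input, and it pads the split result to three columns so it does not fail when no path has a subfolder; both are additions of this sketch, not part of the original answer.

```python
import pandas as pd

HELPER_COLS = ["path1", "path2", "path3"]


def drop_subfolder_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep root-folder rows, drop subfolder rows and repeats."""
    # Step 1: split each path into drive, root folder, and the remainder.
    parts = df["folder path"].str.split("/", expand=True, n=2)
    # Pad to three columns in case no path has a third component.
    parts = parts.reindex(columns=list(range(3)))
    parts.columns = HELPER_COLS
    out = df.join(parts)

    # Step 2: keep root-folder rows that are not repeats of an earlier row.
    m1 = out["path3"].isna()
    m2 = ~out[["id", "access", "path2"]].duplicated()

    # Step 3: remove the helper columns again.
    return out[m1 & m2].drop(columns=HELPER_COLS)


# Sample data reconstructed from the table in the question (an assumption).
df = pd.DataFrame({
    "id": ["abc", "abc", "abc", "abc", "def"],
    "access": ["read", "write", "read", "read", "read"],
    "folder path": ["C:/new", "C:/new", "C:/new/folder", "C:/new1", "C:/new"],
})

result = drop_subfolder_rows(df)
print(result)
```

The result keeps the original three columns and the original index labels; call reset_index(drop=True) on it if a fresh 0-based index is wanted.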
