Removing rows from a Pandas DataFrame based on a column value
I have a dataframe with three columns: "id", "access" and "folder path". I want to remove all folders that are subpaths of the root folder and have identical access and id. For example, in the dataframe below, the third row needs to be deleted because abc already has read access to the root folder:
id access folder path
abc read C:/new
abc write C:/new
abc read C:/new/folder
abc read C:/new1
def read C:/new
Split the "folder path" column at "/":
df[['path1','path2','path3']] = df['folder path'].str.split("/",expand=True,n=2)
This produces a dataframe that looks like this:
id access folder path path1 path2 path3
abc read C:/new C: new None
abc write C:/new C: new None
abc read C:/new/folder C: new folder
abc read C:/new1 C: new1 None
def read C:/new C: new None
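The split step can be reproduced with the sample data from the question. This is a minimal sketch; the intermediate name parts is only illustrative, and the code assumes at least one path has three components so that the split yields three columns.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "id": ["abc", "abc", "abc", "abc", "def"],
    "access": ["read", "write", "read", "read", "read"],
    "folder path": ["C:/new", "C:/new", "C:/new/folder", "C:/new1", "C:/new"],
})

# n=2 allows at most two cuts, so every row yields at most three parts.
# Rows with fewer parts are padded with missing values.
parts = df["folder path"].str.split("/", n=2, expand=True)
parts.columns = ["path1", "path2", "path3"]
df = df.join(parts)

print(df)
```

Only the third row has a non-missing path3, which is what marks it as a subfolder in the next step.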
Now remove all duplicates across the columns id, access and path2 where the path3 value is None. To do that, create two masks: one to identify all None values in path3, and another to find duplicates among the required columns.
m1 = df['path3'].isna()
m2 = ~df[['id','access','path2']].duplicated()
df_new = df[m1 & m2]
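To see what each mask contributes, it can help to print them next to the data. The sketch below rebuilds the sample dataframe so it runs on its own; the column names m1, m2 and keep in the printed view are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["abc", "abc", "abc", "abc", "def"],
    "access": ["read", "write", "read", "read", "read"],
    "folder path": ["C:/new", "C:/new", "C:/new/folder", "C:/new1", "C:/new"],
})
df[["path1", "path2", "path3"]] = df["folder path"].str.split("/", n=2, expand=True)

# m1: True where the path has no third component (a root-level folder).
m1 = df["path3"].isna()
# m2: True for the first occurrence of each (id, access, path2) combination.
m2 = ~df[["id", "access", "path2"]].duplicated()

# Side-by-side view of both masks and their combination.
view = df[["id", "access", "folder path"]].assign(m1=m1, m2=m2, keep=m1 & m2)
print(view)

df_new = df[m1 & m2]
```

For this data both masks are False only on the third row, so that is the single row removed.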
The final DataFrame will be:
id access folder path
abc read C:/new
abc write C:/new
abc read C:/new1
def read C:/new
Drop the path1, path2 and path3 columns.
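The whole procedure, including dropping the helper columns, could be wrapped up roughly as follows. The function name remove_subfolders is hypothetical, and the reindex call is an added safeguard (not part of the original answer) so the split always yields three columns even when no path has a third component.

```python
import pandas as pd

def remove_subfolders(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: keep one root-level row per (id, access, folder)."""
    helper_cols = ["path1", "path2", "path3"]

    # Split into at most three parts; reindex pads missing columns (assumption:
    # added here so inputs without any subfolder still get a path3 column).
    parts = df["folder path"].str.split("/", n=2, expand=True)
    parts = parts.reindex(columns=range(3))
    parts.columns = helper_cols
    out = df.join(parts)

    m1 = out["path3"].isna()                             # root-level rows only
    m2 = ~out[["id", "access", "path2"]].duplicated()    # first occurrence only

    # Filter, then drop the helper columns and renumber the rows.
    return out[m1 & m2].drop(columns=helper_cols).reset_index(drop=True)

df = pd.DataFrame({
    "id": ["abc", "abc", "abc", "abc", "def"],
    "access": ["read", "write", "read", "read", "read"],
    "folder path": ["C:/new", "C:/new", "C:/new/folder", "C:/new1", "C:/new"],
})
result = remove_subfolders(df)
print(result)
```

Note that this approach removes every row with a third path component, whether or not a matching root-level row exists for the same id and access, so it fits data shaped like the example rather than arbitrary folder trees.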