Delete rows containing certain keywords from a CSV file

Question

I have a large data file and I need to delete rows that contain certain keywords.

Here is an example of the file I'm using:

User Name     DN
MB31212       CN=MB31212,CN=Users,DC=prod,DC=trovp,DC=net
MB23423       CN=MB23423 ,OU=Generic Mailbox,DC=prod,DC=trovp,DC=net
MB23424       CN=MB23424 ,CN=Users,DC=prod,DC=trovp,DC=net
MB23423       CN=MB23423,OU=DNA,DC=prod,DC=trovp,DC=net
MB23234       CN=MB23234 ,OU=DNA,DC=prod,DC=trovp,DC=net

This is how I import the file:

import pandas as pd
df = pd.read_csv('sample.csv', sep=',', encoding='latin1')

How can I:

Delete all rows that contain 'OU=DNA' in the DN column, for example?
Delete the first attribute 'CN=x' in the DN column without deleting the rest of the data in the column?

I would like to get something like the output below, with the 2 rows that contained 'OU=DNA' deleted and the 'CN=x' removed from every row:

User Name     DN
MB31212       CN=Users,DC=prod,DC=trovp,DC=net
MB23423       OU=Generic Mailbox,DC=prod,DC=trovp,DC=net
MB23424       CN=Users,DC=prod,DC=trovp,DC=net

Answer 1

You can do this in two steps. Use the str.contains method to filter out the rows containing OU=DNA, then use the str.replace method with a regular expression to strip the leading CN=x:

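# Keep only the rows whose DN does not contain "OU=DNA"; .copy() avoids SettingWithCopyWarning
newDf = df.loc[~df.DN.str.contains("OU=DNA")].copy()
# Strip the leading "CN=x," component; regex=True is needed on pandas 2.0+
newDf["DN"] = newDf.DN.str.replace(r"^CN=[^,]*,", "", regex=True)
newDf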

    UserName    DN
0   MB31212 CN=Users,DC=prod,DC=trovp,DC=net
1   MB23423 OU=Generic Mailbox,DC=prod,DC=trovp,DC=net
2   MB23424 CN=Users,DC=prod,DC=trovp,DC=net

A short breakdown of the regular expression: ^ anchors the match at the beginning of the string, CN= matches literally, and [^,]*, matches everything up to and including the first comma.
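To see the anchoring in isolation, here is a minimal sketch that applies the same pattern with Python's standard re module instead of pandas. The two DN strings are copied from the question; the names LEADING_CN, samples and stripped are only illustrative.

```python
import re

# Same pattern as above: anchored at the start of the string, it removes
# "CN=" plus everything up to and including the first comma.
LEADING_CN = re.compile(r"^CN=[^,]*,")

# Two DN values taken from the sample data in the question.
samples = [
    "CN=MB31212,CN=Users,DC=prod,DC=trovp,DC=net",
    "CN=MB23423 ,OU=Generic Mailbox,DC=prod,DC=trovp,DC=net",
]

# Because of the ^ anchor, only the first CN= component is removed;
# the later "CN=Users" in the first string is left untouched.
stripped = [LEADING_CN.sub("", dn) for dn in samples]

for dn in stripped:
    print(dn)
```

Without the ^ anchor, the pattern would match every CN=...,  component in the string, so CN=Users would be removed as well.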

Answer 2

To read the file sample you gave, I used:

df = pd.read_csv('sample.csv', sep='     ', encoding='latin1', engine="python")

and then:

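# Drop the rows whose DN contains "OU=DNA"
df = df.drop(df[df.DN.str.contains("OU=DNA")].index)
# Remove the "CN=MBnnnnn," component; regex=True is needed on pandas 2.0+
df.DN = df.DN.str.replace(r'(CN=MB[0-9]{5}\s*,)', '', regex=True)
df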

gave the desired result:

    User Name   DN
0   MB31212     CN=Users,DC=prod,DC=trovp,DC=net
1   MB23423     OU=Generic Mailbox,DC=prod,DC=trovp,DC=net
2   MB23424     CN=Users,DC=prod,DC=trovp,DC=net
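For anyone who wants to try both steps without a CSV file, here is a self-contained sketch that builds the sample rows in memory. It assumes the two column names from the question ("User Name" and "DN"); the variable names keep and result are only illustrative, and it uses the anchored pattern ^CN=[^,]*, rather than the MB-specific one, so it does not depend on the user-name format.

```python
import pandas as pd

# Sample rows copied from the question, built in memory so no file is needed.
df = pd.DataFrame({
    "User Name": ["MB31212", "MB23423", "MB23424", "MB23423", "MB23234"],
    "DN": [
        "CN=MB31212,CN=Users,DC=prod,DC=trovp,DC=net",
        "CN=MB23423 ,OU=Generic Mailbox,DC=prod,DC=trovp,DC=net",
        "CN=MB23424 ,CN=Users,DC=prod,DC=trovp,DC=net",
        "CN=MB23423,OU=DNA,DC=prod,DC=trovp,DC=net",
        "CN=MB23234 ,OU=DNA,DC=prod,DC=trovp,DC=net",
    ],
})

# Step 1: keep rows whose DN does not contain the literal text "OU=DNA".
# regex=False treats the keyword as plain text rather than a pattern.
keep = ~df["DN"].str.contains("OU=DNA", regex=False)
result = df.loc[keep].copy()

# Step 2: strip the leading "CN=<value>," component from each remaining DN.
result["DN"] = result["DN"].str.replace(r"^CN=[^,]*,", "", regex=True)

print(result.to_string(index=False))
```

Taking a .copy() of the filtered frame before assigning to the DN column keeps pandas from warning about writing to a slice of the original DataFrame.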


Answer 1: score 3, accepted, 2016-06-22 21:40:33

Answer 2: score 1, 2016-06-22 22:10:05
