Deleting rows with certain keywords from a CSV file
I have a large data file and I need to delete rows that contain certain keywords.
Here is an example of the file I'm using:
User Name DN
MB31212 CN=MB31212,CN=Users,DC=prod,DC=trovp,DC=net
MB23423 CN=MB23423 ,OU=Generic Mailbox,DC=prod,DC=trovp,DC=net
MB23424 CN=MB23424 ,CN=Users,DC=prod,DC=trovp,DC=net
MB23423 CN=MB23423,OU=DNA,DC=prod,DC=trovp,DC=net
MB23234 CN=MB23234 ,OU=DNA,DC=prod,DC=trovp,DC=net
This is how I import the file:
import pandas as pd
df = pd.read_csv('sample.csv', sep=',', encoding='latin1')
How can I get something like what is posted below, with the 2 rows that contain 'OU=DNA' deleted and the leading 'CN=x' removed from every row?
User Name DN
MB31212 CN=Users,DC=prod,DC=trovp,DC=net
MB23423 OU=Generic Mailbox,DC=prod,DC=trovp,DC=net
MB23424 CN=Users,DC=prod,DC=trovp,DC=net
You can try this two-step filtering as your logic. Use the str.contains method to filter out rows with OU=DNA, and use the str.replace method with a regular expression to trim the leading CN=x:
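# Keep only the rows whose DN does not contain OU=DNA; copy to avoid SettingWithCopyWarning
newDf = df.loc[~df.DN.str.contains("OU=DNA")].copy()
# Strip the leading CN=... component; regex=True is required on recent pandas versions
newDf["DN"] = newDf.DN.str.replace(r"^CN=[^,]*,", "", regex=True)
newDf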
UserName DN
0 MB31212 CN=Users,DC=prod,DC=trovp,DC=net
1 MB23423 OU=Generic Mailbox,DC=prod,DC=trovp,DC=net
2 MB23424 CN=Users,DC=prod,DC=trovp,DC=net
A short breakdown of the regular expression: ^ anchors the match at the beginning of the string, CN= matches those characters literally, and [^,]*, matches everything up to and including the first comma.
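To see what the pattern does in isolation, here is a minimal sketch using Python's built-in re module on two DN strings from the question. The helper name strip_leading_cn is made up for illustration and is not part of the answer above.

```python
import re

# Same pattern as in the answer: anchored at the start, matches "CN=",
# then everything up to and including the first comma.
LEADING_CN = re.compile(r"^CN=[^,]*,")

def strip_leading_cn(dn: str) -> str:
    """Remove only the first CN=... component of a DN string (hypothetical helper)."""
    return LEADING_CN.sub("", dn)

# The second "CN=Users" survives because the pattern is anchored with ^.
print(strip_leading_cn("CN=MB31212,CN=Users,DC=prod,DC=trovp,DC=net"))
# A trailing space before the comma is consumed by [^,]* as well.
print(strip_leading_cn("CN=MB23423 ,OU=Generic Mailbox,DC=prod,DC=trovp,DC=net"))
```

Because of the ^ anchor, a string that does not start with CN= is returned unchanged.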
To read the file sample you gave, I used:
df = pd.read_csv('sample.csv', sep=' ', encoding='latin1', engine="python")
and then:
df = df.drop(df[df.DN.str.contains("OU=DNA")].index)
df.DN = df.DN.str.replace(r'(CN=MB[0-9]{5}\s*,)', '', regex=True)
df
gave the desired result:
User Name DN
0 MB31212 CN=Users,DC=prod,DC=trovp,DC=net
1 MB23423 OU=Generic Mailbox,DC=prod,DC=trovp,DC=net
2 MB23424 CN=Users,DC=prod,DC=trovp,DC=net
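For anyone who wants to try the two steps without the original file, the following is a rough, self-contained sketch. It builds the question's sample rows in memory (so the delimiter of sample.csv does not matter), and it makes a few assumptions that are not in the answer: the variable name kept, the added ^ anchor on the pattern, and the reset_index call to renumber rows.

```python
import pandas as pd

# Sample rows from the question, built in memory instead of read from sample.csv.
df = pd.DataFrame({
    "User Name": ["MB31212", "MB23423", "MB23424", "MB23423", "MB23234"],
    "DN": [
        "CN=MB31212,CN=Users,DC=prod,DC=trovp,DC=net",
        "CN=MB23423 ,OU=Generic Mailbox,DC=prod,DC=trovp,DC=net",
        "CN=MB23424 ,CN=Users,DC=prod,DC=trovp,DC=net",
        "CN=MB23423,OU=DNA,DC=prod,DC=trovp,DC=net",
        "CN=MB23234 ,OU=DNA,DC=prod,DC=trovp,DC=net",
    ],
})

# Step 1: drop rows whose DN contains the literal text OU=DNA.
kept = df[~df["DN"].str.contains("OU=DNA", regex=False)].copy()

# Step 2: strip the leading CN=MBnnnnn component, allowing spaces before the comma.
# The ^ anchor is an assumption added here so only the first component can match.
kept["DN"] = kept["DN"].str.replace(r"^CN=MB[0-9]{5}\s*,", "", regex=True)

# Renumber the remaining rows from 0 (optional).
kept = kept.reset_index(drop=True)
print(kept)
```

Passing regex=True explicitly matters on recent pandas releases, where str.replace treats the pattern as a literal string by default.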