Parse file and extract lines between blocks with Python

I have a file that I need to parse and extract some specific lines from. This is an example of the file data:

dn: uid=portaladmin,ou=people,ou=myrealm,dc=portalDomain
objectclass: wlsUser objectclass: top 
objectclass: person 
objectclass: organizationalPerson 
objectclass: inetOrgPerson 
cn: portaladmin 
sn: portaladmin 
description: Admin for portal domain 
uid: portaladmin userpassword:: e3NzaGF9L3JYUldtVERBUklCdWM3NGtBSlJQVFVjQ04yRmNkU3o= 
wlsMemberOf: cn=PortalSystemAdministrators,ou=groups,ou=myrealm,dc=portalDom  ain

dn: uid=weblogic,ou=people,ou=myrealm,dc=portalDomain 
objectclass: wlsUser 
objectclass: top 
objectclass: person 
objectclass: organizationalPerson 
objectclass: inetOrgPerson 
cn: weblogic 
sn: weblogic 
description: This user is the default administrator. 
uid: weblogic 
userpassword:: e3NzaGF9VHhObDZhTlBpZTFSa2VVeTRTak1vWm0yTFJmdlN4RE8= 
wlsMemberOf: cn=Administrators,ou=groups,ou=myrealm,dc=portalDomain 
wlsMemberOf: cn=PortalSystemAdministrators,ou=groups,ou=myrealm,dc=portalDomain

As you can see, the information is in blocks, and I need to extract the lines with cn:, sn:, description:, uid: and userpassword: values. I also need to tell the script to search for specific uid or cn values from a list.

I'm not an experienced programmer, which is why I came here to ask the gurus about this. Please help, thanks in advance.

Just find the lines using str.startswith, passing a tuple of the prefixes to match:

with open("in.txt") as f:
    for line in f:
        if line.startswith(("cn:","sn:", "description:", "uid:","userpassword:")):
            print(line.rstrip())

Output:

cn: portaladmin
sn: portaladmin
description: Admin for portal domain
uid: portaladmin userpassword:: e3NzaGF9L3JYUldtVERBUklCdWM3NGtBSlJQVFVjQ04yRmNkU3o=
cn: weblogic
sn: weblogic
description: This user is the default administrator.
uid: weblogic
userpassword:: e3NzaGF9VHhObDZhTlBpZTFSa2VVeTRTak1vWm0yTFJmdlN4RE8=

Based on your comment, if you want to search for substrings anywhere in the line, you can use any:

  if any(sub in line for sub in ("cn: somestring", "sn: somestring", "description: somestring", "uid: somestring", "userpassword: somestring")):
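
For context, that condition could sit inside the same kind of loop as before. A minimal sketch, where the helper name `matching_lines` and the search terms are made up for illustration and are not part of the original answer:

```python
def matching_lines(lines, needles):
    """Return the stripped lines that contain at least one of the substrings."""
    return [line.rstrip() for line in lines if any(sub in line for sub in needles)]


# A few lines shaped like the sample data in the question.
sample = [
    "cn: portaladmin \n",
    "sn: portaladmin \n",
    "objectclass: person \n",
    "uid: weblogic \n",
]

# Hypothetical search terms; replace them with the values you are after.
print(matching_lines(sample, ("cn: portaladmin", "uid: weblogic")))
# prints ['cn: portaladmin', 'uid: weblogic']
```

Because `in` matches anywhere in the line, this also catches a term that appears in the middle of a line, which startswith would miss.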

If the pattern is more complicated, you will probably need a regex, but without knowing exactly what you want to extract it is not possible to suggest a viable one.
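
Purely as an illustration of what such a regex might look like, here is a sketch that assumes the goal is to split each wanted line into an attribute name and its value. The names `ATTR_RE` and `parse_attribute` are hypothetical, and the pattern would need adjusting for any other requirement:

```python
import re

# Attribute name, one or two colons (in LDIF a double colon marks a
# base64-encoded value), optional whitespace, then the value.
ATTR_RE = re.compile(r"^(cn|sn|description|uid|userpassword)(::?)\s*(.*)$")


def parse_attribute(line):
    """Return (name, value) for a wanted attribute line, or None otherwise."""
    match = ATTR_RE.match(line.rstrip())
    if match is None:
        return None
    name, _colons, value = match.groups()
    return name, value


print(parse_attribute("cn: weblogic \n"))      # prints ('cn', 'weblogic')
print(parse_attribute("objectclass: top \n"))  # prints None
```

Anchoring the pattern with `^` means a line such as `dn: uid=weblogic,...` is not mistaken for a `uid` line.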

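extractedLines = []
with open("file.txt", "r") as f:
    for line in f:
        # Keep any line that contains one of the wanted attribute names.
        for item in ["cn:", "sn:", "description:", "uid:", "userpassword:"]:
            if item in line:
                extractedLines.append(line)
                # Stop at the first hit so a line matching several names is stored once.
                break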
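
The question also asks for only the entries whose uid or cn appears in a list, which means looking at each block as a whole. One possible sketch, assuming blocks are separated by blank lines as in the sample and that the uid and cn values contain no spaces. The names `WANTED`, `split_blocks` and `select_entries` are made up for this example:

```python
WANTED = ("cn:", "sn:", "description:", "uid:", "userpassword:")


def split_blocks(lines):
    """Group lines into blocks separated by blank lines."""
    block = []
    for line in lines:
        line = line.rstrip()
        if line:
            block.append(line)
        elif block:
            yield block
            block = []
    if block:
        yield block


def select_entries(lines, names):
    """Return the wanted lines of every block whose uid or cn is in names."""
    names = set(names)
    selected = []
    for block in split_blocks(lines):
        kept = [line for line in block if line.startswith(WANTED)]
        # Take the first token after "uid:" or "cn:" as the identifier.
        ids = {
            line.split()[1]
            for line in kept
            if line.startswith(("uid:", "cn:")) and len(line.split()) > 1
        }
        if ids & names:
            selected.append(kept)
    return selected


# Shortened stand-in for the file contents shown in the question.
sample = [
    "dn: uid=portaladmin,ou=people,ou=myrealm,dc=portalDomain\n",
    "cn: portaladmin \n",
    "uid: portaladmin \n",
    "\n",
    "dn: uid=weblogic,ou=people,ou=myrealm,dc=portalDomain \n",
    "objectclass: top \n",
    "cn: weblogic \n",
    "uid: weblogic \n",
]

for entry in select_entries(sample, ["weblogic"]):
    print("\n".join(entry))
```

With a real file you would pass the open file object instead of `sample`, for example `select_entries(f, ["weblogic"])` inside a `with open("in.txt") as f:` block.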
