Reading multiple substrings from lines in a file

I am generating a report from an Apache error_log file with a Python script. Here is an example of the input I am dealing with:

[Wed Apr 13 18:33:42.521106 2016] [core:notice] [pid 11690] SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0
[Wed Apr 13 18:33:42.543989 2016] [suexec:notice] [pid 11690] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)

The end result I'm trying to get would look something like:

core:notice - SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0
suexec:notice - AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)

That is, the error type followed by the trailing text. I then need to write this formatted text to a new file.

I've been trying to use regular expressions to do this, but it's been years since I used Python at all, and I have never used regular expressions before. The most I've managed so far is isolating the first (date) section; I cannot figure out how to get the subsequent bracketed substrings and the trailing text. Any and all help would be greatly appreciated!

Since each line consists of exactly four fields, and every field except the last is enclosed in square brackets, you can take advantage of that layout to do the task without regular expressions, like this:

texts = [
    '[Wed Apr 13 18:33:42.521106 2016] [core:notice] [pid 11690] SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0',
    '[Wed Apr 13 18:33:42.543989 2016] [suexec:notice] [pid 11690] AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)',
]
for text in texts:
    # Drop every '[' and split on ']' to get the four fields
    words = text.replace('[','').split(']')
    # Field 1 is the error type, field 3 is the trailing text
    newWords = words[1] + ' -' + words[3]
    print(newWords)

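This prints the following. Each line starts with a space because words[1] keeps the space that followed the first closing bracket; call .strip() on the fields if you want to remove it: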

 core:notice - SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0
 suexec:notice - AH01232: suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)

The idea is to first replace every opening square bracket with an empty string, then split the line on the closing square bracket (which removes those brackets too):

words = text.replace('[','').split(']')
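To see what that line produces, here is a small sketch that prints each element of words for the first sample line from the question. It only illustrates the intermediate values; repr() is used so the leading spaces are visible.

```python
text = ('[Wed Apr 13 18:33:42.521106 2016] [core:notice] [pid 11690] '
        'SELinux policy enabled; httpd running as context '
        'system_u:system_r:httpd_t:s0')

# Remove every '[' and then split on each ']'.
words = text.replace('[', '').split(']')

# Show each field with its index; repr() makes leading spaces visible.
for index, word in enumerate(words):
    print(index, repr(word))

# 0 'Wed Apr 13 18:33:42.521106 2016'
# 1 ' core:notice'
# 2 ' pid 11690'
# 3 ' SELinux policy enabled; httpd running as context system_u:system_r:httpd_t:s0'
```

Note that this relies on the message itself containing no square brackets; a line whose trailing text includes '[' or ']' would be split into more pieces.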

Then you simply combine the fields you want into your new string:

newWords = words[1] + ' -' + words[3]

And you are done.
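Since the question also asked about regular expressions and about writing the result to a new file, here is a minimal sketch of that variant. It is an alternative to the split approach above, not part of it. It assumes every line of interest has the layout [timestamp] [module:level] [pid N] message, as in the two sample lines. The names LINE_RE, format_line and write_report, and the file paths passed to write_report, are hypothetical and chosen only for illustration.

```python
import re

# Assumed layout: [timestamp] [module:level] [pid N] message
# Group 1 captures the second bracketed field, group 2 the trailing text.
LINE_RE = re.compile(r'^\[[^\]]*\]\s*\[([^\]]*)\]\s*\[[^\]]*\]\s*(.*)$')


def format_line(line):
    """Return 'module:level - message', or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    error_type, message = match.groups()
    return f'{error_type} - {message}'


def write_report(log_path, report_path):
    """Write one formatted line to report_path per matching line in log_path."""
    with open(log_path, encoding='utf-8', errors='replace') as log, \
            open(report_path, 'w', encoding='utf-8') as report:
        for line in log:
            formatted = format_line(line)
            if formatted is not None:  # skip lines that do not fit the layout
                report.write(formatted + '\n')
```

It could be called as write_report('error_log', 'report.txt'), with both paths adjusted to your system. Lines that do not match the assumed layout are skipped silently, and any extra bracketed field after the pid field stays part of the message.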
