
Split file contents based on regex: Python

I want to keep each rule (rule1, rule2, rule3) in a list. I am using the following code on this example file: https://github.com/Yara-Rules/rules/blob/master/malware/APT_WildNeutron.yar

patt=re.compile("\s*[\n]*rule.*[\n]*\s*.*{")

results=re.split(r'\s*[\n]*rule.*[\n]*\s*.*{.', buf) 

The results variable does not contain the list of rules; it looks like the split is not working. Can anybody help with this?

-----------file contents-------

rule rule1{

meta: 

 desc-test1


}

rule rule2{

meta: 

desc-test2


}

rule rule3{

meta: 

desc-test3


}

----file end----------

Expected output:

Inside a rule there can be "rule strings", so a rule should be identified as rule ruleName{...}, where the content can be anything, including newlines, words, or any string. The rule content is delimited by the curly braces. I should be able to extract the rules into a list: rules[0] should contain rule1 and its contents, and similarly for rule2.
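The requirement above can be sketched as a brace-counting scanner: find each rule ruleName{ header, then walk forward counting { and } until the header's brace closes, so text such as "rule" inside a body is never treated as a boundary. extract_rules is a hypothetical helper, and this is only a sketch under two assumptions: headers look like rule ruleName{ (no YARA tags or modifiers before the brace), and every brace inside a body is balanced (no stray { or } in quoted strings or comments).

```python
import re

# Header as described in the question: "rule", a name, then "{".
# Assumption: no tags or modifiers between the name and the brace.
RULE_START = re.compile(r'\brule\s+\w+\s*\{')

def extract_rules(buf):
    """Hypothetical helper: return each complete rule as a string.

    Assumes every '{' and '}' inside a rule body is balanced.
    """
    rules = []
    pos = 0
    while True:
        m = RULE_START.search(buf, pos)
        if m is None:
            break
        depth = 0
        # Start at the header's '{' and track nesting until it closes.
        for i in range(m.end() - 1, len(buf)):
            if buf[i] == '{':
                depth += 1
            elif buf[i] == '}':
                depth -= 1
                if depth == 0:
                    rules.append(buf[m.start():i + 1])
                    pos = i + 1  # resume the header search after this rule
                    break
        else:
            # Ran off the end of the input without closing the rule.
            raise ValueError(f"unterminated rule at offset {m.start()}")
    return rules

# Assumed sample: a nested hex string and a "rule" string inside rule1.
sample = (
    'rule rule1{\n strings:\n  $a = { 4D 5A }\n  $b = "rule x"\n'
    ' condition:\n  all of them\n}\n\n'
    'rule rule2{\n meta:\n  desc = "test2"\n}\n'
)
rules = extract_rules(sample)
print(len(rules))  # 2
```

Here rules[0] is the whole of rule1, including the nested { 4D 5A } and the "rule x" string, and rules[1] is rule2.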

results=re.split(r'\s*[\n]*rule.*[\n]*\s*.*{.', buf)
  1. Your pattern didn't match because the { that opens the rule content in your input is immediately followed by \n, and . without re.DOTALL doesn't match \n.
  2. In \s*[\n]*, the [\n]* is redundant because \s already matches \n.
  3. Since you want the rule name returned as well, you shouldn't include it in the split pattern: re.split drops the text the pattern matches (unless it is inside a capturing group).
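Points 1 and 2 can be checked by running the question's pattern against a trimmed copy of the sample file. The buf literal below is an assumed stand-in for the real file contents.

```python
import re

# Trimmed stand-in for the file in the question (assumed contents).
buf = "rule rule1{\n\nmeta: \n\n desc-test1\n\n\n}\n\nrule rule2{\n\nmeta: \n\ndesc-test2\n\n\n}\n"

# Point 1: the final '.' must match the character after '{', which is '\n'.
# Without re.DOTALL, '.' never matches '\n', so the pattern matches nowhere
# and re.split returns the whole buffer as a single element.
old = re.split(r'\s*[\n]*rule.*[\n]*\s*.*{.', buf)
print(len(old))       # 1
print(old[0] == buf)  # True

# Point 2: '\s' already covers '\n', so '[\n]*' after '\s*' adds nothing.
print(re.fullmatch(r'\s*', '\n\n \n') is not None)  # True
```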

So,

results = re.split(r'\brule\s+', buf)[1:]

should work (the [1:] discards the part before the first rule).
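Applied to a copy of the three sample rules (the buf literal is an assumed stand-in for the file), the one-liner yields one element per rule, each starting at the rule name because the matched "rule " prefix is consumed by the split:

```python
import re

# Assumed stand-in for the sample file from the question.
buf = (
    "rule rule1{\n\nmeta: \n\n desc-test1\n\n\n}\n\n"
    "rule rule2{\n\nmeta: \n\ndesc-test2\n\n\n}\n\n"
    "rule rule3{\n\nmeta: \n\ndesc-test3\n\n\n}\n"
)

# \b stops "rule" from matching inside a longer word such as "myrule".
# [1:] drops the (here empty) text before the first match.
results = re.split(r'\brule\s+', buf)[1:]

# Each piece begins with the rule name; re-add the keyword if you need it.
names = [r.split('{', 1)[0] for r in results]
print(names)  # ['rule1', 'rule2', 'rule3']
full_rules = ['rule ' + r.rstrip() for r in results]
```

Because the split is keyed only on the word rule followed by whitespace, a body that contains such text (the question says rules may include "rule strings") will be cut in the middle; in that case, matching the curly braces is a safer way to find where each rule ends.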
