split file contents based on regex: python

Question

I want to keep each rule (rule1, rule2, rule3) as a separate element of a list. Example file: https://github.com/Yara-Rules/rules/blob/master/malware/APT_WildNeutron.yar I am using the following code:

patt=re.compile("\s*[\n]*rule.*[\n]*\s*.*{")

results=re.split(r'\s*[\n]*rule.*[\n]*\s*.*{.', buf)

results does not contain the list of rules; it looks like the split is not working. Can anybody help with this?

-----------file contents-------

rule rule1{

meta: 

 desc-test1


}

rule rule2{

meta: 

desc-test2


}

rule rule3{

meta: 

desc-test3


}

----file end----------

expected output

Inside a rule there can be "rule strings". So a rule should be identified as rule ruleName{ content }, where the content can be anything, including newlines, words, or any string. The rule content is delimited by curly braces. I should be able to extract the rules into a list: rules[0] should contain rule1 and its contents, and similarly for rule2.

Answer 1

results=re.split(r'\s*[\n]*rule.*[\n]*\s*.*{.', buf)

Your pattern didn't match because the { that opens the rule body in your input is immediately followed by \n, and . without re.DOTALL doesn't match \n.
In \s*[\n]*, the [\n]* is redundant because \s already matches \n.
Since you want the rule name returned as well, leave it out of the split pattern: re.split discards whatever the pattern matches.
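To make the first point concrete, here is a minimal sketch that runs the original pattern against a shortened copy of the sample file (the buf contents below are an assumption, trimmed from the question). It shows that nothing matches without re.DOTALL, and that re.DOTALL alone would not fix the split either, since the greedy .* then swallows everything up to the last {.

```python
import re

# Shortened sample input (assumed; modelled on the file in the question).
buf = """rule rule1{
meta:
 desc-test1
}

rule rule2{
meta:
 desc-test2
}
"""

original = r'\s*[\n]*rule.*[\n]*\s*.*{.'

# The trailing '.' must consume one character after '{', but that
# character is '\n', which '.' does not match by default.
no_match = re.search(original, buf)

# With no match anywhere, re.split returns the input as a single element.
unsplit = re.split(original, buf)

# With re.DOTALL, '.' also matches '\n', so the pattern does match, but the
# greedy '.*' runs from the first rule up to the last '{' in the input.
dotall_match = re.search(original, buf, flags=re.DOTALL)
```

Here no_match is None and unsplit is a one-element list holding the whole input, which is the behaviour described in the question.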

So,

results = re.split(r'\brule\s+', buf)[1:]

should do (the [1:] discards the part before the first rule).
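As a rough sketch of how this behaves, the example below applies the split to a trimmed copy of the sample file and puts the rule keyword back in front of each piece. The second half is an assumption that goes beyond the answer: if the word rule can also appear inside a rule body, \brule\s+ will split there too, so the variant only splits where rule starts a line (re.MULTILINE). The names buf, tricky, naive and anchored are illustrative only.

```python
import re

# Trimmed copy of the sample file from the question (assumed contents).
buf = (
    "rule rule1{\nmeta:\n desc-test1\n}\n\n"
    "rule rule2{\nmeta:\n desc-test2\n}\n\n"
    "rule rule3{\nmeta:\n desc-test3\n}\n"
)

# The split from the answer; [1:] drops the empty text before the first rule.
parts = re.split(r'\brule\s+', buf)[1:]

# Each part starts at the rule name, so restore the keyword and trim whitespace.
rules = ["rule " + part.strip() for part in parts]

# Hypothetical input where the word "rule" also occurs inside a body.
tricky = 'rule a{\n strings: $s = "bad rule here"\n}\nrule b{\n}\n'

# The plain split also cuts inside the quoted string.
naive = re.split(r'\brule\s+', tricky)[1:]

# Variant (assumption): split only where "rule" begins a line.
anchored = re.split(r'^rule\s+', tricky, flags=re.MULTILINE)[1:]
```

With this input, rules has three entries and rules[0] holds rule1 with its body, while naive has three pieces and anchored has two. The anchored variant still assumes every rule keyword sits at the start of a line with no modifier or indentation in front of it, and that no line inside a body begins with the word rule; a file that breaks those assumptions would need a brace-aware parser instead.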

solution1
0 2016-04-12 06:24:19