regular expression for log file

Question

I have a log file like this:

<CL>
text sample1
<CL>
<CL>
<TR></TR>
</CL>
<CL>
<CL>
<CL>
<TR1></TR1>
</CL>
<CL>
text sample2
<CL>
text sample3
<CL>
<TR1>
<TR2></TR2>
</TR1>
</CL>

I need to write a regular expression that extracts the valid XML from this file. This is the result I need:

<CL>
<TR></TR>
</CL>

<CL>
<TR1></TR1>
</CL>

<CL>
<TR1>
<TR2></TR2>
</TR1>
</CL>

This variant doesn't work for me:

<CL>[\s\S]*?(<CL>[\s\S]+?</CL>)

Thanks in advance.

Answer 1

In my experience, regular expressions are not well suited to validating, parsing, or reading XML files.

It is better to use a DOM parser for this problem. Most of them provide a validation method. In PHP: http://php.net/manual/en/book.simplexml.php (many people use it)

Or PHP Simple HTML DOM Parser: http://simplehtmldom.sourceforge.net/ (just read the XML file and print the object created from it to get the valid XML structure; as far as I remember, it works for more than just HTML).

In Java: the jsoup library http://jsoup.org/ (much like Simple HTML DOM in PHP)

And first of all, a valid XML file must contain a single root element that wraps the whole document (like the html tag in HTML files).
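
As a rough illustration of the parser approach: the sketch below uses Python's standard xml.etree.ElementTree, which is my own choice and not one of the libraries linked above. The helper name is_well_formed and the wrapper element name root are made up for the example. It checks candidate fragments for well-formedness and wraps the good ones in a single root element.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Return True if the fragment parses as a single XML element."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Two candidate fragments taken from the log in the question.
fragments = [
    "<CL>\n<TR></TR>\n</CL>",    # balanced tags
    "<CL>\ntext sample1\n<CL>",  # two opening tags, never closed
]
checks = [is_well_formed(f) for f in fragments]

# Wrap the well-formed fragments in one (hypothetical) root element
# so that the result is a single XML document.
document = "<root>" + "".join(f for f in fragments if is_well_formed(f)) + "</root>"
root = ET.fromstring(document)

print(checks)
print([child.tag for child in root])
```

A parser only tells you whether a given fragment is well formed; you would still need some way to cut the log into candidate fragments first.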

I hope this helps you out.

Answer 2

This regex works for your example:

"<CL>((?!<CL>).)*?(?:<TR[\\d]*?>)+.*?(?:</TR[\\d]*?>)+.*?</CL>"

Note that, depending on the programming language, you need to enable the Singleline regex option (the dot also matches newlines) for this regex to work, because the matches span several lines.
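
For instance, in Python (an assumption, since the question does not name a language) the equivalent of the Singleline option is the re.DOTALL flag. Here is a minimal sketch that runs the pattern against the log from the question; the names LOG, PATTERN, and blocks are only for illustration, and the pattern is written as a raw string so \d is not doubled.

```python
import re

# The log from the question, verbatim.
LOG = """<CL>
text sample1
<CL>
<CL>
<TR></TR>
</CL>
<CL>
<CL>
<CL>
<TR1></TR1>
</CL>
<CL>
text sample2
<CL>
text sample3
<CL>
<TR1>
<TR2></TR2>
</TR1>
</CL>
"""

# re.DOTALL makes "." match newlines as well, so a match can span lines.
PATTERN = re.compile(
    r"<CL>((?!<CL>).)*?(?:<TR[\d]*?>)+.*?(?:</TR[\d]*?>)+.*?</CL>",
    re.DOTALL,
)

# group(0) is the whole match; the capturing group only holds one character.
blocks = [m.group(0) for m in PATTERN.finditer(LOG)]

for block in blocks:
    print(block)
    print()
```

The negative lookahead (?!<CL>) is what stops a match from starting at one of the stray opening tags: after a <CL>, the pattern has to reach a <TR...> tag before it meets another <CL>.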

EDIT: In some languages the backslash in \d does not need to be doubled, so also try:

"<CL>((?!<CL>).)*?(?:<TR[\d]*?>)+.*?(?:</TR[\d]*?>)+.*?</CL>"

EDIT 2: If you just want to capture the content of the CL tags, you can simply use:

<CL>((?!<CL>).)*</CL>
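
A small sketch of this shorter pattern, again in Python with re.DOTALL (my assumption). The sample is a shortened version of the log, and its last block, with plain text instead of TR tags, is made up to show that this pattern does not require any TR tag inside the CL block.

```python
import re

# Shortened sample; the final block with "note" is hypothetical.
SAMPLE = """<CL>
text sample1
<CL>
<CL>
<TR></TR>
</CL>
<CL>
note
</CL>
"""

# Match a <CL> ... </CL> pair with no further <CL> in between.
SIMPLE = re.compile(r"<CL>((?!<CL>).)*</CL>", re.DOTALL)

simple_blocks = [m.group(0) for m in SIMPLE.finditer(SAMPLE)]

for block in simple_blocks:
    print(block)
    print()
```

The first two opening tags are skipped because another <CL> appears before any </CL>, so only the two closed blocks are returned.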

2 answers

solution1
2 2013-03-06 08:52:48

solution2
1 ACCEPTED 2013-03-06 08:56:40
