regular expression for log file

I have a log file like this:

<CL>
text sample1
<CL>
<CL>
<TR></TR>
</CL>
<CL>
<CL>
<CL>
<TR1></TR1>
</CL>
<CL>
text sample2
<CL>
text sample3
<CL>
<TR1>
<TR2></TR2>
</TR1>
</CL>

I need to write a regular expression that returns valid XML from this file. I need this result:

<CL>
<TR></TR>
</CL>

<CL>
<TR1></TR1>
</CL>

<CL>
<TR1>
<TR2></TR2>
</TR1>
</CL>

This variant doesn't work for me:

<CL>[\s\S]*?(<CL>[\s\S]+?</CL>)

Thanks in advance.

In my experience, regular expressions are not well suited to validating, parsing, or reading XML files.

It is better to use a DOM parser for this problem. Most of them have a validation method. In PHP: http://php.net/manual/en/book.simplexml.php (many people work with this).

Or PHP Simple HTML DOM Parser: http://simplehtmldom.sourceforge.net/ (just read the XML file, print the object created from it, and you get the valid XML structure; as far as I remember, it works not only for HTML structures). In Java: the jsoup library, http://jsoup.org/ (nearly the same as Simple HTML DOM in PHP).

And first of all, a valid XML file must contain a root tag (like the html tag in HTML files, which wraps the whole document).
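
To illustrate the root tag point, here is a minimal sketch in Python (my choice for the example; the links above are for PHP and Java). It uses the standard xml.etree.ElementTree module as a stand-in for the parsers mentioned, and the wrapper name "root" is made up. It assumes the three <CL> blocks have already been extracted from the log.

```python
import xml.etree.ElementTree as ET

# The three blocks the question wants to end up with.
fragments = [
    "<CL>\n<TR></TR>\n</CL>",
    "<CL>\n<TR1></TR1>\n</CL>",
    "<CL>\n<TR1>\n<TR2></TR2>\n</TR1>\n</CL>",
]

def wrap_and_parse(parts, root_tag="root"):
    # "root" is a hypothetical wrapper name; any valid tag name would do.
    document = "<{0}>\n{1}\n</{0}>".format(root_tag, "\n".join(parts))
    # fromstring raises ET.ParseError if the text is not well-formed XML.
    return ET.fromstring(document)

# Without a single root element the parser rejects the text.
try:
    ET.fromstring("\n".join(fragments))
    parsed_without_root = True
except ET.ParseError:
    parsed_without_root = False

root = wrap_and_parse(fragments)
child_tags = [child.tag for child in root]
print(parsed_without_root, root.tag, child_tags)
```

Note that this only checks well-formedness. Validating against a schema would need a different tool.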

I hope this helps you out

This regex will work for your example:

"<CL>((?!<CL>).)*?(?:<TR[\\d]*?>)+.*?(?:</TR[\\d]*?>)+.*?</CL>"

Note that, depending on the programming language, you have to set the Singleline regex option (so that . also matches line breaks) for this regex to work.

EDIT: In some languages there is no need to double the backslash in \\d, so also try:

"<CL>((?!<CL>).)*?(?:<TR[\d]*?>)+.*?(?:</TR[\d]*?>)+.*?</CL>"
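
As a rough illustration, here is how the pattern could be tried in Python, where the Singleline option is called re.DOTALL. Python is just my choice for the sketch, and the names LOG, PATTERN and find_blocks are made up for it. The log text is the sample from the question.

```python
import re

# Sample log copied from the question.
LOG = """<CL>
text sample1
<CL>
<CL>
<TR></TR>
</CL>
<CL>
<CL>
<CL>
<TR1></TR1>
</CL>
<CL>
text sample2
<CL>
text sample3
<CL>
<TR1>
<TR2></TR2>
</TR1>
</CL>
"""

# Raw string, so \d needs only a single backslash here.
PATTERN = r"<CL>((?!<CL>).)*?(?:<TR[\d]*?>)+.*?(?:</TR[\d]*?>)+.*?</CL>"

def find_blocks(text, flags=0):
    # group(0) is the whole match; the pattern's capturing group is ignored.
    return [m.group(0) for m in re.finditer(PATTERN, text, flags)]

with_dotall = find_blocks(LOG, re.DOTALL)  # "." also matches newlines
without_dotall = find_blocks(LOG)          # "." stops at newlines

print(len(with_dotall), len(without_dotall))
```

Because every block in the sample spans several lines, the run without re.DOTALL finds nothing, which is why the option matters here.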

EDIT2: If you just want to capture the CL tag content, you can simply use:

<CL>((?!<CL>).)*</CL>
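
A small sketch of this shorter pattern, again in Python and again with made-up names. I swapped the capturing group for a non-capturing one, (?:...), because Python's re.findall would otherwise return only the group's last captured character instead of the whole block. The matching behaviour is meant to be the same.

```python
import re

# Sample log copied from the question.
LOG = """<CL>
text sample1
<CL>
<CL>
<TR></TR>
</CL>
<CL>
<CL>
<CL>
<TR1></TR1>
</CL>
<CL>
text sample2
<CL>
text sample3
<CL>
<TR1>
<TR2></TR2>
</TR1>
</CL>
"""

# "(?!<CL>)." consumes one character at a time, but never steps onto a
# position where another <CL> starts, so a match cannot contain a nested
# or stray opening tag.
SIMPLE = re.compile(r"<CL>(?:(?!<CL>).)*</CL>", re.DOTALL)

blocks = SIMPLE.findall(LOG)
for block in blocks:
    print(block)
    print()
```

On the sample log this should give the three blocks asked for in the question. Since the repetition is greedy, text that has a second </CL> before the next <CL> would be swallowed into one match, so a lazy *? may be safer on messier logs.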
