
Using regular expressions in Java to extract the contents of an XML tag

I have a very large string, part of which looks like this:

<df>asdffs</df><titletext xml:lang="eng" original="y">Dose intensity <inf>low</inf> in advanced cancer: Have we answered the question?</titletext><sdf>gfdgas</sdf>

I need to find out whether an <inf> tag exists inside the <titletext> tag. I am writing this in Java.

Thanks in advance.

I would strongly recommend using an XML parser instead. SAX is a good fit, since your document is supposedly large: it streams the document through rather than loading all of it into memory at once. You'll avoid all sorts of edge cases that regular expressions can't handle (since XML isn't a regular language).

In your example above, you would likely maintain a stack of the currently open XML elements and, whenever an <inf> start tag is encountered, check whether <titletext> is on that stack (that is, whether <inf> is nested inside it).
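
The stack idea could be sketched roughly as follows, using the SAX API that ships with the JDK. The class name TitleInfCheck and the method name hasInfInsideTitletext are made up for illustration. The sketch assumes the whole string is a well-formed XML document with a single root element; the snippet from the question is only a fragment, so the demo wraps it in a hypothetical <root> element.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; the names are illustrative only.
class TitleInfCheck {

    // Returns true if an <inf> element occurs anywhere inside a <titletext> element.
    // Assumes xml is a well-formed document with a single root element.
    static boolean hasInfInsideTitletext(String xml) throws Exception {
        final boolean[] found = {false};
        // Names of the elements that are currently open, innermost first.
        final Deque<String> openElements = new ArrayDeque<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                // The default factory is not namespace-aware, so qName holds the tag name.
                if ("inf".equals(qName) && openElements.contains("titletext")) {
                    found[0] = true;
                }
                openElements.push(qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                openElements.pop();
            }
        };

        // SAX streams the input through the handler instead of building a tree.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return found[0];
    }

    public static void main(String[] args) throws Exception {
        // The fragment from the question, wrapped in an assumed <root> element.
        String sample = "<root><df>asdffs</df>"
                + "<titletext xml:lang=\"eng\" original=\"y\">Dose intensity <inf>low</inf>"
                + " in advanced cancer: Have we answered the question?</titletext>"
                + "<sdf>gfdgas</sdf></root>";
        System.out.println(hasInfInsideTitletext(sample));
    }
}
```

For a genuinely large document you would probably pass an InputStream or a file to the parser rather than holding everything in a String, and for untrusted input you may want to restrict external entity processing on the factory first.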

