I have an input string that contains tags like:
<image id="1234" caption="text1" alt="text2">
...blah blah...
There can be multiple instances of such strings in the input.
I want to retrieve the attributes (caption, alt, etc.) of each such tag along with the id, and then print the id, alt, caption, etc. Some images have no attributes other than the id.
Please advise.
First things first: don't parse XML or (X)HTML with regex; this is generally considered a bad approach.
But I understand that for quick-and-dirty applications you don't want to deal with third-party libraries. Still, you have to consider the following questions, which make regex an even worse approach:
- Can caption sometimes occur before alt, by any chance?
- Do some image tags contain only the id attribute?
These (and more) aspects determine the complexity of your regex solution. You need a double loop to get all the required data.
First, find all image tags with:
(<image[^>]+)>
(this assumes there are no > characters in the attribute values). Then, for each of the image tags you found, extract the attributes with:
[ ]+([a-zA-Z0-9]+)="([^"]*)"
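A minimal Python sketch of that double loop, using exactly the two regexes above. It assumes the whole input is in one string and that every value is double-quoted; the function name extract_images and the sample text are illustrative, not from the question. Note that the first pattern would also match tags such as <images ...>, which is one more way it is looser than a parser.

```python
import re

# Outer pattern: everything from "<image" up to the next ">".
# Assumption: no ">" ever appears inside an attribute value.
TAG_RE = re.compile(r'(<image[^>]+)>')
# Inner pattern: name="value" pairs preceded by at least one space.
# Assumption: values are double-quoted and names are alphanumeric.
ATTR_RE = re.compile(r'[ ]+([a-zA-Z0-9]+)="([^"]*)"')


def extract_images(text):
    """Return one dict of attributes per <image ...> tag, in input order."""
    images = []
    for tag in TAG_RE.findall(text):        # outer loop: each image tag
        attrs = dict(ATTR_RE.findall(tag))  # inner loop: each attribute
        images.append(attrs)
    return images


sample = ('<image id="1234" caption="text1" alt="text2">'
          ' ...blah blah... <image id="5678">')
for img in extract_images(sample):
    # .get() returns None for attributes the tag does not have
    print(img.get("id"), img.get("caption"), img.get("alt"))
```

Because each tag becomes a dict, attribute order does not matter and tags with only an id simply yield a one-entry dict.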
I hope you already see that this is quite messy and does not cover all cases of valid XML!
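To make "does not cover all cases" concrete, here is a small Python sketch that reuses the same two regexes on three inputs that are valid XML attribute syntax: single-quoted values, a ">" inside a quoted value, and a newline instead of a space between the tag name and an attribute. The helper name extract_images is illustrative. In all three cases the id is silently lost.

```python
import re

# Same two patterns as in the answer above.
TAG_RE = re.compile(r'(<image[^>]+)>')
ATTR_RE = re.compile(r'[ ]+([a-zA-Z0-9]+)="([^"]*)"')


def extract_images(text):
    """Outer regex finds tags, inner regex collects name="value" pairs."""
    return [dict(ATTR_RE.findall(tag)) for tag in TAG_RE.findall(text)]


# Single quotes: ATTR_RE only knows about double quotes.
print(extract_images("<image id='1234' alt='text2'>"))
# A ">" inside a value ends the tag early, cutting off the id.
print(extract_images('<image alt="a > b" id="1234">'))
# A newline before the attribute: ATTR_RE requires a literal space.
print(extract_images('<image\nid="1234">'))
```

Each call prints [{}]: the tag is found, but no attributes come out of it.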
An XML parser is always the correct way to go.
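A parser-based sketch in Python, using only the standard library. One assumption to state plainly: the sample in the question is not well-formed XML (the image tags are never closed and are mixed with free text), so a strict XML parser such as xml.etree.ElementTree would reject it. This sketch therefore swaps in the more lenient html.parser.HTMLParser, which tolerates unclosed tags. The class and function names (ImageCollector, extract_images) are illustrative.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the attributes of every <image> start tag, in document order."""

    def __init__(self):
        super().__init__()
        self.images = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names and passes attrs as a list of
        # (name, value) pairs, with entity references already decoded.
        if tag == "image":
            self.images.append(dict(attrs))


def extract_images(text):
    parser = ImageCollector()
    parser.feed(text)
    parser.close()
    return parser.images


sample = ('<image id="1234" caption="text1" alt="text2"> ...blah blah... '
          "<image alt='a > b' id='5678'>\n<image\nid=\"9\">")
for img in extract_images(sample):
    # Missing attributes come back as None via .get()
    print(img.get("id"), img.get("caption"), img.get("alt"))
```

Unlike the regex version, this handles single quotes, a ">" inside a quoted value, and arbitrary whitespace between attributes. If your real input is well-formed XML, ElementTree is the more direct route: iterate over root.iter("image") and read each element's .attrib dict.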