
How to capture content in a string using a regular expression in Java

I would like to parse an HTML form and pull out the filenames of any embedded images.

So the string could look like:

{ 

... random HTML content

    image1.png 

 more random HTML content

    image3.png

... }

From the above I would like to write a function in Java that returns to me {image1.png, image3.png}.

I have a regular expression that returns to me only the last image name (image3.png) but it disregards previous image names. How can I capture all of them using regex?

All / any help would be appreciated.

https://stackoverflow.com/a/2059614/684934 gives a good hint. More specifically, you're probably looking for something like [a-zA-Z0-9_\\-]+\\.(png|jpg|gif|jpeg|tif) (written here as a Java string literal, hence the doubled backslashes).
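
To collect every match rather than just one, the usual approach is to call Matcher.find() in a loop and store each group. The following is a minimal sketch using the pattern above; the class name ImageNameExtractor and the method name extractImageNames are made up for illustration, and since the asker's original code is not shown, this is not a fix for that code specifically.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name is illustrative only.
final class ImageNameExtractor {

    // The pattern suggested above, compiled once and reused.
    private static final Pattern IMAGE_NAME =
            Pattern.compile("[a-zA-Z0-9_\\-]+\\.(png|jpg|gif|jpeg|tif)");

    // Returns every substring of the input that matches the pattern, in order of appearance.
    static List<String> extractImageNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher matcher = IMAGE_NAME.matcher(html);
        // Each call to find() advances to the next match; group() returns the whole match.
        while (matcher.find()) {
            names.add(matcher.group());
        }
        return names;
    }
}
```

Calling extractImageNames on a string containing image1.png and image3.png returns a list with both names. The pattern is case-sensitive as written, so a name like PHOTO.PNG would be skipped unless Pattern.CASE_INSENSITIVE is passed to Pattern.compile.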

Note, however, that a regex only looks for sequences of characters. If the site serves images dynamically, for example through servlets, and the resource URI does not end with a normal image file extension (it might end in .jsp or .do instead), the regex will miss those images entirely. It will also pick up any text that happens to match the pattern, even when that text does not represent an image on the page.

To do the job properly, you will need to use some sort of DOM and traverse the <img> elements. (And the <input> elements, which may be of type image ... there are probably more tags that can have images.)
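
As a rough illustration of the parser-based route, here is a sketch that uses the HTML parser bundled with the JDK (javax.swing.text.html) to collect the src attribute of each <img> tag. It is callback-based rather than a full DOM, and it only looks at <img>; the class name ImgSrcCollector and the method name collectImgSources are hypothetical. A dedicated HTML library would likely be a better fit for real pages.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class; the name is illustrative only.
final class ImgSrcCollector {

    // Parses the HTML and returns the src value of every <img> tag, in document order.
    static List<String> collectImgSources(String html) {
        List<String> sources = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <img> has no closing tag, so the parser reports it as a simple tag.
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        sources.add(src.toString());
                    }
                }
            }
        };
        try {
            // The third argument tells the parser to ignore charset declarations in the markup.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sources;
    }
}
```

Because this reads the src attribute directly, it also picks up dynamically served images whose URI ends in something like .jsp or .do, and it ignores plain text that merely looks like a filename.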

