
How to capture content in a string using a regular expression in Java

I would like to parse an HTML form and pull out the filenames of any embedded images.

So the string could look like:

{ 

... random HTML content ...

    image1.png 

 more random HTML content

    image3.png

... }

From the above, I would like to write a function in Java that returns {image1.png, image3.png}.

I have a regular expression that returns only the last image name (image3.png) and disregards the earlier ones. How can I capture all of them using regex?

Any help would be appreciated.

https://stackoverflow.com/a/2059614/684934 gives a good hint. More specifically, you're probably looking for something like [a-zA-Z0-9_\\-]+\\.(png|jpg|gif|jpeg|tif) (written here with the backslashes doubled, as it would appear inside a Java string literal).
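To collect every match rather than just one, the usual approach is to call `Matcher.find()` in a loop and store each `group()`. The following is a minimal sketch using the pattern above; the class name `ImageNameFinder` and the method name `findImageNames` are made up for illustration and are not from the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: the class and method names are illustrative only.
final class ImageNameFinder {

    // The pattern suggested in the answer, compiled once and reused.
    private static final Pattern IMAGE_NAME =
            Pattern.compile("[a-zA-Z0-9_\\-]+\\.(png|jpg|gif|jpeg|tif)");

    // Returns every substring of the input that looks like an image file name,
    // in the order in which it appears.
    static List<String> findImageNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher matcher = IMAGE_NAME.matcher(html);
        // Each call to find() resumes scanning after the previous match,
        // so the loop visits all matches instead of stopping at one.
        while (matcher.find()) {
            names.add(matcher.group());
        }
        return names;
    }
}
```

Calling `ImageNameFinder.findImageNames(html)` on a string containing both image1.png and image3.png should give a list with both names. Because `/` is not in the character class, a path such as `pics/image3.png` would contribute only `image3.png`.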

Note, however, that this is regex and is only looking for sequences of characters. If you are looking at a site that serves up dynamic images using servlets, for example, and the resource URI ends with something like .jsp or .do rather than a normal image file extension, then the regex will fail completely. It will also pick up "image names" from any text that happens to match, even when that text does not represent an image on the page.

To do the job properly, you will need to use some sort of DOM and traverse the <img> elements. (And the <button> elements, which may be of type image ... there are probably more tags that can have images.)
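As a rough illustration of working from tags rather than raw characters, the sketch below uses the HTML parser that ships with the JDK (`javax.swing.text.html`). This is an assumption on my part, not something the answer prescribes: it is a callback-style parser rather than a full DOM, it targets older HTML, and it only looks at <img> tags. A dedicated HTML parsing library would be a common alternative. The names `ImgSrcExtractor` and `imageSources` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper: the class and method names are illustrative only.
final class ImgSrcExtractor {

    // Returns the src attribute of every <img> tag the parser encounters.
    static List<String> imageSources(String html) {
        List<String> sources = new ArrayList<>();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // <img> has no closing tag, so it is reported as a "simple" tag.
                if (tag == HTML.Tag.IMG) {
                    Object src = attrs.getAttribute(HTML.Attribute.SRC);
                    if (src != null) {
                        sources.add(src.toString());
                    }
                }
            }
        };
        try {
            // The last argument tells the parser to ignore charset declarations.
            new ParserDelegator().parse(new StringReader(html), callback, true);
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return sources;
    }
}
```

Unlike the regex, this returns the full src value, path included, and it does not depend on the URI ending in an image extension. Extracting just the file name from that value is left out of the sketch.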
