
How to use a regular expression to extract the URLs from the following string

Given the following string, what regular expression can I use to extract only the URLs (without the quotation marks)?

<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281438586869.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439101401.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439283119.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439479213.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440090151.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440244369.jpg\" /> \r\n</p>

What you're looking for is /(\/.*?\.\w{3})/g :

    // The \r\n and \t escapes become real line breaks and tabs in the string.
    var string = '<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281438586869.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439101401.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439283119.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439479213.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440090151.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440244369.jpg\" /> \r\n</p>';

    // Logs an array containing the six image paths.
    console.log(string.match(/(\/.*?\.\w{3})/g));

Breaking this down:

  • \/ matches a forward slash, escaping it with a backslash
  • .*? matches 0 or more characters that aren't line breaks, as few as possible (the ? makes the quantifier lazy)
  • \. matches a dot, escaping it with a backslash
  • \w{3} matches exactly three 'word' characters (alphanumeric or underscore)
  • The g flag indicates that the regex should match all occurrences
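To see why the lazy quantifier and the three-character extension matter, here is a minimal sketch. The sample strings are made up for illustration and are not taken from the question.

```javascript
// Hypothetical one-line sample containing two quoted paths.
const sample = '"/a/1.jpg" "/b/2.png"';

// Lazy quantifier: stops at the first ".xxx" it can reach.
const lazy = sample.match(/\/.*?\.\w{3}/g);

// Greedy quantifier: runs on to the last ".xxx" on the line.
const greedy = sample.match(/\/.*\.\w{3}/g);

console.log(lazy);   // [ '/a/1.jpg', '/b/2.png' ]
console.log(greedy); // [ '/a/1.jpg" "/b/2.png' ]

// \w{3} takes exactly three characters, so a longer extension is cut short.
const truncated = '/c/3.jpeg'.match(/\/.*?\.\w{3}/g);
console.log(truncated); // [ '/c/3.jpe' ]
```

If the extensions can vary in length, \w+ in place of \w{3} is one possible adjustment.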

.match returns an array, and you can extract the individual strings (without the quotation marks) by simply specifying the index, or iterating through a loop:

    var string = '<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281438586869.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439101401.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439283119.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439479213.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440090151.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440244369.jpg\" /> \r\n</p>';

    // With the g flag, match returns an array of every matched substring.
    var matches = string.match(/(\/.*?\.\w{3})/g);

    // Print each URL on its own line.
    for (var i = 0; i < matches.length; i++) {
        console.log(matches[i]);
    }
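A related point: with the g flag, match returns only the whole matches and drops capture groups, so the parentheses in the pattern above are not doing any work. If you would rather anchor on the src attribute itself, one possible variation uses matchAll and reads the capture group. This is a sketch, not part of the original answer; the shortened sample markup and the names markup, srcPattern and sources are assumptions for illustration.

```javascript
// Shortened, hypothetical sample in the same shape as the question's markup.
const markup = '<p>\r\n\t<img alt="" src="/upload/a.jpg" /> \r\n</p>\r\n' +
               '<p>\r\n\t<img alt="" src="/upload/b.jpg" /> \r\n</p>';

// Capture whatever sits between the quotes of each <img> tag's src attribute.
const srcPattern = /<img\b[^>]*?\bsrc="([^"]*)"/g;

// matchAll yields one match object per hit; index 1 is the capture group.
const sources = Array.from(markup.matchAll(srcPattern), m => m[1]);

console.log(sources); // [ '/upload/a.jpg', '/upload/b.jpg' ]
```

This still assumes double-quoted attributes and no > character inside earlier attribute values, so it remains a pattern match rather than real HTML parsing.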

Hope this helps! :)

It's safer to create a DocumentFragment from the HTML, and then query that temporary DOM for the information. This is safer because regexes tend to be brittle when applied to HTML. For example, what happens if some of the URLs in the HTML include a protocol such as https or ftp and others don't?
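To make the brittleness concrete, here is a small sketch that runs the regex from the other answer against two inputs that are not in the question: an absolute URL (example.com is a reserved documentation domain) and markup with no line breaks between the tags. Both inputs are made up for illustration.

```javascript
const pattern = /(\/.*?\.\w{3})/g;

// Hypothetical absolute URL: the match starts at the first slash of "//".
const withProtocol = '<img alt="" src="https://example.com/a.jpg" />';
const protocolMatches = withProtocol.match(pattern);
console.log(protocolMatches);
// [ '//example.com', '/a.jpg' ]

// Hypothetical one-line markup: the slash in "/>" starts a bogus match.
const oneLine = '<p><img src="/upload/a.jpg" /></p><p><img src="/upload/b.jpg" /></p>';
const oneLineMatches = oneLine.match(pattern);
console.log(oneLineMatches);
// [ '/upload/a.jpg', '/></p><p><img src="/upload/b.jpg' ]
```

The regex only behaves on the question's string because every URL is root-relative and each tag is separated from the next by a line break, which . does not match.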

I am using a small library to convert the HTML to a DocumentFragment, but you can do this in many ways.

    let html = `<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281438586869.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439101401.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439283119.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281439479213.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440090151.jpg\" /> \r\n</p>\r\n<p>\r\n\t<img alt=\"\" src=\"/upload/201704/28/201704281440244369.jpg\" /> \r\n</p>`;

    // HtmlFragment is provided by the html-fragment library loaded by the script tag below.
    let fragment = HtmlFragment(html);

    // Collect the src attribute of every <img> that has one.
    let urls = Array
        .from(fragment.querySelectorAll('img[src]'))
        .map(img => img.getAttribute('src'));

    console.log(urls);

    <script src="https://unpkg.com/html-fragment@1.1.0/lib/html-fragment.min.js"></script>
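As one of the "many ways" that avoids a library, browsers can parse the string through a template element, whose content property is already a DocumentFragment. The sketch below is an assumption-laden illustration: extractImageSources is a hypothetical helper name, and the doc parameter exists only so the function can be handed a Document from somewhere other than the browser global.

```javascript
// Hypothetical helper: collect the src of every <img> in an HTML string.
// `doc` defaults to the browser's global document; pass another Document
// implementation if you are not running in a browser.
function extractImageSources(html, doc = globalThis.document) {
  const template = doc.createElement('template');
  template.innerHTML = html; // parsed into template.content, a DocumentFragment
  return Array.from(template.content.querySelectorAll('img[src]'))
    .map(img => img.getAttribute('src'));
}

// Only run the demo where a DOM is available (for example, in a browser).
if (typeof document !== 'undefined') {
  console.log(extractImageSources('<p><img alt="" src="/upload/a.jpg" /></p>'));
}
```

Because template content is inert, the images are not fetched and any scripts in the string are not executed while you query it.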

