I have the following sample URL, which I need to sanitize:
http://image.s5a.com/is/image/saks/0447522591096_647x329.jpg" border="0" params="">
into
http://image.s5a.com/is/image/saks/0447522591096_647x329.jpg
My question is: which regex should I use to flexibly remove everything after the extension, regardless of whether it's .jpg, .png or .jpeg?
Also, the text and symbols after the extension will differ from URL to URL.
Thanks
You can use:
var s = 'http://image.s5a.com/is/image/saks/0447522591096_647x329.jpg" border="0" params="">';
// Keep everything up to the first .png/.jpg/.jpeg (case-insensitive) and drop the rest.
var r = s.replace(/^(.+?\.(png|jpe?g)).*$/i, '$1');
//=> http://image.s5a.com/is/image/saks/0447522591096_647x329.jpg
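A minimal sketch that wraps the same regex in a reusable function. The name stripAfterImageExtension is hypothetical, and it assumes png, jpg and jpeg are the only extensions you care about (add others to the alternation as needed). The inner group is made non-capturing because only the outer group is used in the replacement.

```javascript
// Hypothetical helper: keep everything up to and including the first
// .png, .jpg or .jpeg (case-insensitive) and drop whatever follows.
// Strings containing none of those extensions are returned unchanged.
function stripAfterImageExtension(s) {
  return s.replace(/^(.+?\.(?:png|jpe?g)).*$/i, '$1');
}

// Example inputs with different extensions and different trailing junk:
console.log(stripAfterImageExtension('http://example.com/a/photo.JPEG?width=200'));
console.log(stripAfterImageExtension('http://example.com/b/icon.png#frag'));
```

Because .+? is lazy, the match stops at the first .png, .jpg or .jpeg anywhere in the string, so a host or directory name containing one of them (for example http://cdn.jpg.example.com/x.png) would be cut too early.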
An alternative to regex would be to just use basic string parsing.
var fullUrl = 'http://image.s5a.com/is/image/saks/0447522591096_647x329.jpg" border="0" params="">';
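// The URL ends at the closing double quote of the attribute value;
// splitting on ' ' instead would leave that quote attached (...jpg").
var baseUrl = fullUrl.split('"')[0];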
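A slightly more general version of the string-parsing idea cuts at whichever of several delimiters comes first, so it works whether the URL is followed by a quote, whitespace or a >. This is a sketch under the assumption that the URL itself contains none of these characters unescaped (RFC 3986 does not allow them in a URL); cutAtFirstDelimiter and DELIMITERS are hypothetical names.

```javascript
// Characters that may not appear unescaped in a URL but commonly follow
// one in HTML markup. Assumption: the input URL is properly escaped.
const DELIMITERS = ['"', ' ', '\t', '\n', '\r', '<', '>'];

// Return the text before the earliest delimiter, or the whole string if none occurs.
function cutAtFirstDelimiter(s) {
  let end = s.length;
  for (const d of DELIMITERS) {
    const i = s.indexOf(d);
    if (i !== -1 && i < end) {
      end = i; // remember the earliest delimiter position seen so far
    }
  }
  return s.slice(0, end);
}
```

Unlike the regex approach, this does not care what the extension is; it only relies on where the URL ends.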
Edit: you may also want to decode the URL so you don't get tripped up by percent-encoding (e.g. %20 for a space).
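var fullUrl = 'http://image.s5a.com/is/image/saks/0447522591096_647x329.jpg" border="0" params="">';
// Extract the URL first, then decode it: decoding first could turn %22 into
// a double quote (or %20 into a space) and cut the URL short.
var baseUrl = decodeURI(fullUrl.split('"')[0]);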
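One thing to watch for when decoding: decodeURI throws a URIError if the input contains a malformed percent-escape, such as a % that is not followed by two hex digits. A minimal sketch of a wrapper that falls back to the undecoded string in that case; safeDecodeURI is a hypothetical name, and returning the raw input is just one possible policy.

```javascript
// Hypothetical helper: decode with decodeURI, but return the input unchanged
// if it contains a malformed percent-escape (decodeURI throws URIError then).
function safeDecodeURI(s) {
  try {
    return decodeURI(s);
  } catch (e) {
    if (e instanceof URIError) {
      return s; // leave malformed input as-is rather than crashing
    }
    throw e; // anything else is unexpected, so rethrow it
  }
}
```

Note that decodeURI deliberately leaves escapes of reserved characters such as %2F (/) and %3F (?) encoded; decodeURIComponent decodes those too, but it is intended for single URL components rather than whole URLs.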
You shouldn't try to do things like this yourself. Most languages have libraries for parsing HTML, and they're more reliable than hand-rolled string handling. If this is client-side JavaScript, you could use jQuery: if element is a jQuery object representing the img element, element.attr('src') will be the value of its src attribute.
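If you have the complete img tag (rather than the fragment shown in the question) and would rather not pull in jQuery, the browser's built-in DOMParser can do the parsing. This is a minimal, browser-only sketch: firstImageSrc is a hypothetical name, and DOMParser does not exist in plain Node.js (a library such as jsdom would be needed there).

```javascript
// Browser-only sketch: parse the markup as an HTML document and return the
// raw src attribute of the first <img>, or null if there is no <img>
// (getAttribute also returns null when the attribute is missing).
function firstImageSrc(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  const img = doc.querySelector('img');
  return img ? img.getAttribute('src') : null;
}

// Example (in a browser console):
// firstImageSrc('<img src="http://image.s5a.com/is/image/saks/0447522591096_647x329.jpg" border="0" params="">');
```

getAttribute('src') returns the attribute exactly as written in the markup; reading the img.src property instead would give the URL resolved against the document's base URL.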