
JavaScript regex to parse a complex URL string

I need to parse a complex URL string to fetch specific values.

From the following URL string:

/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss

I need to extract this result in array format:

['http://any-feed-url-a.com?filter=hot&format=rss', 'http://any-feed-url-b.com?filter=rising&format=rss']

I already tried /url=([^&]+)/, but it doesn't capture all of the query parameters correctly. I would also like to omit the url= prefix from the results.
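For reference, here is a minimal sketch of what the attempted pattern returns on the example string. The variable names are only illustrative. Because [^&]+ stops at the first "&", the "&format=rss" part of each feed URL is lost.

```javascript
const input = '/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss';

// Without the g flag, only the first match is returned.
const firstOnly = input.match(/url=([^&]+)/)[1];
console.log(firstOnly); // 'http://any-feed-url-a.com?filter=hot'

// With the g flag every url= is found, but each capture still stops at the first "&".
const truncated = [...input.matchAll(/url=([^&]+)/g)].map((m) => m[1]);
console.log(truncated);
// ['http://any-feed-url-a.com?filter=hot', 'http://any-feed-url-b.com?filter=rising']
```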

RegExr link

Thanks in advance.

Have you tried using the split method instead of a regex?

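const input = "/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss";
const urlsArr = input.split("url=");
urlsArr.shift(); // remove the first item from the array -> "/api/rss/feeds?"
// strip the "&" that the split leaves at the end of every item but the last
const urls = urlsArr.map((u) => u.replace(/&$/, ""));
console.log(urls);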


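split("url=") returns ["/api/rss/feeds?", "http://any-feed-url-a.com?filter=hot&format=rss&", "http://any-feed-url-b.com?filter=rising&format=rss"], and shift() then drops the first item. Note that the split leaves a trailing "&" on the first URL, which has to be stripped to get the exact result asked for.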
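A possible variant of the same idea, as a sketch only: cut everything up to the first "url=" and split the rest on "&url=", so no trailing "&" is left behind. The helper name extractFeedUrls is hypothetical, and the sketch assumes that "url=" never appears inside a feed URL itself.

```javascript
// Hypothetical helper, not part of the original answer.
function extractFeedUrls(input) {
  const marker = 'url=';
  const start = input.indexOf(marker);
  if (start === -1) return []; // no url parameter at all
  // Drop everything up to and including the first "url=",
  // then split on the separator between consecutive url parameters.
  return input.slice(start + marker.length).split('&' + marker);
}

const feedUrls = extractFeedUrls(
  '/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss'
);
console.log(feedUrls);
// ['http://any-feed-url-a.com?filter=hot&format=rss', 'http://any-feed-url-b.com?filter=rising&format=rss']
```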

If possible, it's better to use something other than a regex. See Coding Horror: regular-expressions-now-you-have-two-problems

This regex works for me: url=([a-z:/.?=-]+&[a-z=]+)

You can also try this one: /http(s)?:\/\/([a-z-.?=&])+&/g

Example

 const string = '/api/rss/feeds?url=http://any-feed-url.com?filter=hot&format=rss&url=http://any-feed-url.com?filter=latest&format=rss';
 const string2 = '/api/rss/feeds?url=http://any-feed-url.com?filter=hot&format=rss&next=parm&url=http://any-feed-url.com?filter=latest&format=rss';
 const regex = /url=([a-z:/.?=-]+&[a-z=]+)/g;
 const regex2 = /http(s)?:\/\/([a-z-.?=&])+&/g;
 // With the g flag, match() returns the full matches rather than the capture groups,
 // so the results of the first regex still start with "url=".
 console.log(string.match(regex));
 console.log(string2.match(regex2));

You can use matchAll to find the URLs, then map capture group 1 of each match into an array.

 const str = '/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss';
 const arr = [...str.matchAll(/url=(.*?)(?=&url=|$)/g)].map(x => x[1]);
 console.log(arr);

matchAll isn't supported by older browsers, though.
Looping over exec to fill an array works as well.

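 const str = '/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss';
 const re = /url=(.*?)(?=&url=|$)/g;
 const arr = [];
 let m;
 // exec() advances re.lastIndex on each call and returns null when there are no more matches
 while ((m = re.exec(str)) !== null) {
   arr.push(m[1]);
 }
 console.log(arr);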
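To make the behaviour of the lookahead easier to see, here is a small sketch that wraps the exec loop in a function and runs it on two inputs. The function name extractUrls is only illustrative. The lazy (.*?) keeps growing until the lookahead sees either "&url=" or the end of the string, so any unrelated parameter that sits between two url parameters ends up inside the preceding capture.

```javascript
// Illustrative wrapper around the exec loop; the name is an assumption.
function extractUrls(str) {
  const re = /url=(.*?)(?=&url=|$)/g;
  const urls = [];
  let m;
  while ((m = re.exec(str)) !== null) {
    urls.push(m[1]); // capture group 1: everything after "url="
  }
  return urls;
}

const resultA = extractUrls('/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot&format=rss&url=http://any-feed-url-b.com?filter=rising&format=rss');
console.log(resultA);
// ['http://any-feed-url-a.com?filter=hot&format=rss', 'http://any-feed-url-b.com?filter=rising&format=rss']

// An extra parameter before the next "&url=" is swallowed by the first capture.
const resultB = extractUrls('/api/rss/feeds?url=http://any-feed-url.com?filter=hot&format=rss&next=parm&url=http://any-feed-url.com?filter=latest&format=rss');
console.log(resultB);
// ['http://any-feed-url.com?filter=hot&format=rss&next=parm', 'http://any-feed-url.com?filter=latest&format=rss']
```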

If your input is better-formed in reality than shown in the question and you're targeting a modern JavaScript environment, there's URL / URLSearchParams:

 const input = '/api/rss/feeds?url=http://any-feed-url-a.com?filter=hot%26format=rss&url=http://any-feed-url-b.com?filter=rising%26format=rss';
 const url = new URL(input, 'http://example.com/');
 console.log(url.searchParams.getAll('url'));

Notice how & has to be escaped as %26 for it to make sense.

Without this input in a standard form, it's not clear which rules of URLs are still on the table.
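If you control the code that builds the request, one way to get input in that standard form is to let URLSearchParams do the escaping. The following is a sketch under that assumption; the feeds array and the example.com base are placeholders. Each feed URL is appended as a url parameter, so "&" and the other reserved characters are percent-encoded, and getAll('url') decodes them again on the way back.

```javascript
// Placeholder feed URLs taken from the question.
const feeds = [
  'http://any-feed-url-a.com?filter=hot&format=rss',
  'http://any-feed-url-b.com?filter=rising&format=rss',
];

// Build the query string; append() stores the raw value and toString() percent-encodes it.
const params = new URLSearchParams();
for (const feed of feeds) {
  params.append('url', feed);
}
const requestPath = '/api/rss/feeds?' + params.toString();

// Parse it back. A base is needed because requestPath is a relative URL.
const parsed = new URL(requestPath, 'http://example.com/').searchParams.getAll('url');
console.log(parsed);
```

The round trip returns the same two strings that went in, which is the array the question asks for.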
