
Ruby regex: extract a list of urls from a string

I have a string of images' URLs and I need to convert it into an array.

http://rubular.com/r/E2a5v2hYnJ

How do I do this?

URI.extract(your_string)

That's all you need if the URLs are already in a string. You will need require 'uri' first unless something else (Rails, for example) has already loaded it. Gotta love that standard library!

Here's the link to the docs: URI.extract
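
A minimal sketch of this approach, assuming the URLs in the string are separated by whitespace. The sample text and variable names are made up for illustration, not taken from the question. Note that recent Ruby versions mark URI.extract as obsolete and print a warning in verbose mode, although the method is still available.

```ruby
require 'uri'

# Hypothetical input: image URLs embedded in ordinary text.
text = 'see http://example.com/a.jpg and https://example.com/b.png'

# The second argument limits the result to the listed schemes.
urls = URI.extract(text, %w[http https])

puts urls
```

Passing the scheme list keeps things like mailto: addresses out of the result.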

Scan returns an array

myarray = mystring.scan(/regex/)

See here on regular-expressions.info
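
To illustrate what scan hands back, here is a small sketch. The input string and the patterns are assumptions chosen for the example, not the asker's real data. The thing to watch is that capture groups change the shape of the result.

```ruby
# Hypothetical input: two image URLs separated by other text.
mystring = 'a http://example.com/a.jpg b http://example.com/b.png c'

# No capture groups: scan returns the full text of each match.
whole = mystring.scan(/https?:\/\/\S+/)
# => ["http://example.com/a.jpg", "http://example.com/b.png"]

# With a capture group: scan returns only the captured parts,
# as one inner array per match.
exts = mystring.scan(/https?:\/\/\S+\.(jpg|png)/)
# => [["jpg"], ["png"]]

p whole
p exts
```

If you want whole URLs back, either avoid groups or make them non-capturing with (?:...).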

Use String#split (see the docs for details).

The best answer will depend very much on exactly what input string you expect.

If your test string is accurate then I would not use a regex, do this instead (as suggested by Marnen Laibow-Koser):

mystring.split('?v=3')
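
A quick sketch of how that behaves, assuming every URL ends in the same '?v=3' suffix. The sample strings are invented for illustration; the real input may differ.

```ruby
# Hypothetical input: URLs joined back to back, each ending in '?v=3'.
mystring = 'http://example.com/a.jpg?v=3http://example.com/b.png?v=3'

# split drops the separator and discards trailing empty strings.
urls = mystring.split('?v=3')
# => ["http://example.com/a.jpg", "http://example.com/b.png"]

# If whitespace also sits between the URLs, strip each piece.
spaced = 'http://example.com/a.jpg?v=3 http://example.com/b.png?v=3'
clean  = spaced.split('?v=3').map(&:strip)

p urls
p clean
```

This only works while the suffix is identical on every URL, which is why the regex route below may be the better fit for messier input.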

If you really don't have constant fluff between your useful strings then regex might be better. Your regex is greedy. This will get you part way:

mystring.scan(/https?:\/\/[\w.\/-]*?\.(?:jpe?g|gif|png)/)

Note the '?' after the '*' in the part that matches the server and path pieces of the URL; it makes that quantifier non-greedy. The extension group is non-capturing, (?:...), because scan returns only the captured groups when a pattern contains any.

The problem with this is that if your server name or path contains any of .jpg, .jpeg, .gif or .png, the result will be wrong in that instance.
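
The difference between greedy and non-greedy matching, and the pitfall just described, can be seen in a short sketch. The input strings are hypothetical, and the patterns use a plain '.' instead of a character class to keep the comparison short; the groups are non-capturing so that scan returns whole matches.

```ruby
# Hypothetical input: two URLs with '?v=3' fluff after each.
mystring = 'http://example.com/a.jpg?v=3http://example.com/b.png?v=3'

# Greedy: '.*' runs to the last image extension in the string.
greedy = mystring.scan(/https?:\/\/.*\.(?:jpe?g|gif|png)/)
# => ["http://example.com/a.jpg?v=3http://example.com/b.png"]

# Non-greedy: '.*?' stops at the first image extension it reaches.
lazy = mystring.scan(/https?:\/\/.*?\.(?:jpe?g|gif|png)/)
# => ["http://example.com/a.jpg", "http://example.com/b.png"]

# Pitfall: an extension inside the host name cuts the match short.
tricky = 'http://img.png.example.com/a.jpg'
cut = tricky.scan(/https?:\/\/.*?\.(?:jpe?g|gif|png)/)
# => ["http://img.png"]

p greedy
p lazy
p cut
```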

Figuring out what is best needs more information about your input string. You might for example find it better to pattern match the fluff between your desired URLs.

Part of the problem is that in Rubular you are using https instead of http. This gets you closer to what you want if the other answers don't work for you:

http://rubular.com/r/cIjmjxIfz5

