Ruby regex: extract a list of URLs from a string

I have a string of image URLs and I need to convert it into an array.

http://rubular.com/r/E2a5v2hYnJ

How do I do this?

URI.extract(your_string)

That's all you need if you already have it in a string. I can't remember for sure, but you may have to put require 'uri' in there first. Gotta love that standard library!

Here's the link to the docs: URI.extract
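To make the URI.extract suggestion above concrete, here is a minimal sketch. The helper name extract_urls and the sample string are made up for illustration. Passing a list of schemes as the second argument narrows the result to web URLs. Note that Ruby's documentation marks URI.extract as obsolete in newer versions, so treat this as a quick convenience rather than a long-term recommendation.

```ruby
require 'uri'

# Hypothetical helper: returns every http/https URL found in text.
# The second argument restricts matches to the listed schemes.
def extract_urls(text)
  URI.extract(text, %w[http https])
end

# Made-up sample input with two image URLs embedded in prose.
sample = 'see http://example.com/a.jpg and https://example.com/b.png for details'
p extract_urls(sample)
```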

String#scan returns an array:

myarray = mystring.scan(/regex/)

See here on regular-expressions.info
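As a rough illustration of the scan approach, the sketch below assumes the URLs are separated by whitespace. The helper name scan_urls and the sample strings are hypothetical. It also shows one thing to watch for: when the pattern contains a capturing group, scan returns the captures instead of the whole match.

```ruby
# Hypothetical helper: collects every whitespace-delimited http(s) URL.
def scan_urls(text)
  # \S+ consumes everything up to the next whitespace character.
  text.scan(%r{https?://\S+})
end

p scan_urls('http://example.com/a.jpg http://example.com/b.png')
# => ["http://example.com/a.jpg", "http://example.com/b.png"]

# With a capturing group, scan yields one array of captures per match.
p 'a.jpg b.png'.scan(/\w+\.(jpg|png)/)
# => [["jpg"], ["png"]]
```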

Use String#split (see the docs for details).

The best answer will depend very much on exactly what input string you expect.

If your test string is accurate, I would not use a regex; do this instead (as suggested by Marnen Laibow-Koser):

mystring.split('?v=3')
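Here is a small sketch of the split approach. The actual test string is only available behind the Rubular link, so the sample below is an assumption: image URLs that each end in the constant suffix '?v=3'. The helper name split_on_suffix is made up too.

```ruby
# Hypothetical helper: splits on a constant suffix and tidies the pieces.
def split_on_suffix(text, suffix = '?v=3')
  # split drops the separator itself and any trailing empty strings;
  # strip and reject handle stray whitespace between entries.
  text.split(suffix).map(&:strip).reject(&:empty?)
end

# Assumed input shape: each URL is followed by "?v=3".
sample = 'http://example.com/a.jpg?v=3 http://example.com/b.png?v=3'
p split_on_suffix(sample)
# => ["http://example.com/a.jpg", "http://example.com/b.png"]
```

This only works while the separator really is constant; one URL with '?v=4' would stay glued to its neighbour.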

If the fluff between your useful strings really isn't constant, then a regex might be better. Your regex is greedy. This will get you part of the way:

# (?:...) is non-capturing, so scan returns whole URLs rather than just the extensions
mystring.scan(/https?:\/\/[\w.\/-]*?\.(?:jpe?g|gif|png)/)

Note the '?' after the '*' in the part that matches the server and path pieces of the URL; this makes the regex non-greedy.

The problem with this is that if your server name or path contains any of .jpg, .jpeg, .gif or .png, then the result will be wrong in that instance.
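The sketch below shows both the normal case and the failure mode just described. It uses a non-capturing group so that scan returns whole URLs. The helper name image_urls and both sample strings are invented for illustration.

```ruby
# Non-greedy match up to the first image extension.
# (?:...) is non-capturing, so scan returns the whole match.
IMAGE_URL = %r{https?://[\w./-]*?\.(?:jpe?g|gif|png)}

# Hypothetical helper wrapping the scan call.
def image_urls(text)
  text.scan(IMAGE_URL)
end

# Normal case: each URL ends at its own extension.
p image_urls('http://example.com/a.jpg?v=3http://example.com/b.png?v=3')
# => ["http://example.com/a.jpg", "http://example.com/b.png"]

# Failure case: ".png" appears inside the path, so the match stops early.
p image_urls('http://example.com/my.png.d/photo.jpg')
# => ["http://example.com/my.png"]
```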

Figuring out what is best needs more information about your input string. You might, for example, find it better to pattern match the fluff between your desired URLs.
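One way to read "pattern match the fluff" is to split on a regex that describes the separator instead of the URL. The sketch below assumes the fluff is a version parameter like '?v=3' whose number may vary, optionally followed by whitespace. That shape is a guess, and the helper name split_on_fluff is hypothetical.

```ruby
# Hypothetical helper: treats "?v=<digits>" plus trailing whitespace
# as the separator between URLs.
def split_on_fluff(text)
  text.split(/\?v=\d+\s*/)
end

# Assumed input: the version number differs from one URL to the next.
sample = 'http://example.com/a.jpg?v=3 http://example.com/b.png?v=12'
p split_on_fluff(sample)
# => ["http://example.com/a.jpg", "http://example.com/b.png"]
```

The upside is that the URLs themselves can contain anything, including image extensions in the middle of a path.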

Part of the problem is that in Rubular you are using https instead of http. This gets you closer to what you want if the other answers don't work for you:

http://rubular.com/r/cIjmjxIfz5


 