Extract multiple patterns from a URL string

I have a URL string:

http://localhost:3000/user/event?profile_id=2&profile_type=UserProfile

I want to extract "2" and "UserProfile", both of which can change.

I tried both match and scan, but neither returns any results:

url = "http://localhost:3000/user/event?profile_id=2&profile_type=UserProfile"
m = /http(s)?:\/\/(.)+\/user\/event?profile_id=(\d)&profile_type=(\w)/.match(url)
=> nil 

url.scan /http(s)?:\/\/(.)+\/user\/event?profile_id=(\d)&profile_type=(\w)/
=> [] 

Any idea what I might be doing wrong?

Don't use a pattern to try to do this. The order of query parameters in a URL can change and isn't position dependent, which would instantly break a pattern.

Instead, use a tool designed for the purpose, like the built-in URI:

require 'uri'

uri = URI.parse('http://localhost:3000/user/event?profile_id=2&profile_type=UserProfile')

Hash[URI::decode_www_form(uri.query)].values_at('profile_id', 'profile_type') 
# => ["2", "UserProfile"]

Doing it this way guarantees that you always receive the right values in the order you asked for them, which makes them easy to assign:

profile_id, profile_type = Hash[URI::decode_www_form(uri.query)].values_at('profile_id', 'profile_type')
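To illustrate the ordering point, here is a minimal sketch that parses the same parameters in two different orders. The swapped URL is a made-up variant of the one in the question, and extract_profile is a hypothetical helper name, not part of URI.

```ruby
require 'uri'

# Hypothetical helper: returns [profile_id, profile_type] regardless of
# the order in which the parameters appear in the query string.
def extract_profile(url)
  query = URI.parse(url).query
  Hash[URI.decode_www_form(query)].values_at('profile_id', 'profile_type')
end

original = 'http://localhost:3000/user/event?profile_id=2&profile_type=UserProfile'
# Assumed variant of the question's URL with the parameters swapped.
swapped  = 'http://localhost:3000/user/event?profile_type=UserProfile&profile_id=2'

p extract_profile(original) # => ["2", "UserProfile"]
p extract_profile(swapped)  # => ["2", "UserProfile"]
```

A position-dependent regular expression would need a second alternative to cope with the swapped URL; the hash lookup does not care.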

Here are the intermediate steps so you can see what's happening:

uri.query # => "profile_id=2&profile_type=UserProfile"
URI::decode_www_form(uri.query) # => [["profile_id", "2"], ["profile_type", "UserProfile"]]
Hash[URI::decode_www_form(uri.query)] # => {"profile_id"=>"2", "profile_type"=>"UserProfile"}
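One edge case the steps above do not cover: uri.query is nil when the URL has no query string. The sketch below assumes you would rather get an empty hash than an error in that case; query_params is a hypothetical helper name, and the second URL is invented for illustration.

```ruby
require 'uri'

# Hypothetical helper: turns a URI's query string into a Hash.
# uri.query is nil when there is no query string, so fall back to ''.
def query_params(uri)
  URI.decode_www_form(uri.query || '').to_h
end

with_query    = URI.parse('http://localhost:3000/user/event?profile_id=2&profile_type=UserProfile')
# Assumed example: the same path without any query string.
without_query = URI.parse('http://localhost:3000/user/event')

p query_params(with_query)['profile_id']   # => "2"
p query_params(with_query)['profile_type'] # => "UserProfile"
p query_params(without_query).empty?       # => true
p query_params(without_query).values_at('profile_id', 'profile_type') # => [nil, nil]
```

Calling to_h on the array of pairs is equivalent to wrapping it in Hash[...] as shown earlier.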

# \d+ (rather than \d) so that ids with more than one digit also match
match = url.match(/https?:\/\/.+?\/user\/event\?profile_id=(\d+)&profile_type=(\w+)/)
p match.captures[0] #=> "2"
p match.captures[1] #=> "UserProfile"

In your expression:

/http(s)?:\/\/(.)+\/user\/event?profile_id=(\d)&profile_type=(\w)/

Everything you put inside () in a regular expression is captured. There's no need to put the s in parentheses, because ? acts only on the preceding character. Likewise, there's no need for the (.), because + also acts only on the preceding character. The ? after event has to be escaped as \?; left unescaped, it makes the t optional instead of matching the literal question mark, which is why neither match nor scan finds anything. Finally, (\w) should be (\w+), which means one or more word characters ('UserProfile' is more than one character).
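The points above can be checked with a short sketch. The shortened patterns and the capture names id and type are illustrative choices, not something from the question.

```ruby
url = 'http://localhost:3000/user/event?profile_id=2&profile_type=UserProfile'

# Unescaped: "t?" means an optional "t", so no literal "?" is ever matched.
unescaped = /\/user\/event?profile_id=/
# Escaped: "\?" matches the literal question mark that starts the query string.
escaped   = /\/user\/event\?profile_id=/

p unescaped.match?(url) # => false
p escaped.match?(url)   # => true

# Only the two values of interest are captured; the capture names are illustrative.
pattern = %r{https?://.+?/user/event\?profile_id=(?<id>\d+)&profile_type=(?<type>\w+)}
m = pattern.match(url)
p m[:id]   # => "2"
p m[:type] # => "UserProfile"
```

Writing the pattern with %r{...} avoids having to escape every forward slash in the URL.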
